US 20060068405A1
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2006/0068405 A1
Diber et al.
(43) Pub. Date: Mar. 30, 2006

(54) METHODS AND SYSTEMS FOR ANNOTATING BIOMOLECULAR SEQUENCES

(76) Inventors: Alex Diber, Rishon-LeZion (IL); Sarah Pollock, Tel-Aviv (IL); Zurit Levine, Herzlia (IL); Sergey Nemzer, RaAnana (IL); Vladimir Grebinskiy, Highland Park, NJ (US); Brian Meloon, Plainsboro, NJ (US); Andrew Olson, Northport, NY (US); Avi Rosenberg, Kfar Saba (IL); Ami Haviv, Hod-HaSharon (IL); Shaul Zevin, Mevaseret Zion (IL); Tomer Zekharia, Givataim (IL); Zipi Shaked, Tel-Aviv (IL); Moshe Olshansky, Haifa (IL); Ariel Farkash, Haifa (IL); Eyal Privman, Tel-Aviv (IL); Amit Novik, Beit-YeHoshua (IL); Naomi Keren, Givat Shmuel (IL); Gad S. Cojocaru, Ramat-HaSharon (IL); Pinchas Akiva, Ramat-Gan (IL); Yossi Cohen, Surrey (GB); Ronen Shemesh, Modi In (IL); Osnat Sella-Tavor, Kfar-Kish (IL); Liat Mintz, East Brunswick, NJ (US); Hanging Xie, Lambertville, NJ (US); Dvir Dahary, Tel-Aviv (IL); Erez Levanon, Petach-Tikva (IL); Shiri Freilich, Haifa (IL); Nili Beck, Kfar Saba (IL); Wei-Yong Zhu, Plainsboro, NJ (US); Alon Wasserman, New York, NY (US); Chen Chermesh, Mishmar HaShiva (IL); Idit Azar, Tel-Aviv (IL); Rotem Sorek, Rechovot (IL); Jeanne Bernstein, Kfar Yona (IL)

Correspondence Address: Martin D. Moynihan, PRTSI, Inc., P.O. Box 16446, Arlington, VA 22215 (US)

(21) Appl. No.: 11/043,860
(22) Filed: Jan. 27, 2005

Related U.S. Application Data
(60) Provisional application No. 60/539,129, filed on Jan. 27, 2004.

Publication Classification
(51) Int. Cl. C12Q 1/68 (2006.01); G06F 9/00 (2006.01)
(52) U.S. Cl. 435/6; 702/20

(57) ABSTRACT
Polypeptide sequences and polynucleotide sequences are provided. Also provided are annotative information concerning such sequences and uses for these sequences.

Patent Application Publication Mar. 30, 2006 Sheet 1 of 50 US 2006/0068405 A1


Figure 4 (Sheets 5 to 13 of 50)

1 tumor
  1.1 epithelial cell tumors
    1.1.1 carcinoma
      1.1.1.1 adenocarcinoma
      1.1.1.2 lobular carcinoma
  1.2 mesenchymal cell tumors
    1.2.1 sarcoma
      1.2.1.1 liposarcoma
      1.2.1.2 rhabdomyosarcoma
      1.2.1.3 pnet
      1.2.1.4 ewing sarcoma
  1.3 blood tumors
    1.3.1 lymphoma
    1.3.2 leukemia
    1.3.3 myeloma
  1.4 endocrine tumors
    1.4.1 pheochromocytoma
    1.4.2 carcinoid

2 endocrine system
  2.1 adrenal
    2.1.1 pheochromocytoma
  2.2 pancreas
    2.2.1 islets of Langerhans
  2.3 neuroendocrine
    2.3.1 hypothalamus
    2.3.2 carcinoid
  2.4 thyroid

3 vascular tissue
  3.1 arteries
    3.1.1 aorta
  3.2 vein

4 genitourinary system
  4.1 urinary system
    4.1.1 bladder
    4.1.2 kidney
  4.2 genital system
    4.2.1 women genital system
      4.2.1.1 cervix
      4.2.1.2 ovary
      4.2.1.3 uterus
        4.2.1.3.1 endometrium
    4.2.2 men genital system
      4.2.2.1 prostate
      4.2.2.2 testis
        4.2.2.2.1 epididymis

5 muscles
  5.1 rhabdomyosarcoma
  5.2 tongue
  5.3 bladder
  5.4 heart
  5.5 uterus

6 blood
  6.1 peripheral blood
    6.1.1 erythroid line
    6.1.2 leukocyte
      6.1.2.1 lymphoid system
        6.1.2.1.1 lymphoma
        6.1.2.1.2 spleen
        6.1.2.1.3 thalamus
  6.2 stem cells
    6.2.1 myeloid
    6.2.2 myeloma
  6.3 bone marrow
  6.4 leukemia

7 nerve system
  7.1 CNS, central nervous system
    7.1.1 brain
      7.1.1.1 cerebrum
      7.1.1.2 cerebellum
      7.1.1.3 pituitary gland
      7.1.1.4 hypothalamus
      7.1.1.5 thalamus
      7.1.1.6 olfactory
      7.1.1.7 hippocampus
      7.1.1.8 amygdala
      7.1.1.9 frontal lobe
      7.1.1.10 pnet
  7.2 embryonal nerve system
    7.2.1 primitive neuroectoderm
  7.3 retina
    7.3.1 retinoblastoma

8 breast
  8.1 ductal breast
    8.1.1 ductal carcinoma
  8.2 lobular carcinoma
  8.3 mammary

9 skeleton
  9.1 bone
    9.1.1 ewing sarcoma
    9.1.2 craniofacial
      9.1.2.1 calvarium
  9.2 connective tissue
    9.2.1 trabeculae
    9.2.2 cartilage

10 embryo
  10.1 amnion
  10.2 chorion
  10.3 primitive neuroectoderm
  10.4 placenta

11 exocrine system
  11.1 pancreas
    11.1.1 islets of Langerhans
  11.2 prostate
  11.3 salivary gland

12 face organs
  12.1 nose
  12.2 ear
    12.2.1 cochlea
  12.3 eye
    12.3.1 retina
      12.3.1.1 retinoblastoma
    12.3.2 lens
  12.4 mouth
  12.5 tongue

13 gastrointestinal system
  13.1 mucosa
  13.2 stomach
  13.3 intestine
    13.3.1 colorectal
      13.3.1.1 colon
  13.4 hepatobiliary system
    13.4.1 liver
    13.4.2 biliary system
      13.4.2.1 gallbladder
  13.5 pancreas
    13.5.1 islets of Langerhans

14 respiratory system
  14.1 nasopharynx
  14.2 lung
    14.2.1 small cell lung carcinoma

15 skin
  15.1 dermis
    15.1.1 melanocyte

16 fat tissue
  16.1 liposarcoma
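The outline of Figure 4 lists some terms under more than one parent (for example, pancreas appears under the endocrine, exocrine and gastrointestinal systems), so it behaves like a directed acyclic graph rather than a strict tree. The following is a minimal Python sketch of one way such a hierarchy could be held in memory and queried. The names ENTRIES, build_parent_map and ancestors are hypothetical and are not taken from the application; only a few entries of the figure are copied in.

```python
from collections import defaultdict

# A few (dotted number, term) entries copied from the Figure 4 outline.
# The selection is illustrative only; the full figure has many more entries.
ENTRIES = [
    ("2", "endocrine system"),
    ("2.2", "pancreas"),
    ("2.2.1", "islets of Langerhans"),
    ("11", "exocrine system"),
    ("11.1", "pancreas"),
    ("11.1.1", "islets of Langerhans"),
    ("13", "gastrointestinal system"),
    ("13.5", "pancreas"),
    ("13.5.1", "islets of Langerhans"),
]


def build_parent_map(entries):
    """Map each term to the set of parent terms implied by the dotted numbering."""
    by_number = dict(entries)
    parents = defaultdict(set)
    for number, term in entries:
        if "." in number:
            # "13.5.1" -> parent number "13.5"
            parent_number = number.rsplit(".", 1)[0]
            parents[term].add(by_number[parent_number])
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


PARENTS = build_parent_map(ENTRIES)
print(sorted(ancestors("islets of Langerhans", PARENTS)))
```

Keying the graph on term names rather than on dotted numbers is an assumption of this sketch; it merges the repeated occurrences of a term so that a sequence annotated with a specific tissue can be rolled up to every broader category that contains it.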


Figure 5


[Figure residue, Sheet 18 of 50. Recoverable sample labels: brain, stomach, placenta, kidney, spleen, thymus, ovary, heart, lung, testis, liver, breast, pancreas, muscle, adenocarcinoma, colon, normal, Duke A, Duke B, Duke C, Duke D, genomic DNA.]


[Figure residue, Sheet 21 of 50. Recoverable sample labels: brain, stomach, placenta, kidney, spleen, thymus, adenocarcinoma, colon, Duke's D.]


R SR Patent Application Publication Mar. 30, 2006 Sheet 33 of 50 US 2006/0068405 A1 Fig. 18b (GCSF T2) -NO UNIQUE AGTCGTGGCCCCAGGTAATTTCCTCCCAGGCCTCCATGGGGTTATGTATAAAGGCCCCCC tagagctggg.ccccaaaacagc.ccggagcctgcagcccagccccacccaga ccc.gct gga CCtgccacccagagcCC catgaagctgatggccCtgcagctgctgctgtggCaCagt gcactctggacagtgcaggaagccaccoccotggg.ccctg.ccagctcc.ctg.ccc.cagag C tt CCtgCt CaagtgCtta gag Caagtgagga agatcCaggg CQatgg.cg Cag CQCtcCag gaga agctggcaggctgcttgagcca acticcatagcggCCttitt CCtctaccaggggctC ctgcagg.ccctggaagggatctoccc.cgagttggg toccaccttggacacactgcagctg gacgt. CCC gaCtttgCCaCCaCCatctgg Cag CagatggaagaaCtgggaatgg CCCCt gcc.ctgcagoccaccoagggtgccatgccggcct tcgcctctgcttitccag cqccgggca ggaggggtCCtggttgcCtccCatctgcagagct tcctggaggtgtcgtaccg.cgttcta cgccaccttgcccagccc. is gccaagccct coccatcccatgitatttatctotatttaa tatttatgtctatttaa.gc.ctcatatttaaagacagggaagagcagaacggagcccCagg CCtctgttgtcCttCCCtgcatttctgagttt Cattct CCtgCCtgtag Cagtgagaaaaa gctCctgtcCtccCatcCCCtggactgggagg tagataggtaaataccalagtatttatta Ctatgact gCtcCCCagCCCtggctctgcaatgggCaCtgggatgagcCgCtgtgagcCC CtggtCCtgagggtCCCCaC Ctggga CCCttgag agitat Cagg totCCCaCgtgggaga C aagaaatccctgtttaatatttaa acagcagtgttccccatctggg to CttgcaccCotC actCtggCCt Cagcc.gaCtgca Cag CQg CCCCtgCat CCCCttggCtgttgagg CCCCtgg a Caag Cagaggtggccagagctgggagg catggcCctgggg.tcCCaC gaatttgctgggg aatctogtttittcttcttaag acttittgggacatggtttgact.ccc.gaacat caccgacg tgttct CCtgtttitt Ctgggtgg CCtcggga Cacct gCCCtgCCCCCaC gagggtcaggaC tgttgactictttittagggccagg caggtgcctgga catttgccttgctggacggggactgg ggatgttgg gagggagcaga Caggagga at CatgtcaggCCtgttgttgttgaaaggalag CitcC actgtcaccctocacct cittcaccc.cccacticaccagtgtc.ccct coactgtcacattgt aactgaactticaggataataaagtgtttgcct coaaaaacgtoc Patent Application Publication Mar. 30, 2006 Sheet 34 of 50 US 2006/0068405 A1

Fig. 18c (GCSF T2) - no unique

MAGPATQSPMKLMALQLLLWHSALWTVQEATPLGPASSLPQSFLLKCLEQVRKIQGDGAA
LQEKLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTIWQQMEELGM
APALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRVLRHLAQP

Fig. 18d humgcsf_p3.pfs

Sequence name: /dir/tp/CGC/DATA/analysis_db/sw.fasta: CSF3_HUMAN
Sequence documentation: Granulocyte colony-stimulating factor precursor (G-CSF) (Pluripoietin) (Filgrastim) (Lenograstim). Homo sapiens (Human). P09919;
Alignment of: HUMGCSF_P3 x CSF3_HUMAN

  1 MAGPATQSPMKLMALQLLLWHSALWTVQEATPLGPASSLPQSFLLKCLEQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGPATQSPMKLMALQLLLWHSALWTVQEATPLGPASSLPQSFLLKCLEQ 50

 51 VRKIQGDGAALQEK.................................... 64
    ||||||||||||||
 51 VRKIQGDGAALQEKLVSECATYKLCHPEELVLLGHSLGIPWAPLSSCPSQ 100

 65 ...LAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTI 111
       |||||||||||||||||||||||||||||||||||||||||||||||
101 ALQLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTI 150

112 WQQMEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRV 161
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 WQQMEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRV 200

162 LRHLAQP 168
    |||||||
201 LRHLAQP 207
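The dotted stretch in the alignment of Fig. 18d marks residues of the reference protein that are absent from the variant. As a minimal sketch, assuming the variant is identical to the reference except for one contiguous missing segment, the missing segment can be located by matching the shared prefix and checking the shared suffix. The function name find_internal_deletion and the toy sequences are hypothetical and are not part of the application.

```python
def find_internal_deletion(reference, variant):
    """Locate one contiguous segment of `reference` that is absent from `variant`.

    Returns a 0-based, half-open (start, end) slice of `reference`, or None
    when the variant is not simply the reference with one segment removed.
    """
    missing = len(reference) - len(variant)
    if missing <= 0:
        return None

    # Extend the shared prefix as far as it goes (greedy, leftmost match).
    prefix = 0
    while prefix < len(variant) and reference[prefix] == variant[prefix]:
        prefix += 1

    # Whatever is left of the variant must equal the tail of the reference.
    if reference[prefix + missing:] == variant[prefix:]:
        return prefix, prefix + missing
    return None


# Toy sequences (not from the application): "QRQISF" is removed from the middle.
toy_reference = "MKTAYIAKQRQISFVKSHFSRQ"
toy_variant = "MKTAYIAKVKSHFSRQ"
span = find_internal_deletion(toy_reference, toy_variant)
print(span, toy_reference[span[0]:span[1]])
```

When the residue just after the gap equals the first missing residue, the greedy prefix match reports coordinates shifted slightly relative to where an alignment program places the gap; both placements describe the same pair of sequences.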

[Figure residue, Sheets 36 to 38 of 50. Recoverable panel labels: "ESTs supporting new variant (T2)", "Novel Transcripts".]

Fig. 19b (IL7 T3) -no unique aag acqaa tagtttgatt tattagccaatt Cagataaatgtgcacgtggaagt Catagitt aaatattatcgtoagtttccacgtoctg.cgitttaatttggggtttgattittcCaaataca acact taccagattagg toggaccoacaggattatttittoctitgaggit ct cacCtgagCag gtgcatgta Cag cagacggag cagaaagag actgattagagaggttggagtgg tagaggg cgtgaccctottaatcattottcact tcct tttittaaaagacg acttgg catcgtocacc a catcc.gcggcaacgcct CCttggtgtcgt.ccgct tccaataa CC cagottgcgt.cCtgc acacttgtggctitccgtgcacacattaacaacticatggttctagotcccagtc.gc.caa.gc gttgccaaggcgttgagagat Catctgggaagtc.ttitta CC Caga attgCtttgatticag gccagctggitttitt cotgcggtgatt.cggaaatticgcga attcCtctggtCct CatCcag gtgcgcgggaag caggtgcccaggaga gaggggataatgaagatticcatgctgatgat CC caaagattgaacctgcagaccaag.cgcaaagtagaaactgaaagta CactgctggCggat. cctacggaagttatggaaaaggcaaag.cgcagagccacgcc.gtagtgtgtgcc.gcCCCCC ttgggatggatgaaactgcagtcgcgg.cgtggg talagaggaaccagotgcagagat CaCC ctg.cccaacacagacticggcaactcc.gcggaagaccagggtoctgggagtgactatgggc ggtgaga.gcttgcticcitgcticcagttgcgg toatcatgactacgc.ccgcct CCCg CagaC c ttccatgtttcttittaggtata totttggact tcctcccctgatccttgttctgtt gCC agtag CatCatctgattgttgatatt galagg taalagatggcaaacaatatgaga.gtgt tCtaatggit cag Catcgat Caattattgga Cag catgaaagaa attgg tag CaattgCCt gaataatga attta acttitt.ttaaaaga Catatctgttgatgctaataaggittaaaggaag aaaac Cag CtgCCCtgggtgaag CCCaaCCaacaaag agtttggalaga aaataa at Cttt aaaggala Cagaaaaaactgaatga Cttgttgttt CCtaaagaga Ct atta Calagagataa a aacttgttggaataaaattittgatgggcactaaagaacacecraaaatatggagtggcaa tatagaaacacgaactittagctgcatccticcaagaatctatotgctitatgcagtttitt.ca gag toggaatgct tcCtagaagttactgaatgCaCCatggtCaaaacggattaggg cattt gagaaatgcatattgtattact agaagatgaatacaaacaatggaaactgaatgCtcCag toaacaaactatttcttatatatgtgaacatttatcaatcagtata attctgtactgatt tttgtaagacaatccatgtaaggitatcagttgcaataatacttctoa aacctgtttaaat atttcaaga cattaaatctatgaagtatataatggitttcaaagattcaaaattgacattg citt tactgtcaaaataattittatggctoactatgaatctattatact.gitattaagagtga 
aaattgtcttcttctgtgctggagatgttttagagittaacaatgatatatggataatgcc ggtgagaataaga gag toataa acct taagtaagcaa.cagoataacaaggtocaagataC Ctaaaagagatttcaa.gagattta attaatcatgaatgttgtaaCaCagtgcCttcaataa atggtatagoaaatgttittgacatgaaaaaaggacaatttcaaaaaaataaaataaaata aaaataaatticaccitagt ctaaggatgctaa acct tag tactgagtta Cattgtcattta tatagattata acttgtctaaataagtttgcaatttgggagatatatttittaagataata atatatgtttaccttittaattaatgaaatatotg tatttaattittga cactatatctgta tataaaatatttitcatacagcattacaaattgct tactittggaatacatttct cotttga taaaataaatgagctatgt Patent Application Publication Mar. 30, 2006 Sheet 39 of 50 US 2006/0068405 A1 Fig. 19c (IL7 T3) -no unique

MFHVSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCL
NNEFNFFKRHICDANKVKGRKPAALGEAQPTKSLEENKSLKEQKKLNDLCFLKRLLQEIK
TCWNKILMGTKEH

Fig. 19d humil7_a_p3.pfs

Sequence name: /dir/tp/CGC/DATA/analysis_db/sw.fasta: IL7_HUMAN
Sequence documentation: Interleukin-7 precursor (IL-7). Homo sapiens (Human). P13232;
Alignment of: HUMIL7_A_P3 x IL7_HUMAN

  1 MFHVSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLD 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MFHVSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLD 50

 51 SMKEIGSNCLNNEFNFFKRHICDANK........................ 76
    ||||||||||||||||||||||||||
 51 SMKEIGSNCLNNEFNFFKRHICDANKEGMFLFRAARKLRQFLKMNSTGDF 100

 77 ....................VKGRKPAALGEAQPTKSLEENKSLKEQKKL 106
                        ||||||||||||||||||||||||||||||
101 DLHLLKVSEGTTILLNCTGQVKGRKPAALGEAQPTKSLEENKSLKEQKKL 150

107 NDLCFLKRLLQEIKTCWNKILMGTKEH 133
    |||||||||||||||||||||||||||
151 NDLCFLKRLLQEIKTCWNKILMGTKEH 177

[Figure residue, Sheets 41 and 42 of 50. Recoverable panel labels: "ESTs supporting new variant (T3)", "Novel Transcripts".]

Fig. 20c (VEGF T4)

MSPLLRRLLLAALLQLAPAQAPVSQPDAPGHQRKVVSWIDVYTRATCQPREVVVPLTVEL
MGTVAKQLVPSCVTVQRCGGCCPDDGLECVPTGQHQVRMQILMIRYPSSQLGEMSLEEHS
QCECRPKKKDSAVKPDRCRKLRR

Fig. 20d t08411_p5.pfs

Sequence name: /dir/tp/CGC/DATA/analysis_db/sw.fasta: VEGB_HUMAN
Sequence documentation: Vascular endothelial growth factor B precursor (VEGF-B) (VEGF related factor) (VRF). Homo sapiens (Human). P49765; Q16528;
Alignment of: T08411_P5 x VEGB_HUMAN

  1 MSPLLRRLLLAALLQLAPAQAPVSQPDAPGHQRKVVSWIDVYTRATCQPR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MSPLLRRLLLAALLQLAPAQAPVSQPDAPGHQRKVVSWIDVYTRATCQPR 50

 51 EVVVPLTVELMGTVAKQLVPSCVTVQRCGGCCPDDGLECVPTGQHQVRMQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 EVVVPLTVELMGTVAKQLVPSCVTVQRCGGCCPDDGLECVPTGQHQVRMQ 100

101 ILMIRYPSSQLGEMSLEEHSQCECRPKKKDSAVKPDRCRKLRR 143
    |||||||||||||||||||||||||||||||||||||
101 ILMIRYPSSQLGEMSLEEHSQCECRPKKKDSAVKPDR...... 137


US 2006/0068405 A1 Mar. 30, 2006

METHODS AND SYSTEMS FOR ANNOTATING BIOMOLECULAR SEQUENCES

RELATIONSHIP TO EXISTING APPLICATIONS

[0001] The present application claims priority from U.S. Provisional Patent Application No. 60/539,129 filed Jan. 27, 2004, the contents of which are hereby incorporated by reference.

FIELD AND BACKGROUND OF THE INVENTION

[0002] The present invention relates to systems and methods useful for annotating biomolecular sequences. More particularly, the present invention relates to computational approaches, which enable systemic characterization of biomolecular sequences and identification of differentially expressed biomolecular sequences such as sequences associated with a pathology.

[0003] In the post-genomic era, data analysis rather than data collection presents the biggest challenge to biologists. Efforts to ascribe biological meaning to genomic data, whether by identification of function, structure or expression pattern are lagging behind sequencing efforts [Boguski MS (1999) Science 286:453-455].

[0004] It is well recognized that elucidation of spatial and temporal patterns of gene expression in healthy and diseased states may contribute immensely to further understanding of disease mechanisms.

[0005] Therefore, any observational method that can rapidly, accurately and economically observe and measure the pattern of expression of selected individual genes or of whole genomes is of great value to scientists.

[0006] In recent years, a variety of techniques have been developed to analyze differential gene expression. However, current observation and measurement methods are inaccurate, time consuming, labor intensive or expensive, often times requiring complex molecular and biochemical analysis of numerous gene sequences.

[0007] For example, observation methods for individual mRNA or cDNA molecules such as Northern blot analysis, RNase protection, or selective hybridization to arrayed cDNA libraries [see Sambrook et al. (1989) Molecular cloning, A laboratory manual, Cold Spring Harbor Press, N.Y.] depend on specific hybridization of a single oligonucleotide probe complementary to the known sequence of an individual molecule. Since a single human cell is estimated to express 10,000-30,000 genes [Liang et al. (1992) Science 257:967-971], single probe methods to identify all sequences in a complex sample are ineffective and laborious.

[0008] Other approaches for high throughput analysis of differential gene expression are summarized infra.

[0009] EST sequencing: The basic idea is to create cDNA libraries from tissues of interest, pick clones randomly from these libraries and then perform a single sequencing reaction from a large number of clones. Each sequencing reaction generates 300 base pairs or so of sequence that represents a unique sequence tag for a particular transcript. An EST sequencing project is technically simple to execute since it requires only a cDNA library, automated DNA sequencing capabilities and standard bioinformatics protocols.

[0010] To generate meaningful amounts of data, however, high throughput template preparation, sequencing and analysis protocols must be applied. As such, the number of new genes identified as well as the statistical significance of the data is proportional to the number of clones sequenced as well as the complexity of the tissue being analyzed [Adams et al. (1995) Nature 377:173-173; Hillier et al. (1996) Genome Res. 6:807-828].

[0011] Subtractive cloning: Subtractive cloning offers an inexpensive and flexible alternative to EST sequencing and cDNA array hybridization. In this approach, double stranded cDNA is created from the two cell or tissue populations of interest, linkers are ligated to the ends of the cDNA fragments and the cDNA pools are then amplified by PCR. The cDNA pool from which unique clones are desired is designated the "tester", and the cDNA pool that is used to subtract away shared sequences is designated the "driver". Following initial PCR amplification, the linkers are removed from both cDNA pools and unique linkers are ligated to the tester sample. The tester is then hybridized to a vast excess of driver DNA and sequences that are unique to the tester cDNA pool are amplified by PCR.

[0012] The primary limitation of subtractive methods is that they are not always comprehensive. The cDNAs identified are typically those which differ significantly in expression level between cell populations, and subtle quantitative differences are often missed. In addition, each experiment is a pairwise comparison, and since subtractions are based on a series of sensitive biochemical reactions it is difficult to directly compare a series of RNA samples.

[0013] Differential display: Differential display is another PCR-based differential cloning method [Liang and Pardee (1992) Science 257:967-70; Welsh et al. (1992) Nucleic Acids Res. 20:4965-70]. In classical differential display, reverse transcription is primed with either oligo-dT or an arbitrary primer. Thereafter an arbitrary primer is used in conjunction with the reverse transcription primer to amplify cDNA fragments and the cDNA fragments are separated on a polyacrylamide gel. Differences in gene expression are visualized by the presence or absence of bands on the gel and quantitative differences in gene expression are identified by differences in the intensity of bands. Adaptation of differential display methods for fluorescent DNA sequencing machines has enhanced the ability to quantify differences in gene expression [Kato (1995) Nucleic Acids Res. 18:3685-90].

[0014] A limitation of the classical differential display approach is that false positive results are often generated during PCR or in the process of cloning the differentially expressed PCR products. Although a variety of methods have been developed to discriminate true from false positives, these typically rely on the availability of relatively large amounts of RNA.

[0015] Serial analysis of gene expression (SAGE): This DNA sequence based method is essentially an accelerated version of EST sequencing [Velculescu et al. (1995) Science 270:484-8]. In this method a digestible unique sequence tag of 13 or more bases is generated for each transcript in the cell or tissue of interest, thereby generating a SAGE library.

[0016] Sequencing each SAGE library creates transcript profiles. Since each sequencing reaction yields information

for twenty or more genes, it is possible to generate data points for tens of thousands of transcripts in modest sequencing efforts. The relative abundance of each gene is determined by counting or clustering sequence tags. The advantages of SAGE over many other methods include the high throughput that can be achieved and the ability to accumulate and compare SAGE tag data from a variety of samples; however, the technical difficulties concerning the generation of good SAGE libraries and data analysis are significant.

[0017] Altogether, it is clear from the above that laboratory bench approaches are ineffective, time consuming, expensive and often times inaccurate in handling and processing the vast amount of genomic information which is now available.

[0018] It is appreciated that much of the analysis can be effected by developing computational algorithms, which can be applied to mining data from existing databases, thereby retrieving and integrating valuable biological information.

[0019] To date, there are more than a hundred major biomolecule databases and application servers on the Internet and new sites are being introduced at an ever-increasing rate [Ashburner and Goodman (1997) Curr. Opin. Genet. Dev. 7:750-756; Karp (1998) Trends Biochem. Sci. 23:114-116].

[0020] However, these databases are organized in extremely heterogeneous formats. These reflect the inherent complexity of biological data, ranging from plain-text nucleic acid and protein sequences, through the three dimensional structures of therapeutic drugs and macromolecules and high resolution images of cells and tissues, to microarray-chip outputs. Moreover, data structures are constantly evolving to reflect new research and technology development.

[0021] The heterogeneous and dynamic nature of these biological databases presents major obstacles in mining data relevant to specific biological queries. Clearly, simple retrieval of data is not sufficient for data mining; efficient data retrieval requires flexible data manipulation and sophisticated data integration. Efficient data retrieval requires the use of complex queries across multiple heterogeneous data sources; data warehousing by merging data derived from multiple public sources and local (i.e., private) sources; and multiple data-analysis procedures that require feeding subsets of data derived from different sources into various application programs for gene finding, protein-structure prediction, functional domain or motif identification, phylogenetic tree construction, graphic presentation and so forth.

[0022] Current biological data retrieval systems are not fully up to the demand of smooth and flexible data integration [Etzold et al. (1996) Methods Enzymol. 266:114-128; Schuler et al. (1996) Methods Enzymol. 266:141-162; Chung and Wong (1999) Trends Biotech. 17:351-355].

[0023] There is thus a widely recognized need for, and it would be highly advantageous to have, systems and methods which can be used for efficient retrieval and processing of data from biological databases thereby enabling annotation of previously un-annotated biomolecular sequences.

SUMMARY OF THE INVENTION

[0024] According to one aspect of the present invention there is provided a computer readable storage medium, comprising a database stored in a retrievable manner, the database including biomolecular sequence information as set forth in files "Transcripts.gz", and/or "Proteins.gz" of enclosed CD-ROM4, and biomolecular sequence annotations, as set forth in file "Annotations.gz" of enclosed CD-ROM4.

[0025] According to another aspect of the present invention there is provided a method of comparing an expression level of a gene of interest in at least two types of tissues, the method comprising: (a) obtaining a contig representing the gene of interest, the contig being assembled from a plurality of expressed sequences; and (b) comparing a number of the plurality of expressed sequences corresponding to the contig which are expressed in each of the at least two tissue types, to thereby compare the expression level of the gene of interest in the at least two tissue types.

[0026] According to further features in preferred embodiments of the invention described below, the method further comprises computationally aligning sequences expressed in each of the at least two types of tissue with the contig to thereby identify the expressed sequences corresponding to the contig prior to (b).

[0027] According to yet another aspect of the present invention there is provided a method of comparing an expression level of at least two splice variants of a gene of interest in a tissue, the method comprising: (a) obtaining a contig having exonal sequences of the at least two splice variants of the gene of interest, the contig being assembled from a plurality of expressed sequences; (b) identifying at least one contig sequence region unique to one of the at least two splice variants of the gene of interest; and (c) comparing a number of the plurality of expressed sequences in the tissue having the at least one contig sequence region with a number of the plurality of expressed sequences not having the at least one contig sequence region, to thereby compare the expression level of the at least two splice variants of the gene of interest in the tissue.

[0028] According to still further features in the described preferred embodiments the plurality of expressed sequences present complete exonal coverage of the gene of interest.

[0029] According to still further features in the described preferred embodiments the plurality of expressed sequences present partial exonal coverage of the gene of interest.

[0030] According to still further features in the described preferred embodiments the obtaining the contig is effected by a sequence assembly software.

[0031] According to still further features in the described preferred embodiments the method further comprising scoring each of the plurality of the expressed sequences prior to (c), wherein the scoring is effected according to:

[0032] (i) expression level of each of the plurality of the expressed sequences; and

[0033] (ii) a quality of each of the plurality of the expressed sequences.

[0034] According to still further features in the described preferred embodiments comparing is effected using statistical pairing analysis.

[0035] According to still further features in the described preferred embodiments the statistical pairing analysis is Fisher exact test.
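The comparison described in paragraphs [0025], [0034] and [0035] can be illustrated with a minimal Python sketch: count the expressed sequences of a contig that come from each of two tissue types, set them against the remaining expressed sequences sampled from those tissues, and apply a two-sided Fisher exact test to the resulting 2x2 table. The function names, the table layout and the example counts below are assumptions made for illustration; the application does not specify an implementation. In practice a library routine such as scipy.stats.fisher_exact would typically be used instead of the hand-written test.

```python
from math import comb


def contingency(contig_counts, library_sizes, tissue_a, tissue_b):
    """Build the 2x2 table (a, b, c, d) for one contig and two tissue types.

    a, c: expressed sequences in the contig from tissue_a and tissue_b.
    b, d: all other expressed sequences sampled from tissue_a and tissue_b.
    """
    a = contig_counts.get(tissue_a, 0)
    c = contig_counts.get(tissue_b, 0)
    return a, library_sizes[tissue_a] - a, c, library_sizes[tissue_b] - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, chosen only to exercise the functions.
counts = {"colon tumor": 8, "normal colon": 1}
sizes = {"colon tumor": 10, "normal colon": 6}
table = contingency(counts, sizes, "colon tumor", "normal colon")
print(table, fisher_exact_two_sided(*table))
```

The same test could be applied to the splice-variant comparison of paragraph [0027] by replacing the two tissue types with the two groups of expressed sequences (those having and those not having the variant-specific contig region); that mapping is likewise an assumption of this sketch.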

[0036] According to still further features in the described preferred embodiments the tissue is selected from the group consisting of a tissue of a pathological origin of interest, a tissue of a cellular composition of interest.

[0037] According to still further features in the described preferred embodiments the method further comprising comparing the number of the plurality of expressed sequences in the tissue having the at least one contig sequence region with a number of the plurality of expressed sequences of the contig.

[0038] According to still another aspect of the present invention there is provided a computer readable storage medium comprising data stored in a retrievable manner, the data including sequence information of differentially expressed mRNA sequences as set forth in files "Transcripts.gz", and/or "Proteins.gz" of enclosed CD-ROM4, and sequence annotations as set forth in annotation categories "HTS", "HTAA" and/or "HTAAT", in the file "Annotations.gz" of enclosed CD-ROM4.

[0039] According to still further features in the described preferred embodiments the database further includes information pertaining to generation of the data and potential uses of the data.

[0040] According to still further features in the described preferred embodiments the medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.

[0041] According to still further features in the described preferred embodiments the database further includes information pertaining to gain and/or loss of function of the differentially expressed mRNA splice variants or polypeptides encoded thereby.

[0042] According to an additional aspect of the present invention there is provided a kit useful for detecting differentially expressed polynucleotide sequences, the kit comprising at least one oligonucleotide being designed and configured to be specifically hybridizable with a polynucleotide sequence selected from the group consisting of sequence files "Transcripts.gz" of enclosed CD-ROM4 under moderate to stringent hybridization conditions.

[0043] According to still further features in the described preferred embodiments the at least one oligonucleotide is labeled.

[0044] According to still further features in the described preferred embodiments the at least one oligonucleotide is attached to a solid substrate.

[0045] According to still further features in the described preferred embodiments the solid substrate is configured as a microarray and whereas the at least one oligonucleotide includes a plurality of oligonucleotides each being capable of hybridizing with a specific polynucleotide sequence of the polynucleotide sequences set forth in the files "Transcripts.gz" of enclosed CD-ROM4 under moderate to stringent hybridization conditions.

[0046] According to still further features in the described preferred embodiments each of the plurality of oligonucleotides is being attached to the microarray in a regio-specific manner.

[0047] According to still further features in the described preferred embodiments the at least one oligonucleotide is designed and configured for DNA hybridization.

[0048] According to still further features in the described preferred embodiments the at least one oligonucleotide is designed and configured for RNA hybridization.

[0049] According to yet an additional aspect of the present invention there is provided a system for generating a database of differentially expressed genes, the system comprising a processing unit, the processing unit executing a software application configured for: (a) obtaining contigs representing genes of interest, each of the contigs being assembled from a plurality of expressed sequences; (b) comparing a number of the plurality of expressed sequences corresponding to each of the contigs, which are expressed in each of at least two tissue types, to thereby compare the expression level of the genes of interest in the at least two tissue types; and (c) storing contigs which are supported by different numbers of the plurality of expressed sequences in each of the at least two tissue types, to thereby generate the database of differentially expressed genes.

[0050] According to still an additional aspect of the present invention there is provided an isolated polynucleotide comprising a nucleic acid sequence being at least 80% identical to a nucleic acid sequence of the sequences set forth in file "Transcripts.gz" of the enclosed CD-ROM4.

[0051] According to still further features in the described preferred embodiments the nucleic acid sequence is set forth in the file "Transcripts.gz" of the enclosed CD-ROM4.

[0052] According to a further aspect of the present invention there is provided an isolated polynucleotide comprising a nucleic acid sequence encoding a polypeptide having an amino acid sequence at least 80% homologous to a sequence set forth in the file "Proteins.gz" of the enclosed CD-ROM4.

[0053] According to yet a further aspect of the present invention there is provided an isolated polynucleotide comprising a nucleic acid sequence at least 80% identical to a sequence set forth in the file "Transcripts.gz" of the enclosed CD-ROM4.

[0054] According to still a further aspect of the present invention there is provided an isolated polypeptide having an amino acid sequence at least 80% homologous to a sequence set forth in the file "Proteins.gz" of the enclosed CD-ROM4.

[0055] According to still a further aspect of the present invention there is provided use of a polynucleotide or polypeptide set forth in the file "Transcripts.gz" or "Proteins.gz" of the enclosed CD-ROM4 for the diagnosis and/or treatment of the diseases listed herein.

[0056] The present invention successfully addresses the shortcomings of the presently known configurations by providing methods and systems useful for systematically uncovering and annotating biomolecular sequences.

[0057] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods

and materials are described below. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

[0058] The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

[0059] In the drawings:

[0060] FIG. 1a illustrates a system designed and configured for generating a database of annotated biomolecular sequences according to the teachings of the present invention.

[0061] FIG. 1b illustrates a remote configuration of the system described in FIG. 1a.

[0062] FIG. 2 illustrates a gastrointestinal tissue hierarchy dendrogram generated according to the teachings of the present invention.

[0063] FIG. 3 is a scheme illustrating multiple alignment of alternatively spliced expressed sequences with a genomic sequence including 3 exons (A, B and C) and two introns. Two alternative splicing events are described: one from the donor site, which involves an AB junction, between donor and proximal acceptor, and an AC junction, between donor and distal acceptor; a second alternative splicing event is described from the acceptor site, which involves an AC junction, between distal donor and acceptor, and a BC junction, between proximal donor and acceptor.

[0064] FIG. 4 is a tissue hierarchy dendrogram generated according to the teachings of the present invention. The higher annotation levels are marked with a single number, i.e., 1-16. The lower annotation levels are marked within the relevant category as one to four numbers after the point (e.g. 4. genitourinary system; 4.2 genital system; 4.2.1 women genital system; 4.2.1.1 cervix).

[0065] FIG. 5 is a graph illustrating a correlation between LOD scores of textual information analysis and accuracy of ontological annotation prediction. Results are based on self-validation studies. Only predictions made with LOD scores above 2 were evaluated and used for the GO annotation process.

[0066] FIGS. 6a-c are histograms showing the distribution of proteins (closed squares) and contigs (opened squares) from Ensembl version 1.0.0 in the major nodes of three GO categories: cellular component (FIG. 6a), molecular function (FIG. 6b), and biological process (FIG. 6c).

[0067] FIG. 7 illustrates results from RT-PCR analysis of the expression pattern of the AA535072 (SEQ ID NO: 39) colorectal cancer-specific transcript. The following cell and tissue samples were tested: B, colon carcinoma cell line SW480 (ATCC-228); C, colon carcinoma cell line SW620 (ATCC-227); D, colon carcinoma cell line colo-205 (ATCC-222). Colon normal tissue indicates a pool of 10 different samples (Biochain, cat no: A406029). The adenocarcinoma sample represents a pool of spleen, lung, stomach and kidney adenocarcinomas, obtained from patients. Each of the tissues (i.e., colon carcinoma samples Duke's A-D; and normal muscle, pancreas, breast, liver, testis, lung, heart, ovary, thymus, spleen, kidney, placenta, stomach, brain) were obtained from 3-6 patients and pooled.

[0068] FIG. 8 illustrates results from RT-PCR analysis of the expression pattern of the AA513157 (SEQ ID NO: 7) Ewing sarcoma specific transcript. The (+) or (-) symbols indicate presence or absence of reverse transcriptase in the reaction mixture. A molecular weight standard is indicated by M. Tissue samples (i.e., Ewing sarcoma samples, spleen adenocarcinoma, brain, prostate and thymus) were obtained from patients. The Ln-CAP human prostatic adenocarcinoma cell line was obtained from the ATCC (Manassas, Va.).

[0069] FIG. 9 is an autoradiogram of a northern blot analysis depicting tissue distribution and expression levels of the AA513157 (SEQ ID NO: 7) Ewing sarcoma specific transcript. Arrows indicate the molecular weight of 28S and 18S ribosomal RNA subunits. The indicated tissue samples were obtained from patients and the SK-ES-1 Ewing sarcoma cell line was obtained from the ATCC (CRL-1427).

[0070] FIG. 10 illustrates results from semi quantitative RT-PCR analysis of the expression pattern of the AA469088 (SEQ ID NO: 40) colorectal specific transcript. Colon normal was obtained from Biochain, cat no: A406029. The adenocarcinoma sample represents a pool of spleen, lung, stomach and kidney adenocarcinomas, obtained from patients. Each of all other tissues (i.e., colon carcinoma samples Duke's A-D; and normal thymus, spleen, kidney, placenta, stomach, brain) were obtained from 3-6 patients and pooled.

[0071] FIG. 11 is a histogram depicting Real-Time RT-PCR quantification of copy number of a lung specific transcript (SEQ ID NO: 15). Amplification products obtained from the following tissues were quantified: normal salivary gland from total RNA (Clontech, cat no: 64110-1); lung normal from pooled adult total RNA (BioChain, cat no: A409363); lung tumor squamous cell carcinoma (Clontech, cat no: 64013-1); lung tumor squamous cell carcinoma (BioChain, cat no: A409017); pooled lung tumor squamous cell carcinoma (BioChain, cat no: A411075); moderately differentiated squamous cell carcinoma (BioChain, cat no: A409091); well differentiated squamous cell carcinoma (BioChain, cat no: A408175); pooled adenocarcinoma (BioChain, cat no: A411076); moderately differentiated alveolus cell carcinoma (BioChain, cat no: A409089); non-small cell lung carcinoma cell line H1299. The following normal and tumor samples were obtained from patients: normal lung (internal number CG-207N), lung carcinoma (internal number CG-72), squamous cell carcinoma (internal number CG-196), squamous cell carcinoma (internal number CG-207), lung adenocarcinoma (internal number CG-120), lung adenocarcinoma (internal number CG-160). Copy number was normalized to the levels of expression of the housekeeping genes Proteasome 26S subunit (dark columns) and GAPDH (bright columns).
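As an aside on the numbering scheme described for FIG. 4, the following Python sketch shows how a dotted annotation number can be split into its annotation level and its chain of parent categories. The four labels are the ones quoted in the FIG. 4 example; the dictionary and the function names (`annotation_level`, `lineage`, `describe`) are hypothetical helpers introduced only for illustration and are not part of the described system.

```python
# Labels quoted in the FIG. 4 example; the mapping itself is an illustrative assumption.
TISSUE_LABELS = {
    "4": "genitourinary system",
    "4.2": "genital system",
    "4.2.1": "women genital system",
    "4.2.1.1": "cervix",
}


def annotation_level(node_id):
    """Return the depth of a dotted annotation number (1 for a top-level category)."""
    return len(node_id.split("."))


def lineage(node_id):
    """Return the node and all of its ancestors, from the top-level category down."""
    parts = node_id.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def describe(node_id):
    """Render the lineage of a node using the labels known for each level."""
    return " > ".join(TISSUE_LABELS[n] for n in lineage(node_id))


if __name__ == "__main__":
    print(annotation_level("4.2.1.1"))
    print(lineage("4.2.1.1"))
    print(describe("4.2.1.1"))
```

Keeping the full dotted number as the key means that every lower annotation level carries its higher levels with it, so an annotation made at the cervix level can also be looked up under the genital or genitourinary categories.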

[0072] FIG. 12 is a histogram depicting Real-Time RT-PCR quantification of copy number of the lung specific transcript (SEQ ID NO: 32). Amplification products obtained from the following tissues and cell lines were quantified: lung normal from pooled adult total RNA (BioChain, cat no: A409363); lung tumor squamous cell carcinoma (Clontech, cat no: 64013-1); lung tumor squamous cell carcinoma (BioChain, cat no: A409017); pooled lung tumor squamous cell carcinoma (BioChain, cat no: A411075); moderately differentiated squamous cell carcinoma (BioChain, cat no: A409091); well differentiated squamous cell carcinoma (BioChain, cat no: A408175); pooled adenocarcinoma (BioChain, cat no: A411076); moderately differentiated alveolus cell carcinoma (BioChain, cat no: A409089); non-small cell lung carcinoma cell line H1299. The following normal and tumor samples were obtained from patients: normal lung (internal number CG-207N), lung carcinoma (internal number CG-72), squamous cell carcinoma (internal number CG-196), squamous cell carcinoma (internal number CG-207), lung adenocarcinoma (internal number CG-120), lung adenocarcinoma (internal number CG-160). Copy number was normalized to the levels of expression of the housekeeping genes Proteasome 26S subunit (dark columns) and GAPDH (bright columns).

[0073] FIG. 13 is a histogram depicting Real-Time RT-PCR quantification of copy number of the lung specific transcript (SEQ ID NO: 18). Amplification products obtained from the following tissues and cell lines were quantified: lung normal from pooled adult total RNA (BioChain, cat no: A409363); lung tumor squamous cell carcinoma (Clontech, cat no: 64013-1); lung tumor squamous cell carcinoma (BioChain, cat no: A409017); pooled lung tumor squamous cell carcinoma (BioChain, cat no: A411075); moderately differentiated squamous cell carcinoma (BioChain, cat no: A409091); well differentiated squamous cell carcinoma (BioChain, cat no: A408175); pooled adenocarcinoma (BioChain, cat no: A411076); moderately differentiated alveolus cell carcinoma (BioChain, cat no: A409089); non-small cell lung carcinoma cell line H1299. The following normal and tumor samples were obtained from patients: normal lung (internal number CG-207N), lung carcinoma (internal number CG-72), squamous cell carcinoma (internal number CG-196), squamous cell carcinoma (internal number CG-207), lung adenocarcinoma (internal number CG-120), lung adenocarcinoma (internal number CG-160). Copy number was normalized to the levels of expression of the housekeeping genes Proteasome 26S subunit (dark columns) and GAPDH (bright columns).

[0074] FIG. 14 is a histogram depicting Real-Time RT-PCR quantification of copy number of a lung specific transcript (SEQ ID NO: 21). Amplification products obtained from the following tissues and cell lines were quantified: Samples 1-6 are commercial normal lung samples (BioChain, CDP-061010; A503205, A503384, A503385, A503204, A503206, A409363). Sample 7 is lung well differentiated adenocarcinoma (BioChain, CDP-064004A; A504117). Sample 8 is lung moderately differentiated adenocarcinoma (BioChain, CDP-064004A; A504119). Sample 9 is lung moderately to poorly differentiated adenocarcinoma (BioChain, CDP-064004A; A504116). Sample 10 is lung well differentiated adenocarcinoma (BioChain, CDP-064004A; A504118). Samples 11-16 are lung adenocarcinoma samples obtained from patients. Sample 17 is lung moderately differentiated squamous cell carcinoma (BioChain, CDP-064004B; A503187). Sample 18 is lung squamous cell carcinoma (BioChain, CDP-064004B; A503386). Samples 20-21 are lung moderately differentiated squamous cell carcinoma (BioChain, CDP-064004B; A503387, A503383). Sample 22 is lung squamous cell carcinoma pooled (BioChain, CDP-064004B; A411075). Samples 23-26 and sample 31 are lung squamous cell carcinoma obtained from patients. Sample 27 is lung squamous cell carcinoma (Clontech, 64013-1). Sample 28 is lung squamous cell carcinoma (BioChain, A409017). Sample 29 is lung moderately differentiated squamous cell carcinoma (BioChain, CDP-064004B; A409091). Sample 30 is lung well differentiated squamous cell carcinoma (BioChain, CDP-064004B; A408175). Samples 32-35 are lung small cell carcinoma (BioChain, CDP-064004D; A504115, A501390, A501389, A501391). Samples 36-37 are lung large cell carcinoma (BioChain, CDP-064004C; A504113, A504114). Sample 38 is lung moderately differentiated alveolus cell carcinoma (BioChain, A409089). Sample 39 is lung carcinoma obtained from patient. Sample 40 is lung H1299 non-small cell carcinoma cell line. Sample 41 is normal salivary gland sample (Clontech, 64110-1). Copy number was normalized to the levels of expression of the housekeeping genes Proteasome 26S subunit (dark columns) and GAPDH (bright columns).

[0075] FIGS. 15a-c are schematic illustrations depicting the methodology undertaken for finding exon-skipping events which are conserved between human and mice genomes. 3,583 exon skipping events were found in the human genome using the methodology described in Sorek (2002) Genome Res. 12:1060-1067. FIG. 15a: for 980 of these human exons, a mouse EST spanning the intron which represents the exon-skipping variant was found. Human ESTs are designated in purple. Mouse ESTs are denoted by light blue. FIGS. 15b-c depict two approaches for identifying exon conservation between mice and human. FIG. 15b depicts the identification of mouse ESTs which contain the exon as well as the two flanking exons. FIG. 15c illustrates a specific embodiment wherein the exon is absent in the mouse ESTs; in this case the human exon sequence is searched against the intron spanned by the skipping mouse EST on the mouse genome. If a significant conservation (i.e., above 80%) was found and the alignment spanned the full length of the human exon, the exon was considered conserved.

[0076] FIGS. 16a-d illustrate the stepwise methodology which is used to uncover true SNPs, as described in Example 22 of the Examples section.

[0077] FIG. 17 is a schematic illustration depicting grouping of transcripts of a given contig based on presence or absence of unique sequence regions. Region 1: common to all transcripts, thus it is not considered; Region 2: specific to T1: T1 unique regions (2+6) against T2+3 unique regions (3+4); Region 3: specific to T2+3: T2+3 unique regions (3+4) against T1 unique regions (2+6); Region 4: specific to T3: T3 unique regions (4) against T1+2 unique regions (2+5+6); Region 5: specific to T1+2: T1+2 unique regions (2+5+6) against T3 unique regions (4); Region 6: specific to T1: same as region 2.

[0078] FIG. 18a is a schematic illustration depicting the GCSF splice variant (SEQ ID NO: 68) as compared to the wild-type gene product.
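The region grouping listed for FIG. 17 can be reproduced with a few set operations. The Python sketch below is an illustration only: the transcript-to-region table (`TRANSCRIPTS`) is inferred from the FIG. 17 description rather than taken from the figure itself, and the function name `group_regions` is hypothetical. For each region that is not shared by all transcripts, it reports the transcripts carrying the region, the regions unique to that transcript group, and the regions unique to the remaining transcripts.

```python
# Region content of each transcript, inferred from the FIG. 17 description
# (region 1 is common to all three transcripts). This table is an assumption.
TRANSCRIPTS = {
    "T1": {1, 2, 5, 6},
    "T2": {1, 3, 5},
    "T3": {1, 3, 4},
}


def group_regions(transcripts):
    """Map each informative region to (carrier transcripts, regions unique to the
    carriers, regions unique to the remaining transcripts)."""
    names = set(transcripts)
    all_regions = set().union(*transcripts.values())
    # For every region, the set of transcripts that contain it.
    carriers = {
        region: frozenset(t for t, regions in transcripts.items() if region in regions)
        for region in all_regions
    }
    result = {}
    for region in sorted(carriers):
        group = carriers[region]
        if group == names:
            continue  # common to all transcripts, thus not considered
        rest = names - group
        # A region is "unique" to a group when none of its carriers lie outside it.
        unique_in_group = sorted(r for r, c in carriers.items() if c <= group)
        unique_in_rest = sorted(r for r, c in carriers.items() if c <= rest)
        result[region] = (sorted(group), unique_in_group, unique_in_rest)
    return result


if __name__ == "__main__":
    for region, (group, inside, outside) in group_regions(TRANSCRIPTS).items():
        print(f"Region {region}: specific to {'+'.join(group)}: "
              f"unique regions {inside} against {outside}")
```

Under the assumed table, region 2 yields T1 with unique regions 2 and 6 set against regions 3 and 4 of T2 and T3, and region 6 yields the same comparison, in line with the note that region 6 is "same as region 2".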

[0079] FIG. 18b presents the nucleic acid sequence of the GCSF splice variant (SEQ ID NO: 71), which was uncovered using the teachings of the present invention. Start and stop codons are highlighted.

[0080] FIG. 18c presents the amino acid sequence of the GCSF splice variant (SEQ ID NO: 68), which was uncovered using the teachings of the present invention.

[0081] FIG. 18d is a sequence alignment depicting the protein product of a GCSF splice variant (SEQ ID NO: 68) as compared to the wild-type protein (RefSeq Accession No. NM_000759).

[0082] FIG. 18e is an illustration depicting a graphical viewer scheme presenting a splice variant of GCSF (SEQ ID NO: 68) uncovered by the present invention as compared to the wild type mRNA of GCSF. ESTs supporting the variant are indicated. The transcript indicated as "0" represents known mRNA. The color code is as follows: red designates genomic DNA; pink designates RefSeq mRNA; light blue designates known GenBank mRNAs; purple designates ESTs which are aligned in the same directionality as their annotation; gray designates ESTs without direction annotation; dark blue designates predicted transcripts; turquoise designates the predicted polypeptide.

[0083] FIG. 19a is a schematic illustration depicting the IL-7 splice variant (SEQ ID NO: 69) as compared to the wild-type gene product.

[0084] FIG. 19b presents the nucleic acid sequence of the IL-7 splice variant (SEQ ID NO: 72), which was uncovered using the teachings of the present invention. Start and stop codons are highlighted.

[0085] FIG. 19c presents the amino acid sequence of the IL-7 splice variant (SEQ ID NO: 69), which was uncovered using the teachings of the present invention.

[0086] FIG. 19d is a sequence alignment depicting the protein product of an IL-7 splice variant (SEQ ID NO: 69) as compared to the wild-type protein (GenBank Accession No. IL7_HUMAN).

[0087] FIG. 19e is an illustration depicting a graphical viewer scheme presenting a splice variant of IL-7 (SEQ ID NO: 69) uncovered by the present invention as compared to the wild type mRNA of IL-7. ESTs supporting the variant are indicated. The transcript indicated as "0" represents known mRNA. The color code is as follows: red designates genomic DNA; pink designates RefSeq mRNA; light blue designates known GenBank mRNAs; purple designates ESTs which are aligned in the same directionality as their annotation; gray designates ESTs without direction annotation; dark blue designates predicted transcripts; turquoise designates the predicted polypeptide.

[0088] FIG. 20a is a schematic illustration depicting the VEGF-B splice variant (SEQ ID NO: 70) as compared to the wild-type gene product.

[0089] FIG. 20b presents the nucleic acid sequence of the VEGF-B splice variant (SEQ ID NO: 73), which was uncovered using the teachings of the present invention. Start and stop codons are highlighted.

[0090] FIG. 20c presents the amino acid sequence of the VEGF-B splice variant (SEQ ID NO: 70), which was uncovered using the teachings of the present invention.

[0091] FIG. 20d is a sequence alignment depicting the protein product of a VEGF-B splice variant (SEQ ID NO: 70) as compared to the wild-type protein (GenBank Accession No. VEGB_HUMAN).

[0092] FIG. 20e is an illustration depicting a graphical viewer scheme presenting a splice variant of VEGF-B (SEQ ID NO: 70) uncovered by the present invention as compared to the wild type mRNA of VEGF-B. ESTs supporting the variant are indicated. The transcript indicated as "0" represents known mRNA. The color code is as follows: red designates genomic DNA; pink designates RefSeq mRNA; light blue designates known GenBank mRNAs; purple designates ESTs which are aligned in the same directionality as their annotation; black designates ESTs aligned in a direction opposite to the annotation; gray designates ESTs without direction annotation; dark blue designates predicted transcripts; turquoise designates the predicted polypeptide.

[0093] FIG. 21 is an illustration depicting schematic alignment of the nucleic acid sequences of wild type Troponin transcript (GenBank Accession No. NM_003283) and variants 1, 4, 6, 9, 10, 14 and 16 (SEQ ID NOs: 75, 77, 79, 81, 83, 66 and 67, respectively). Coding regions are marked by green. Sequence region 4a codes for the unique amino acid sequence and is marked by light green and diagonal stripes. Other regions marked in light green code for additional novel amino acid sequences. Red arrows indicate the location of the primers and SEQ ID NOs thereof, which were used for real-time PCR validation.

[0094] FIG. 22 is a histogram depicting the expression of troponin transcripts of the present invention in normal, benign and tumor derived ovarian samples as determined by real time PCR using a troponin-S69208 unique region derived fragment (SEQ ID NO: 44, amplicon). Expression was normalized to the averaged expression of four housekeeping genes PBGD, HPRT, GAPDH and SDHA.

[0095] FIG. 23 is a histogram depicting the expression of troponin transcripts of the present invention in normal and tumor derived lung samples as determined by real time PCR using a troponin-S69208 unique region derived fragment (SEQ ID NO: 44, amplicon). Expression was normalized to the averaged expression of four housekeeping genes PBGD, HPRT, Ubiquitin and SDHA.

[0096] FIG. 24 is a histogram depicting the expression of troponin transcripts of the present invention in non-cancerous and tumor derived colon samples as determined by real time PCR using a troponin-S69208 unique region derived fragment (SEQ ID NO: 44, amplicon). Expression was normalized to the averaged expression of four housekeeping genes PBGD, HPRT, RPS27A and G6PD.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0097] The present invention is of methods and systems which can be used for annotating biomolecular sequences. Specifically, the present invention can be used to identify and annotate differentially expressed biomolecular sequences, such as differentially expressed alternatively spliced sequences.

[0098] The principles and operation of the present invention may be better understood with reference to the drawings and accompanying descriptions.

[0099] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

[0100] Terminology

[0101] As used herein, the term "oligonucleotide" refers to a single stranded or double stranded oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof. This term includes oligonucleotides composed of naturally-occurring bases, sugars and covalent internucleoside linkages (e.g., backbone) as well as oligonucleotides having non-naturally-occurring portions which function similarly. Such modified or substituted oligonucleotides are often preferred over native forms because of desirable properties such as, for example, enhanced cellular uptake, enhanced affinity for nucleic acid target and increased stability in the presence of nucleases.

[0102] The phrase "complementary DNA" (cDNA) refers to the double stranded or single stranded DNA molecule, which is synthesized from a messenger RNA template.

[0103] The term "contig" refers to a series of overlapping sequences with sufficient identity to create a longer contiguous sequence. A plurality of contigs may form a cluster. Clusters are generally formed based upon a specified degree of homology and overlap (e.g., a stringency), and/or based on prior knowledge of ESTs from different contigs derived from the same mRNA, also known as clone mates. The different contigs in a cluster do not typically represent the entire sequence of the gene; rather, the gene may comprise one or more unknown intervening sequences between the defined contigs.

[0104] The term "cluster" refers to a nucleic acid sequence cluster or a protein sequence cluster. The former refers to a group of nucleic acid sequences which share a requisite level of homology and/or other similar traits according to a given clustering criterion; and the latter refers to a group of protein sequences which share a requisite level of homology and/or other similar traits according to a given clustering criterion.

[0105] A process and/or method to group nucleic acid or protein sequences as such is referred to as clustering, which is typically performed by a clustering (i.e., alignment) application program implementing a cluster algorithm.

[0106] As used herein the phrase "biomolecular sequences" refers to amino acid sequences (i.e., peptides, polypeptides) and nucleic acid sequences, which include but are not limited to genomic sequences, expressed sequence tags, contigs, complementary DNA (cDNA) sequences, pre-messenger RNA (mRNA) sequences, and mRNA sequences. Expressed sequences include also products of alternative splicing or RNA editing events, which are well known for contributing to gene product diversity (Krevintseva Trends Genet. (2003) 19(3):124-8; Keegan (2001) Nat. Rev. Genet. 2:869-78; Schaub (2002) Biochimie 84:791-803; Adler (1994) Curr. Opin. Genet. Dev. 4:316-22).

[0107] As used herein the phrase "functionally altered biomolecular sequences" refers to expressed sequences (e.g., alternatively spliced sequences) whose protein products exhibit gain of function or loss of function or modification of the original function.

[0108] As used herein the phrase "gain of function" refers to any gene product (e.g., product of alternative splicing, product of RNA editing) which exhibits increased functionality as compared to the wild type gene product. Such a gain of function may have a dominant effect on the wild-type gene product. An alternatively spliced variant of Max, a binding partner of the Myc oncogene, provides a typical example for a "gain of function" alteration. This variant is truncated at the COOH-terminus and, while still capable of binding to the CACGTG motif of c-Myc, it lacks the nuclear localization signal and the putative regulatory domain of Max. When tested in a myc-ras cotransformation assay in rat embryo fibroblasts, wild-type Max suppressed cellular transformation, whereas the above-described Max splice variant enhanced transformation (Makela TP, Koskinen PJ, Vastrik I, Alitalo K., Science. 1992 April 17; 256(5055):373-7).

[0109] As used herein the phrase "loss of function" refers to any gene product (mRNA or protein) which exhibits total or partial reduction in function as compared to the wild type gene product. Loss of function can also manifest itself through a dominant negative effect.

[0110] As used herein the phrase "dominant negative" refers to the dominant negative effect of a gene product (e.g., product of alternative splicing, product of RNA editing) on the activity of wild type protein. For example, a protein product of an altered splice variant may bind a wild type target protein without enzymatically activating it (e.g., receptor dimers), thus blocking and preventing the active from binding and activating the target protein. This mode of action provides a mechanism for the dominant negative action of soluble receptors on wild-type membrane anchored receptors. Such soluble receptors may compete with wild-type receptors on ligand-binding and as such may be used as antagonists. For example, two splice variants of guanylyl cyclase-B receptor were recently described (GC-B1; Tamura N and Garbers DL, J. Biol. Chem. (2003) 278(49):48880-9). One form has a 25 amino acid deletion in the kinase homology domain. This variant binds the ligand but fails to activate the cyclase. A second variant includes only a portion of the extracellular domain. This form fails to bind the ligand. Both variants, when co-expressed with the wild-type receptor, act as dominant negative isoforms by virtue of blocking formation of active GC-B1 homodimers.

[0111] A dominant negative effect may also be exerted by mislocalization of the altered variant or by multiple modes of action. For example, the splice variants of wild-type mitogen activated protein kinase 5a, ERK5b and mERK5c, act as dominant negative inhibitors based on inhibition of mERK5a kinase activity and mERK5a-mediated MEF2C transactivation. The C-terminal tail, which contains a putative nuclear localization signal, is not required for activation and kinase activity but is responsible for the activation of nuclear transcription factor MEF2C due to nuclear targeting. In addition, the N-terminal domain spanning amino acids (aa) 1-77 is important for cytoplasmic targeting; the domain from aa 78 to 139 is required for association with the

upstream kinase MEK5; and the domain from aa 140-406 is necessary for oligomerization (Yan et al. J Biol Chem. (2001) 276(14):10870-8).

[0112] The phrase "modification of the original function" may be exemplified by changing a receptor function to a ligand function. For example, a soluble secreted receptor may exhibit change in functionality as compared to a membrane-anchored wild-type receptor by acting as a ligand, activating parallel signaling pathways by trans-signaling (e.g., the signaling reported for soluble IL-6R; Kallen, Biochim Biophys Acta. (2002) November 11; 1592(3):323-43), stabilizing ligand-receptor interactions or protecting the ligand or the wild-type receptor from degradation and/or prolonging their half-life. In this case the soluble receptor will function as an agonist.

[0113] As used herein the term "modulator" refers to a molecule which inhibits (i.e., antagonist, inhibitor, suppressor) or activates (i.e., agonist, stimulant, activator) a downstream molecule to thereby modulate its activity.

[0114] As used herein the phrase "functional domain" refers to a region of a biomolecular sequence which displays a particular function. This function may give rise to a biological, chemical, or physiological consequence which may be reversible or irreversible and which may include protein-protein interactions (e.g., binding interactions) involving the functional domain, a change in the conformation or a transformation into a different chemical state of the functional domain or of molecules acted upon by the functional domain, the transduction of an intracellular or intercellular signal, the regulation of gene or protein expression, the regulation of cell growth or death, or the activation or inhibition of an immune response.

[0115] With the presentation of the human genome working draft, data analysis rather than data collection presents the biggest challenge to biologists. Efforts to ascribe biological meaning to genomic data include the development of advanced wet laboratorial techniques as well as computerized algorithms. While the former are limited due to inaccuracy, time consumption, labor intensiveness and costs, the latter are still unfeasible due to the poor organization of on hand sequence databases as well as the composite nature of biological data.

[0116] As is further described hereinbelow, the present inventors have developed a computer-based approach for the functional, spatial and temporal analysis of biological data. The present methodology generates comprehensive databases, which greatly facilitate the use of available genetic information in both research and commercial applications.

[0117] By applying the algorithms described hereinbelow and in the Examples section, which follows, the present inventors collected sequence information and corresponding sequence annotations as set forth in the files "Transcripts nucleotide seqs part1" of CD-ROM1, "Transcripts nucleotide seqs part2", "Transcripts nucleotide seqs part3", "protein seqs", "ProDG seqs", "Transcripts nucleotide seqs part4" of CD-ROM2, "summary table" of CD-ROM3, "Annotations.gz", "Transcripts.gz", and "Proteins.gz" of enclosed CD-ROM4. This comprehensive database allows simple elucidation of yet unknown function of mass gene products and illustrates spatial and temporal patterns of gene expression in various types of tissues, such as healthy and diseased, which may contribute immensely to further understanding of disease mechanisms and allow use thereof in the configuration of therapeutic and diagnostic applications.

[0118] As is further described hereinunder, the present invention encompasses several novel approaches for annotating biomolecular sequences which can be individually applied or in combination.

[0119] "Annotating" refers to the act of discovering and/or assigning an annotation (i.e., critical or explanatory notes or comment) to a biomolecular sequence of the present invention.

[0120] The term "annotation" refers to a functional or structural description of a sequence, which may include identifying attributes such as locus name, keywords, Medline references, cloning data, single nucleotide polymorphism data, information of coding region, regulatory regions, catalytic regions, name of encoded protein, subcellular localization of the encoded protein, protein hydrophobicity, protein function, mechanism of protein function, information on metabolic pathways, regulatory pathways, protein-protein interactions, tissue expression profile, diseases and disorders (i.e., indications), therapies, pharmacological activities and diagnostic applications.

The Ontological Annotation Approach

[0121] An ontology refers to the body of knowledge in a specific knowledge domain or discipline such as molecular biology, microbiology, immunology, virology, plant sciences, pharmaceutical chemistry, medicine, neurology, endocrinology, genetics, ecology, genomics, proteomics, cheminformatics, pharmacogenomics, bioinformatics, computer sciences, statistics, mathematics, chemistry, physics and artificial intelligence.

[0122] An ontology includes domain-specific concepts, referred to herein as sub-ontologies. A sub-ontology may be classified into smaller and narrower categories.

[0123] The ontological annotation approach of the present invention is effected as follows.

[0124] First, biomolecular sequences are computationally clustered according to a progressive homology range, thereby generating a plurality of clusters each being of a predetermined homology of the homology range.

[0125] Progressive homology according to this aspect of the present invention is used to identify meaningful homologies among biomolecular sequences and thereby assign new ontological annotations to sequences which share requisite levels of homologies. Essentially, a biomolecular sequence is assigned to a specific cluster if it displays a predetermined homology to at least one member of the cluster (i.e., single linkage). As used herein "progressive homology range" refers to a range of homology thresholds, which progress via predetermined increments from a low homology level (e.g. 35%) to a high homology level (e.g. 99%). Further description of a progressive homology range is provided in the Examples section which follows.

[0126] Following generation of clusters, one or more ontologies are assigned to each cluster. Ontologies are derived from an annotation preassociated with at least one biomolecular sequence of each cluster, and/or generated by analyzing (e.g., text-mining) at least one biomolecular sequence of each cluster, thereby annotating biomolecular sequences.

[0127] Any annotational information identified and/or generated according to the teachings of the present invention can be stored in a database which can be generated by a suitable computing platform.

[0128] Thus, the method according to this aspect of the present invention provides a novel approach for annotating biomolecular sequences even on a scale of a genome, a transcriptome (i.e., the repertoire of all messenger RNA molecules transcribed from a genome) or a proteome (i.e., the

a cluster if this sequence shares a sequence homology above a certain threshold to one member of the cluster. The threshold increments from a high homology level to a low homology level with a predetermined resolution. Preferably, the homology range is selected from 99%-35%.

[0134] Computational clustering can be effected using any commercially available alignment software including the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), using the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), using the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci.
USA 85:2444 (1988), or using repertoire of all proteins translated from messenger RNA computerized implementations of algorithms GAP, BEST molecules). This enables transcriptome-wise comparative FIT, FASTA, and TFASTA in the Wisconsin Genetics Soft analyses (e.g., analyzing chromosomal distribution of ware Package Release 7.0, Genetics Computer Group, 575 human genes) and cross-transcriptome comparative studies Science Dr., Madison, Wis. (e.g., comparing expressed, data across species) both of 0.135 Another example of -an algorithm which is suitable which may involve various Subontologies such as molecular for sequence alignment is the BLAST algorithm, which is function, biological process and cellular localization. described in Altschul et al., J. Mol. Biol. 215:403-410 0129 Biomolecular sequences which can be used as (1990). Software for performing BLAST analyses- is pub working material for the annotating process according to this licly available through the National Center for Biotechnol aspect of the present invention can be obtained from a ogy Information (http://www.ncbi.hlm.nih.gov/). biomolecular sequence database. Such a database can include protein sequences and/or nucleic acid sequences 0.136. Since the present invention requires processing of derived from libraries of expressed messenger R i.e., large amounts of data, sequence alignment is preferably expressed sequence tags (EST). cDNA clones, contigs, effected using assembly Software. pre-mRNA, which are prepared from specific tissues or 0.137. A number of commonly used computer software cell-lines or from whole organisms. fragment read assemblers capable of forming clusters of 0130. This database can be a pre-existing publicly avail expressed sequences, and aligning members of the cluster able database i.e., GenBank database maintained by the (individually or as an assembled contig) with other National Center for Biotechnology Information (NCBD, sequences (e.g., genomic database) are now available. 
These part of the National Library of Medicine, and the TIGR packages include but are not limited to. The TIGR Assem database maintained by The Institute for Gendmic Research, bler Sutton G. et al. (1995) Genome Science and Technol Blocks database maintained by the Fred Hutchinson Cancer ogy 1:9-19), GAP Bonfield J. K. et al. (1995) Nucleic Acids Research Center, Swiss-Prot site maintained by the Univer Res. 23:4992-4999), CAP2 Huang X. et al. (1996) Genom sity of Geneva and GenPept maintained by NCBI and ics 33:21-31), the Genome Construction Manager Laurence including public protein-sequence database which contains C B. Et al. (1994) Genomics 23:192-2011, Bio Image all- the protein databases from GenBank, or private data Sequence Assembly Manager, SeqMan Swindell S R. and bases (i.e., the LifeSeq.TM and PathoSeq.TM databases avail Plasterer J. N. (1997) Methods Mol. Biol. 70:75-89), and able from Incyte Pharmaceuticals, Inc. of Palo Alto, Calif.). LEADS and GenCarta (Compugen Ltd. Israel). Optionally, biomolecular sequences of the present invention 0.138. It will be appreciated that since applying sequence can be assembled from a number of pre-existing databases homology analysis on large number of sequences is com as described in Example 5 of the Examples section. putationally intensive, local alignment (i.e., the alignment of 0131 Alternatively, the database can be generated from portions of protein sequences) is preferably effected prior to sequence libraries including, but, not limited to, cDNA global alignment (alignment of protein sequences along their libraries, EST libraries, mRNA libraries and the like. entire length), as described in Example 6 of the Examples 0132) Construction and sequencing of a cDNA library is section. one approach for generating a database of expressed mRNA 0.139. Once progressive clusters are formed, one or more sequences. 
cDNA library construction is typically effected ontological annotations (i.e., assigning an ontology) are by tissue or cell sample preparation, RNA isolation, cDNA assigned to each cluster. sequence construction and sequencing. 0140) Systematic and standardized ontological nomen 0133. It will be appreciated that such cDNA libraries can clature is preferably used. Such nomenclature (i.e., key be constructed from RNA isolated from whole organisms, words) can be obtained from several sources. For example, tissues, tissue sections, or cell populations. Libraries can ontological annotations derived from three main ontologies: also be constructed from a tissue reflecting a particular molecular function, biological process and cellular compo pathological or physiological state Once raw sequence data nent are available from the Gene Ontology Consortium is obtained, biomolecular sequences are computationally (www.geneontology.org). clustered according to a progressive homology range using one or more clustering algorithms. To obtain progressive 0.141 Alternatively a list of homogenized ontological clusters, the biomolecular sequences are clustered through nomenclature can be obtained from AcroMed—a computer single linkage. Namely, a biomolecular sequence belongs to generated database of biomedical acronyms and the associ US 2006/0068405 A1 Mar. 30, 2006

ated long forms extracted from the recent Medline abstracts man, Ido Dagan, and Haym Hirsh, Proceedings of the 1995 (http://www.expasy.org/tools/). Workshop on Knowledge Discovery in Databases, “Finding Associations in Collections of Text, Ronen Feldman and 0142 Optionally, various conversion tables which link Haym Hirsh, Machine Learning and Data Mining: Methods Commission number, InterPro protein motifs and and Applications, edited by R. S. Michalski, I. Bratko, and SwissProt keywords to gene ontology nodes are also avail M. Kubat, John Wiley & Sons, Ltd., 1997 "Technology Text able from www.geneontology.org and can be used with the Mining, Turning Information Into Knowledge: A White present method. Paper from IBM,” edited by Daniel Tkach, Feb. 17, 1998, 0143 Ontologies, sub ontologies, and their ontological each of which is fully incorporated herein by reference. relations (i.e., inherent relation the sub-ontology “IS THE” 0151. It will be appreciated that text mining may be ontology or composite relation the ontology “HAS’ the performed, in this and other embodiments of the present Sub ontology) can be organized into various computer data invention, for the text terms extracted from the definitions of structures such as a tree, a map, a graph, a stack or a list. gene or protein sequence records, retrievable from databases These may also be presented in various data format Such as, such as GenBank and Swiss-Prot and title line, abstract of text, table, html, or extensible markup language (XML) Scientific papers, retrievable from Medline database, (e.g., 0144 Ontologies and/or Subontologies assigned to a spe http://www.ncbi.nlm.nih.gov/PubMed/). cific biomolecular sequence can be derived from an anno 0152 Computer-dedicated software for biological text tation, which is preassociated with at least one biomolecular analysis is available from http://www.expasy.org/tools/. sequence in a cluster generated as described hereinabove. 
Examples include, but are not limited to, MedMiner—A 0145 For example, biomolecular sequences obtained Software system which extracts and organizes relevant sen from an annotated database are typically preassociated with tences in the literature based on a gene, gene-gene or an annotation. An "annotated database' refers to a database gene-drug query; Protein Annotators Assistant—A Software of biomolecular sequences, which are at least partially system which assists protein annotators in the task of characterized with respect to functional or structural aspects assigning functions to newly sequenced proteins; and of the sequence. Examples of annotated databases include XplorMed—A software system which explores a set of but are not limited to: GenBank (www.ncbi.nlm.nih.gov/ abstracts derived from a bibliographic search in MEDLINE. GenBank/), Swiss-Prot (www.expasy.cbfsprot/sprot 0153. Alternatively, assignment of ontological annota top.html), GDB (www.gdb.org/), PIR (www.mips.biochem tions may be effected by analyzing molecular, cellular mpg.de/proj/prostseqdb/), YDB and/or functional traits of the biomolecular sequences. Pre (www.mips.biochem.mpg.de/proj/yeast/), MIPS (www.mi diction of cellular localization may be done using any ps.biochem.mpg.de/proj/human) HGI (www.tigr.org/tdb/ computer-dedicated software. For example prediction of hgi/), Celera Assembled Human Genome (www.celera.com/ products/human ann.cfm and LifeSeq Gold (https:// cellular localization can be done using the ProLoc compu lifeseq gold.incyte.com). Additional specialized annotated tational platform Einat Hazkani-Covo, Erez. Levanon, Galit databases include annotative information on metabolic Rotman, Dan Graur and Amit Novik; (2004). Evolution of (http://www.genome.adjp/kegg/metabolism.html) and regu multicellularity in metazoa: comparative analysis of the latory pathways (http://www.genome.adjp/kegg/regula Subcellular localization of proteins in Saccharomyces; Drosophila and Caenorhabditis. 
Cell Biology International tion.html), and protein-protein interactions (http:/dip.doe (in press), which predicts protein localization based on mbi.ucla.edu/), etc. various parameters including, protein domains (e.g., predic 0146 Alternatively, ontologies can be generated from an tion of trans-membranous regions and localization thereof analysis of at least one biomolecular sequence in each of the within the protein), pI, protein length, amino acid compo clusters of the present invention. sition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain 0147 Preferably, analysis of the biomolecular sequence organelle (such as, nuclear localization signal, NLS, mito is effected by literature text mining. Since manual review of chondria localization signal), * signal peptide and anchor related-literature may be a daunting task, computational modeling and using unique domains from Pfam that are extraction of text information is preferably effected. specific to a single compartment. 0148 Thus, the method of the present invention can also process literature and other textual information and utilize 0154) Other examples for cellular localization prediction processed textual data for generating additional ontological software include PSORT Prediction of protein sorting sig annotations. For example, text information contained in the nals and localization sites and TargetP Prediction of sub sequence-related publications and definition lines in cellular location, both available from http://www.expasy sequence records of sequence databases can be extracted and .org/tools/, See also Example 22. processed. Ontological annotations derived from processed 0.155 Prediction of functional annotations may also be text data are then assigned to the sequences in the corre effected by motif analysis of the biomolecular sequences of sponding clusters. the present invention. 
Thus for example, by implementing any motif analysis software, which is based on protein 0149 Ontological annotations can also be extracted from homology (see for example, http://motif.genome.ad.jp? and sequence associated Medical subject heading (MeSH) terms http://www.accelrys.com/products/grailprofindex.html), it is which are assigned to published papers. possible to predict functional motifs of DNA sequences 0150. Additional information on text mining is provided including repeats, promoter sequences and CpG islands and in Example 7 of the Examples section and is disclosed in of encoded proteins such as Zinc finger and leucine Zipper. “Mining Text Using Keyword Distributions. Ronen Feld Such functional annotations may also be extracted from US 2006/0068405 A1 Mar. 30, 2006

databases of protein families, domains and functional sites such as InterPro (http://www.ebi.ac.uk/interpro/).
[0156] Functional annotations may also be extracted by adopting annotations from orthologous species (i.e., from different species) such as, for example, from viral proteomes. Viral proteins have evolved to defy the host immune system and as such may provide functional annotations to orthologous proteins which exhibit a sufficient level of homology in at least functional domains thereof. As such, such an annotation may be, for example, "immune system related". Detailed description of the method which is used to obtain such annotations is provided in U.S. patent application No. 60/480,752.
[0157] Due to the progressive nature of the clusters of the present invention, ontology assignment starts at the highest level of homology. Any biomolecular sequence in the cluster which shares an identical level of homology compared to an ontologically annotated protein in the cluster is assigned the same ontological annotation. This procedure progresses from the highest level of homology to a lower threshold level with a predetermined increment resolution. Newly discovered homologies enable assignment of existing ontological annotations to biomolecular sequences sharing homologous sequences and being previously unannotated or partially annotated (see Examples 5-9 of the Examples section).
[0158] Once assignment of an annotation is effected, annotated clusters are disassembled resulting in annotation of each biomolecular sequence of the cluster.
[0159] Such annotated biomolecular sequences are then tested for false annotation. This is effected using the following scoring parameters:
[0160] (i) A degree of homology characterizing the progressive cluster: accuracy of the annotation directly correlates with the homology level used for the annotation process (see Examples 7-9 and 22 of the Examples section).
[0161] (ii) Relevance of annotation to information obtained from literature text mining: each assigned ontological annotation which results from literature text mining or functional or cellular prediction is assessed using scoring parameters such as LOD score (for further details see Example 7 of the Examples section).
[0162] The present invention also enables the use of the homologies identified according to the teachings of the present invention to annotate a query sequence more sensitively and rapidly. Essentially this involves building a sequence profile for each annotated cluster. A profile enables scoring of a biomolecular sequence according to functional domains along a sequence and generally makes searches more sensitive. Essentially, clustered sequences are also tested for relevance to the cluster based upon shared functional domains and other characteristic sequence features.
[0163] Ontologically annotated biomolecular sequences are stored in a database for further use. Additional information on generation and contents of such databases is provided hereinunder.
[0164] Such a database can be used to query functional domains and sequences comprising thereof. Alternatively, the database can be used to query a sequence, and retrieve the compatible annotations.
[0165] Although the present methodology can be effected using prior art systems modified for such purposes, due to the large amounts of data processed and the vast amounts of processing needed, the present methodology is preferably effected using a dedicated computational system.
[0166] Thus, according to another aspect of the present invention and as illustrated in FIGS. 1a-b, there is provided a system for generating a database of annotated biomolecular sequences.
[0167] System 10 includes a processing unit 12, which executes a software application designed and configured for annotating biomolecular sequences, as described hereinabove. System 10 further serves for storing biomolecular sequence information and annotations in a retrievable/searchable database 18. Database 18 further includes information pertaining to database generation.
[0168] System 10 may also include a user interface 14 (e.g., a keyboard and/or a mouse, monitor) for inputting database or database related information, and for providing database information to a user.
[0169] System 10 of the present invention may be any computing platform known in the art including but not limited to a personal computer, a work station, a mainframe and the like.
[0170] Preferably, database 18 is stored on a computer readable medium such as a magnetic, optico-magnetic or optical disk.
[0171] System 10 of the present invention may be used by a user to query the stored database of annotations and sequence information to retrieve biomolecular sequences stored therein according to inputted annotations or to retrieve annotations according to a biomolecular-sequence query.
[0172] It will be appreciated that the connection between user interface 14 and processing unit 12 is bidirectional. Likewise, processing unit 12 and database 18 also share a two-way communication channel, wherein processing unit 12 may also take input from database 18 in performing annotations and iterative annotations. Further, user interface 14 is linked directly to database 18, such that a user may dispatch queries to database 18 and retrieve information stored therein. As such, user interface 14 allows a user to compile queries, send instructions, view querying results and perform specific analyses on the results as needed.
[0173] In performing ontological annotations, processing unit 12 may take input from one or more application modules 16. Application module 16 performs a specific operation and produces a relevant annotative input for processing unit 12. For example, application module 16 may perform cellular localization analysis on a biomolecular sequence query thereby determining the cellular localization of the encoded protein. Such a functional annotation is then input to and used by processing unit 12. Examples for application software for cellular localization prediction are provided hereinabove.
[0174] System 10 of the present invention may also be connected to one or more external databases 20. External database 20 is linked to processing unit 12 in a bidirectional manner, similar to the connection between database 18 and processing unit 12. External database 20 may include any background information and/or sequence information that pertains to the biomolecular sequence query. External database 20 may be a proprietary database or a publicly available database which is accessible through a public network such as the Internet. External database 20 may feed relevant information to processing unit 12 as it effects iterative ontological annotation. External database 20 may also receive and store ontological annotations generated by processing unit 12. In this case external database 20 may interact with other components of system 10 like database 18.
[0175] It will be appreciated that the databases and application modules of system 10 can be directly connected with processing unit 12 and/or user interface 14 as is illustrated in FIG. 1a, or such a connection can be achieved via a network 22, as is illustrated in FIG. 1b.
[0176] Network 22 may be a private network (e.g., a local area network), a secured network, or a public network (such as the Internet), or a combination of public and private and/or secured networks.
[0177] Thus, the present invention provides a well-characterized approach for the systemic annotation of biomolecular sequences. The use of text information analysis, annotation scoring system and robust sequence clustering procedure enables, for the first time, the creation of the best possible annotations and assignment thereof to a vast number of biomolecular sequences sharing homologous sequences. The availability of ontological annotations for a significant number of biomolecular sequences from different species can provide a comprehensive account of sequence, structural and functional information pertaining to the biomolecular sequences of interest.
The Hierarchical Annotation Approach
[0178] "Hierarchical annotation" refers to any ontology and subontology, which can be hierarchically ordered. Examples include but are not limited to a tissue expression hierarchy, a developmental expression hierarchy, a pathological expression hierarchy, a cellular expression hierarchy, an intracellular expression hierarchy, a taxonomical hierarchy, a functional hierarchy and so forth.
[0179] According to another aspect of the present invention there is provided a method of annotating biomolecular sequences according to a hierarchy of interest. The method is effected as follows.
[0180] First, a dendrogram representing the hierarchy of interest is computationally constructed. As used herein a "dendrogram" refers to a branching diagram containing multiple nodes and representing a hierarchy of categories based on a degree of similarity or number of shared characteristics.
[0181] Each of the multiple nodes of the dendrogram is annotated by at least one keyword describing the node, and enabling literature and database text mining, as is further described hereinunder. A list of keywords can be obtained from the GO Consortium (www.geneontology.org); measures are taken to include as many keywords as possible, and to include keywords which might be out of date. For example, for tissue annotation (see FIG. 4), a hierarchy was built using all available tissue/libraries sources available in the GenBank, while considering the following parameters: ignoring GenBank synonyms, building anatomical hierarchies, enabling flexible distinction between tissue types (normal versus pathology) and tissue classification levels (organs, systems, cell types, etc.).
[0182] It will be appreciated that the dendrogram of the present invention can be illustrated as a graph, a list, a map or a matrix or any other graphic or textual organization, which can describe a dendrogram. An example of a dendrogram illustrating the gastrointestinal tissue hierarchy is provided in FIG. 2.
[0183] In a second step, each of the biomolecular sequences is assigned to at least one specific node of the dendrogram.
[0184] The biomolecular sequences according to this aspect of the present invention can be annotated biomolecular sequences, unannotated biomolecular sequences or partially annotated biomolecular sequences.
[0185] Annotated biomolecular sequences can be retrieved from pre-existing annotated databases as described hereinabove.
[0186] For example, in GenBank, relevant annotational information is provided in the definition and keyword fields. In this case, classification of the annotated biomolecular sequences to the dendrogram nodes is directly effected. A search for suitable annotated biomolecular sequences is performed using a set of keywords which are designed to classify the biomolecular sequences to the hierarchy (i.e., same keywords that populate the dendrogram).
[0187] In cases where the biomolecular sequences are unannotated or partially annotated, extraction of additional annotational information is effected prior to classification to dendrogram nodes. This can be effected by sequence alignment, as described hereinabove. Alternatively, annotational information can be predicted from structural studies. Where needed, nucleic acid sequences can be transformed to amino acid sequences to thereby enable more accurate annotational prediction.
[0188] Finally, each of the assigned biomolecular sequences is recursively classified to nodes hierarchically higher than the specific nodes, such that the root node of the dendrogram encompasses the full biomolecular sequence set, which can be classified according to a certain hierarchy, while the offspring of any node represent a partitioning of the parent set.
[0189] For example, a biomolecular sequence found to be specifically expressed in "rhabdomyosarcoma" will be classified also to a higher hierarchy level, which is "sarcoma", and then to "mesenchymal cell tumors" and finally to a highest hierarchy level "Tumor". In another example, a sequence found to be differentially expressed in endometrium cells will be classified also to a higher hierarchy level, which is "uterus", and then to "women genital system" and to "genital system" and finally to a highest hierarchy level "genitourinary system". The retrieval can be performed according to each one of the requested levels.
[0190] Since annotation of publicly available databases is at times unreliable, newly annotated biomolecular sequences are confirmed using computational or laboratory approaches as is further described hereinbelow.

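The hierarchical approach relies, for unannotated sequences, on the homology clustering described earlier for the ontological annotation approach. As a reminder of how that progressive single-linkage clustering might work, the following is a minimal sketch under stated assumptions: pairwise percent homology values are precomputed (for instance by an alignment tool) and passed in as a dictionary, the threshold steps down from 99% to 35%, and the step size of 16 is an arbitrary illustrative choice. The function names and the use of a union-find structure are assumptions of this sketch, not the implementation used by the inventors.

```python
def single_linkage(ids, homology, threshold):
    """Cluster ids so that each member reaches `threshold` percent homology
    with at least one other member of its cluster (single linkage)."""
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in homology.items():
        # Assumption: "above a threshold" is treated as >= threshold.
        if pct >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(sorted(c) for c in clusters.values())


def progressive_clusters(ids, homology, high=99, low=35, step=16):
    """Return {threshold: clusters} for thresholds stepping from high to low."""
    return {t: single_linkage(ids, homology, t)
            for t in range(high, low - 1, -step)}


if __name__ == "__main__":
    ids = ["A", "B", "C", "D"]
    # Hypothetical pairwise percent homology values.
    homology = {("A", "B"): 99.5, ("B", "C"): 70, ("C", "D"): 40, ("A", "C"): 50}
    for threshold, clusters in progressive_clusters(ids, homology).items():
        print(threshold, clusters)
```

Clusters only merge as the threshold is lowered, so an annotation assigned at a high homology level can be carried to the sequences that join the same cluster at lower levels, with confidence decreasing accordingly.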
[0191] It will be appreciated that once temporal or spatial annotations of sequences are established using the teachings of the present invention, it is possible to identify those sequences which are differentially expressed (i.e., exhibit spatial or temporal pattern of expression in diverse cells or tissues). Such sequences are assigned to only a portion of the nodes which constitute the hierarchical dendrogram.
[0192] Changes in gene expression are important determinants of normal cellular physiology, including cell cycle regulation, differentiation and development, and they directly contribute to abnormal cellular physiology, including developmental anomalies, aberrant programs of differentiation and cancer. Accordingly, the identification, cloning and characterization of differentially expressed genes can provide relevant and important insights into the molecular determinants of processes such as growth, development, aging, differentiation and cancer. Additionally, identification of such genes can be useful in development of new drugs and diagnostic methods for treating or preventing the occurrence of such diseases.
[0193] Newly annotated sequences identified according to the present invention are tested under physiological conditions (i.e., temperature, pH, ionic strength, viscosity, and like biochemical parameters which are compatible with a viable organism, and/or which typically exist intracellularly in a viable cultured yeast cell or mammalian cell). This can be effected using various laboratory approaches such as, for example, FISH analysis, PCR, RT-PCR, real-time PCR, Southern blotting, northern blotting, electrophoresis and the like (see Examples 13-20 and 27 of the Examples section) or more elaborate approaches which are detailed in the Background section.
[0194] It will be appreciated that true involvement of differentially expressed genes in a biological process is better confirmed using an appropriate cell or animal model, as further described hereinunder.
[0195] The hierarchical annotation approach enables assignment of an appropriate annotation level even in cases where expression is not restricted to a specific tissue type or cell type. For example, differentially expressed sequences of a single contig which are annotated as being expressed in several different tissue types of a single specific organ or a specific system are also annotated by the present invention to a higher hierarchy level denoting association with the specific organ or system. In such cases using keywords alone would not efficiently identify differentially expressed sequences. Thus for example, a sequence found to be expressed in sarcoma, Ewing sarcoma tumors, pnet, rhabdomyosarcoma, liposarcoma and mesenchymal cell tumors cannot be assigned to specific sarcomas, but still can be annotated as mesenchymal cell tumor specific. Using this hierarchical annotation approach in combination with advanced sequence clustering and assembly algorithms, capable of predicting alternative splicing, may facilitate a simple and rapid identification of gene expression patterns.
[0196] Although the present methodology can be effected using prior art systems modified for such purposes, due to the large amounts of data processed and the vast amounts of processing needed, the present methodology is preferably effected using a dedicated computational system.
[0197] Such a system is described hereinabove. The system includes a processing unit which executes a software application designed and configured for hierarchically annotating biomolecular sequences as described hereinabove. The system further serves for storing biomolecular sequence information and annotations in a retrievable/searchable database.
Annotation of differentially expressed alternatively spliced sequences
[0198] Although numerous methods have been developed to identify differentially expressed genes, none of these addressed splice variants, which occur in over 50% of human genes. Given the common sequence features of splice variants it is very difficult to identify splice variants whose expression is differential using prior art methodologies. Therefore assigning unique sequence features to differentially expressed splice variants may have an important impact on the understanding of disease development and may serve as valuable markers for various pathologies.
[0199] Thus, according to another aspect of the present invention there is provided a method of identifying sequence features unique to differentially expressed mRNA splice variants. The method is effected as follows.
[0200] First, unique sequence features are computationally identified in splice variants of alternatively spliced expressed sequences.
[0201] As used herein the phrase "splice variants" refers to naturally occurring nucleic acid sequences and proteins encoded therefrom which are products of alternative splicing. Alternative splicing refers to intron inclusion, exon exclusion, or any addition or deletion of terminal sequences, which results in sequence dissimilarities between the splice variant sequence and the wild-type sequence.
[0202] Although most alternatively spliced variants result from alternative exon usage, some result from the retention of introns not spliced-out in the intermediate stage of RNA transcript processing.
[0203] As used herein the phrase "unique sequence features" refers to donor/acceptor concatenations (i.e., exon-exon junctions), intron sequences, alternative exon sequences and alternative polyadenylation sequences.
[0204] Once a unique sequence feature is identified, the expression pattern of the splice variant is determined. If the splice variant is differentially expressed then the unique feature thereof is annotated accordingly.
[0205] Alternatively spliced expressed sequences of this aspect of the present invention can be retrieved from numerous publicly available databases. Examples include but are not limited to ASDB, an alternative splicing database generated using GenBank and Swiss-Prot annotations (http://cbcg.nersc.gov/asdb); ASMamDB, a database of alternative splices in human, mouse and rat (http://166.111.30.65/ASMAMDB.html); Alternative splicing database, a database of alternative splices from literature (http://cgsigm.cshl.org/new_alt_exon_db2/); Yeast intron database, a database of introns in yeast (http://www.cse.ucsc.edu/research/compbio/yeast_introns.html); The Intronerator, alternative splicing in C. elegans based on analysis of EST data (http://www.cse.ucsc.edu/~kent/intronerator); ISIS, Intron Sequence Information System, including a section of human alternative splices (http://isis.bit.uq.edu.au/); TAP, Transcript Assembly Program result of alternative

splicing (http://stl.wustl.edu/~zkan/TAP/) and HASDB - database of alternative splices detected in human EST data.
0206 Additionally, alternative splicing sequence data utilized by this aspect of the present invention can be obtained by any of the following bioinformatical approaches. (i) Genomically aligned ESTs - the method identifies ESTs which come from the same gene and looks for differences between them that are consistent with alternative splicing, such as a large insertion or deletion in one EST. Each candidate splice variant can be further assessed by aligning the ESTs with the respective genomic sequence. This reveals candidate exons (i.e., matches to the genomic sequence) separated by candidate splices (i.e., large gaps in the EST-genomic alignment). Since intronic sequences at splice junctions (i.e., donor/acceptor concatenations) are highly conserved (essentially 99.24% of introns have a GT-AG at their 5' and 3' ends, respectively), sequence data can be used to verify candidate splices [Burset et al. (2000) Nucleic Acids Res. 28:4364-75; LEADS module: Shoshan, et al. Proceeding of SPIE (eds. M. L. Bittner, Y. Chen, A. N. Dorsel, E. D. Dougherty) Vol. 4266, pp. 86-95 (2001); R. Sorek, G. Ast, D. Graur, Genome Res. In press; Compugen Ltd. U.S. patent application Ser. No. 09/133,987].
0207 (ii) Identification based on intron information - The method creates a database of individual intron sequences annotated in GenBank and utilizes such sequences to search for EST sequences which include the intronic sequences [Croft et al. (2000) Nat. Genet. 24:340-1].
0208 (iii) EST alignment to expressed sequences - looks for insertions and deletions in ESTs relative to a set of known mRNAs. Such a method makes it possible to uncover alternatively spliced variants without having to align ESTs with genomic sequence [Brett et al. (2000) FEBS Lett. 474:83-86].
0209 It will be appreciated that in order to avoid false positive identification of novel splice isoforms, a set of filters is applied. For example, sequences are filtered to exclude ESTs having sequence deviations, such as chimerism, random variation in a given EST sequence or potential vector contamination at the ends of an EST.
0210 Filtering can be effected by aligning ESTs with corresponding genomic sequences. Chimeric ESTs can be easily excluded by requiring that each EST aligns completely to a single genomic locus. Genomic location found by homology search and alignment can often be checked against radiation hybrid mapping data [Muneer et al (2002) Genomics 79:344-8]. Furthermore, since the genomic regions which align with an EST sequence correspond to exon sequences and alignment gaps correspond to introns, the putative splice sites at exon/intron boundaries can be confirmed. Because splice donor and acceptor sites primarily reside within the intron sequence, this methodology can provide validation which is independent of the EST evidence. Reverse transcriptase artifacts or other cDNA synthesis errors may also be filtered out using this approach. Improper inclusion of genomic sequence in ESTs can also be excluded by requiring pairs of mutually exclusive splices in different ESTs.
0211 Additionally, it will be appreciated that observing a given splice variant in one EST but not in a second EST may be insufficient, as the latter can be an un-spliced EST rather than a biologically significant intron inclusion. Therefore measures are taken to focus on mutually exclusive splice variants, i.e., two different splice variants observed in different ESTs, which overlap in a genomic sequence. A more stringent filtering may be applied by requiring two splice variants to share one splice site but differ in another. Another filter which can be used to identify true splicing events is sequence conservation. Essentially, exons and the borders of human introns which are identified in the mouse genome and/or supported by mouse ESTs are considered true splicing events (see Example 21 of the Examples section).
0212 Once splice variants are identified, identification of unique sequence features therewithin can be effected computationally by identifying insertions, deletions and donor-acceptor concatenations in ESTs relative to mRNA and preferably genomic sequences.
0213 As mentioned hereinabove, once alternatively spliced sequences (having unique sequence features) are identified, determination of their expression patterns is effected in order to assign an annotation to the unique sequence feature thereof.
0214 Expression pattern identification may be effected by qualifying annotations which are preassociated with the alternatively spliced expressed sequences, as described hereinabove. This can be accomplished by scoring the annotations. For example, scoring pathological expression annotations can be effected according to: (i) prevalence of the alternatively spliced expressed sequences in normal tissues; (ii) prevalence of the alternatively spliced expressed sequences in pathological tissues; (iii) prevalence of the alternatively spliced expressed sequence in total tissues; and (iv) number of tissues and/or tissue types expressing the alternatively spliced expressed sequences. Preferably the expression pattern of alternatively spliced sequences is determined as described in the "Frequency-based annotative approach" section, which follows.
0215 Alternatively, identifying the expression pattern of the alternatively spliced expressed sequences of the present invention is accomplished by detecting the presence of the unique sequence feature in biological samples. This can be effected by any hybridization-based technique known in the art, such as northern blot, dot blot, RNase protection assay, RT-PCR and the like.
0216 To this end oligonucleotide probes, which are substantially homologous to nucleic acid sequences that flank and/or extend across the unique sequence features of the alternatively spliced expressed sequences of the present invention, are generated.
0217 Preferably, oligonucleotides which are capable of hybridizing under stringent, moderate or mild conditions, as used in any polynucleotide hybridization assay, are utilized. Further description of hybridization conditions is provided hereinunder.
0218 Oligonucleotides generated by the teachings of the present invention may be used in any modification of nucleic acid hybridization based techniques, which are further detailed hereinunder. General features of oligonucleotide synthesis and modifications are also provided hereinunder.
0219 Aside from being useful in identifying specific splice variants, oligonucleotides generated according to the teachings of the present invention may also be widely used as diagnostic, prognostic and therapeutic agents in a variety of disorders which are associated with the polynucleotides of the present invention (e.g., specific splice variants).
0220 For example, regulation of splicing is involved in 15% of genetic diseases [Krawczak et al. (1992) Hum. Genet. 90:41-54] and may contribute, for example, to cancer: mis-splicing of exon 18 in BRCA1, which is caused by a polymorphism in an exonic enhancer [Liu et al. (2001) Nature Genet. 27:55-58].
0221 Thus, oligonucleotides generated according to the teachings of the present invention can be included in diagnostic kits. Such kits may include oligonucleotides which are directed to the newly uncovered splice variant alone and also to previously uncovered splice variants or wild-type (w.t) sequences of the same gene which were previously associated with a disease of interest. For example, oligonucleotide sets pertaining to a specific disease associated with differential expression of an alternatively spliced transcript can be packaged in one or more containers with appropriate buffers and preservatives along with suitable instructions for use and used for diagnosis or for directing therapeutic treatment. Additional information on such diagnostic kits is provided hereinunder.
0222 It will be appreciated that an ability to identify alternatively spliced sequences also facilitates identification of the various products of alternative splicing.
0223 Recent studies indicate that most alternative splicing events result in an altered protein product [International human genome sequencing consortium (2001) Nature 409:860-921; Modrek et al. (2001) Nucleic Acids Res. 29:2850-2859]. The majority of these changes appear to have a functional relevance (i.e., up-regulating or down-regulating activity, "gain of function" or "loss of function", respectively; see terminology section), such as the replacement of the amino or carboxyl terminus, or in-frame addition and removal of a functional domain. For example, alternative splicing can lead to the use of a different site for translation initiation (i.e., alternative initiation), a different translation termination site due to a frameshift (i.e., truncation or extension), or the addition or removal of a stop codon in the alternative coding sequence (i.e., alternative termination). Additionally, alternative splicing can change an internal sequence region due to an in-frame insertion or deletion. One example of the latter is the new FC receptor B-like protein, whose C-terminal transmembrane domain and cytoplasmic tail, which is important for signal transduction in this class of receptors, is replaced with a new transmembrane domain and tail by alternative polyadenylation. Another example is the truncated Growth Hormone Receptor, which lacks most of its intracellular domain and has been shown to heterodimerize with the full-length receptor, thus causing inhibition of signaling by Growth Hormone [Ross, R. J. M., Growth hormone & IGF Research, 9:42-46, (1999)].
0224 Thus, identifying splice variants having unique sequence features enables annotation and thus identification of functionally altered variants.
0225 Identification of putative functionally altered splice variants, according to this aspect of the present invention, can be effected by identifying sequence deviations from functional domains of wild-type gene products.
0226 Identification of functional domains can be effected by comparing a wild-type gene product with a series of profiles prepared by alignment of well characterized proteins from a number of different species. This generates a consensus profile, which can then be matched with the query sequence. Examples of programs suitable for such identification include, but are not limited to, InterProScan - Integrated search in PROSITE, Pfam, PRINTS and other family and domain databases; ScanProsite - Scans a sequence against PROSITE or a pattern against SWISS-PROT and TrEMBL; MotifScan - Scans a sequence against protein profile databases (including PROSITE); Frame-ProfileScan - Scans a short DNA sequence against protein profile databases (including PROSITE); Pfam HMM search - Scans a sequence against the Pfam protein families database; FingerPRINTScan - Scans a protein sequence against the PRINTS Protein Fingerprint Database; FPAT - Regular expression searches in protein databases; PRATT - Interactively generates conserved patterns from a series of unaligned proteins; PPSEARCH - Scans a sequence against PROSITE (allows a graphical output), at EBI; PROSITE scan - Scans a sequence against PROSITE (allows mismatches), at PBIL; PATTINPROT - Scans a protein sequence or a protein database for one or several pattern(s), at PBIL; SMART - Simple Modular Architecture Research Tool, at EMBL; TEIRESIAS - Generate patterns from a collection of unaligned protein or DNA sequences, at IBM; all available from http://www.expasy.org/tools/.
0227 It will be appreciated that functionally altered splice variants may also include a sequence alteration at a post-translation modification consensus site, such as, for example, a tyrosine sulfation site, a glycosylation site, etc. Examples of post-translational modification prediction programs include but are not limited to: SignalP - Prediction of signal peptide cleavage sites; ChloroP - Prediction of chloroplast transit peptides; MITOPROT - Prediction of mitochondrial targeting sequences; Predotar - Prediction of mitochondrial and plastid targeting sequences; NetOGlyc - Prediction of type O-glycosylation sites in mammalian proteins; DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium; YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences; big-PI Predictor - GPI Modification Site Prediction; DGPI - Prediction of GPI-anchor and cleavage sites (Mirror site); NetPhos - Prediction of Serine, Threonine and Tyrosine phosphorylation sites in eukaryotic proteins; NetPicoRNA - Prediction of protease cleavage sites in picornaviral proteins; NMT - Prediction of N-terminal N-myristoylation; Sulfinator - Prediction of tyrosine sulfation sites; all available from http://www.expasy.org/tools/.
0228 Once putative functionally altered splice variants are identified, they are validated by experimental verification and functional studies, using methodologies well known in the art.
0229 The Examples section which follows illustrates identification and annotation of splice variants. Identified and annotated sequences are contained within the enclosed CD-ROMs 1-4. Some of these sequences represent (i.e., are transcribed from) entirely new splice variants, while others represent new splice variants of known sequences. In any case, the sequences contained in the enclosed CD-ROMs are novel in that they include previously undisclosed sequence regions in the context of a known gene or an entirely new sequence in the context of an unknown gene.
Frequency-Based Annotative Approach
0230 The present invention also contemplates spatial and temporal gene annotations through comparing relative abundance in libraries of different origins.
0231 Thus, according to still another aspect of the present invention there is provided a method of comparing an expression level of a gene of interest in at least two types of tissues.
0232 As used herein the phrase "at least two types of tissues" refers to tissues of different developmental origin, different pathological origin or different cellular composition.
0233 The method is effected by obtaining a contig assembled from a plurality of expressed sequences (e.g., ESTs, mRNAs) representing the gene of interest; and comparing the number of the plurality of expressed sequences corresponding to the contig, which are expressed in each of the at least two tissue types, to thereby compare the expression level of the gene of interest in the at least two tissue types.
0234 Expressed sequences for generating the contig of this aspect of the present invention can be retrieved from pre-existing publicly available databases or generated as described in the "ontological annotation approach" section hereinabove.
0235 A number of sequence assembly software packages are known in the art, which can be used to generate the contig of the gene of interest. Such software is described in the "ontological annotation approach" section hereinabove.
0236 Alternatively, the contig of this aspect of the present invention can be obtained from pre-existing publicly available databases. Examples include, but are not limited to, the TIGR database (www.tigr.org), the SANBI database (http://www.za.embnet.org/), the SIB database which generates contig sequence information from Unigene clusters, the MIPS database (http://mips.gsf.de/) and the DoTS database (http://www.allgenes.org/).
0237 It will be appreciated that the contig according to this aspect of the present invention can be composed of a plurality of expressed sequences, which present partial or complete exonal coverage of the gene of interest.
0238 Prior to, concomitant with or following contig assembly, expressed sequences are filtered to exclude sequences of poor quality (i.e., vector contaminants, low complexity sequences, sequences which originate from small libraries, e.g., smaller than 1000 sequences), and to score true expression in the at least two types of tissues.
0239 Expressed sequences which originate from samples wherein clone frequency reflects mRNA abundance are highly scored. Thus expressed sequences from "non-normalized" expression libraries are highly scored, while expressed sequences from "normalized" libraries are poorly scored. Such scoring rules are described in detail in Example 23 of the Examples section which follows.
0240 Comparing the number of the plurality of expressed sequences corresponding to the contig which are expressed in each of the at least two tissue types is preferably effected by statistical pairing analysis. Examples of statistical tests which can be used in accordance with the present invention include, but are not limited to, chi square, Fisher's exact test, phi, Yule's Q, Lambda and Tau b. Preferably, to calculate an exact p-value for a two by two frequency table with a small number of expected frequencies, Fisher's exact test is used.
0241 Genes exhibiting differential pattern of expression uncovered using the methodology of the present invention can be efficiently utilized as tissue markers and as putative drug targets.
0242 As mentioned above, alternatively spliced transcripts may be extremely useful as cancer markers and drugs, since it appears likely that there may be striking contrasts in usage of alternatively spliced transcript variants between normal and tumor tissue in alterations in the general levels of gene expression [Caballero Dis Markers. (2001);17(2):67-75].
0243 For example, members of the CD44 family of cell surface hyaluronate-binding proteins have been implicated in cell migration, cell-matrix interactions and tumor progression. Interestingly, normal spinal nerves and primary Schwann cell cultures express standard CD44 (CD44s) but not alternatively spliced variant isoforms. In contrast, Schwann cell tumors express both "wild-type" CD44 and a number of variants, implicating a role for CD44 splice variants in cancer and as such in the development of potent diagnostic and therapeutic tools.
0244 Thus, the present invention also envisages comparing an expression level of at least two splice variants of a gene of interest in a tissue. The method is effected by:
0245 Obtaining a contig including exonal sequence presentation of the at least two splice variants of the gene of interest, the contig being assembled from a plurality of expressed sequences;
0246 Identifying at least one contig sequence region unique to a portion (i.e., at least one and not all) of the at least two splice variants of the gene of interest. Identification of such unique sequence region is effected using computer alignment software such as described hereinabove.
0247 Comparing a number of the plurality of expressed sequences in the tissue having the at least one contig sequence region with a number of the plurality of expressed sequences not having the at least one contig sequence region, to thereby compare the expression level of the at least two splice variants of the gene of interest in the tissue.
0248 One configuration of the above-described methodology is described in detail in Example 23c of the Examples section which follows.
0249 Biomolecular sequences (i.e., nucleic acid and polypeptide sequences) uncovered using the above-described methodology are annotated using the teachings of the present invention. Thus, for example, the hierarchical annotation approach can be used to assign a differentially expressed gene product to higher hierarchies. For example, gene products identified by the "Frequency-based annotative approach" engine as being overexpressed in prostate tumor, lung tumor, head and neck tumor, stomach tumor, colon tumor, mammary tumor, kidney tumor, ovary tumor, uterus/

cervix tumor, thyroid tumor, adrenal tumor, pancreas tumor, liver tumor and skin tumor might also be specific to other types of epithelial tumors. Gene products identified by the engine as being overexpressed in bone and muscle tumors might also be specific to other types of sarcomas. Gene products identified by the engine as being overexpressed in bone marrow tumor, blood cancer, T cell tumor and lymph nodes tumor may also be specific to other types of blood cancers. Sequence data uncovered by the above described methodologies and corresponding annotative data are stored in a database for future use (see, for example, files "Transcripts nucleotide seqs part1", "Transcripts nucleotide seqs part2", "Transcripts nucleotide seqs part3", "Transcripts nucleotide seqs part4", "protein seqs", "ProDG seqs", "Summary table", "Annotations.gz", "Transcripts.gz", and "Proteins.gz" of the enclosed CD-ROMs 1-4).
0250 As mentioned hereinabove, biomolecular sequences uncovered using the methodology of the present invention can be efficiently utilized as tissue or pathological markers and as putative drugs or drug targets for treating or preventing the disease.
0251 Some examples are summarized infra:
0252 For example, gene products (nucleic acid and/or protein products), which exhibit tumor specific expression (i.e., tumor associated antigens, TAAs) can be utilized for in-vitro generation of antibodies and/or for in-vivo immunization/cancer vaccination, essentially eliciting an immune response against such gene products and cells expressing same (see e.g., U.S. Pat. No. 4,235,877; vaccine preparation is generally described in, for example, M. F. Powell and M. J. Newman, eds., "Vaccine Design (the subunit and adjuvant approach)," Plenum Press (NY, 1995); other references describing adjuvants, delivery vehicles and immunization in general include Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998; Fisher-Hoch et al., Proc. Natl. Acad. Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103, 1989; Flexner et al., Vaccine 8:17-21, 1990; U.S. Pat. Nos. 4,603,112, 4,769,330, and 5,017,487; WO 89/01973; U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345,242; WO 91/02805; Berkner, Biotechniques 6:616-627, 1988; Rosenfeld et al., Science 252:431-434, 1991; Kolls et al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass-Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, 1993; Guzman et al., Circulation 88:2838-2848, 1993; and Guzman et al., Cir. Res. 73:1202-1207, 1993; Ulmer et al., Science 259:1745-1749, 1993; Cohen, Science 259:1691-1692, 1993; U.S. Pat. Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094; U.S. Pat. Nos. 6,008,200 and 5,856,462; Zitvogel et al., Nature Med. 4:594-600, 1998).
0253 The tumor-specific gene products of the present invention, in particular membrane bound ones, can be utilized as targeting molecules for binding therapeutic toxins, antibodies and small molecules, to thereby specifically target the tumor cell. Alternatively, neoplastic properties of the tumor specific gene products (nucleic acid and/or protein products) of the present invention may be beneficially used in the promotion of wound healing and neovascularization in ischemic conditions and diabetes.
0254 Secreted splice variants of known autoantigens associated with a specific autoimmune syndrome, such as, for example, those listed in Table 15, below, can be used to treat such syndromes. Typically, autoimmune disorders are characterized by a number of different autoimmune manifestations (e.g., multiple endocrine syndromes). For these reasons secreted variants may be used to treat any combination of autoimmune phenomena of a disease as detailed in Table 15, below. The therapeutic effect of these splice variants may be a result of (i) competing with autoantigens for binding with autoantibodies; (ii) antigen-specific immunotherapy, essentially suggesting that systemic administration of a protein antigen can inhibit the subsequent generation of the immune response to the same antigen (this has been shown in mouse models for Myasthenia Gravis and type I Diabetes).
0255 In addition, any novel variant of autoantigens, not necessarily secreted, may be used for "specific immunoadsorption" leading to a specific immunodepletion of antibodies when used in immunoadsorption columns.
0256 It will be appreciated that splice variants of autoantigens may also have diagnostic value. The diagnosis of many autoimmune disorders is based on looking for specific autoantibodies to autoantigens known to be associated with an autoimmune condition. Most of the diagnostic techniques are based on having a recombinant form of the autoantigen and using it to look for serum autoantibodies. It is possible that what is considered an autoantigen is not the "true" autoantigen but rather a variant thereof. For example, TPO is a known autoantigen in thyroid autoimmunity. It has been shown that its variant TPOzanelli also takes part in the autoimmune process and can bind the same antibodies as TPO [Biochemistry. 2001 February 27;40(8):2572-9]. Antibodies formed against the true autoantigen may bind to other variants of the same gene due to sequence overlap but with reduced affinity. Novel splice variants of the genes in Table 15 may be revealed as true autoantigens; therefore their use for detection of autoantibodies is expected to result in a more sensitive and specific test.
0257 Apart from clinical applications, the biomolecular sequences of the present invention can find other commercial uses such as in the food, agricultural, electro-mechanical, optical and cosmetic industries [http://www.physics.unc.edu/~rsuper/XYZweb/XYZchipbiomotors.rs1.doc; http://www.bio.org/er/industrial.asp]. For example, newly uncovered gene products which can disintegrate connective tissues can be used as potent anti-scarring agents for cosmetic purposes. Other applications include, but are not limited to, the making of gels, emulsions, foams and various specific products, including photographic films, tissue replacers and adhesives, food and animal feed, detergents, textiles, paper and pulp, and chemicals manufacturing (commodity and fine, e.g., bioplastics).
0258 The nucleic acid sequences of the invention can be "isolated" or "purified." In the event the nucleic acid is genomic DNA, it is considered "isolated" when it does not include coding sequence(s) of a gene or genes immediately adjacent thereto in the naturally occurring genome of an organism; although some or all of the 5' or 3' non-coding sequence of an adjacent gene can be included. For example, an isolated nucleic acid (DNA or RNA) can include some or all of the 5' or 3' non-coding sequence that flanks the coding sequence (e.g., the DNA sequence that is transcribed into, or the RNA sequence that gives rise to, the promoter or an

enhancer in the mRNA). For example, an isolated nucleic acid can contain less than about 5 kb (e.g., less than about 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb) of the 5' and/or 3' sequence that naturally flanks the nucleic acid molecule in a cell in which the nucleic acid naturally occurs. In the event the nucleic acid is RNA or mRNA, it is "isolated" or "purified" from a natural source (e.g., a tissue) or a cell culture when it is substantially free of the cellular components with which it naturally associates in the cell and, if the cell was cultured, the cellular components and medium in which the cell was cultured (e.g., when the RNA or mRNA is in a form that contains less than about 20%, 10%, 5%, 1%, or less, of other cellular components or culture medium). When chemically synthesized, a nucleic acid (DNA or RNA) is "isolated" or "purified" when it is substantially free of the chemical precursors or other chemicals used in its synthesis (e.g., when the nucleic acid is in a form that contains less than about 20%, 10%, 5%, 1%, or less, of the chemical precursors or other chemicals).
0259 Variants, fragments, and other mutant nucleic acids are also envisaged by the present invention. As noted above, where a given biomolecular sequence represents a new gene (rather than a new splice variant of a known gene), the nucleic acids of the invention include the corresponding genomic DNA and RNA. Accordingly, where a given SEQ ID represents a new gene, variations or mutations can occur not only in that nucleic acid sequence, but in the coding regions, the non-coding regions, or both, of the genomic DNA or RNA from which it was made.
0260 The nucleic acids of the invention can be double-stranded or single-stranded and can, therefore, either be a sense strand, an antisense strand, or a portion (i.e., a fragment) of either the sense or the antisense strand. The nucleic acids of the invention can be synthesized using standard nucleotides or nucleotide analogs or derivatives (e.g., inosine, phosphorothioate, or acridine substituted nucleotides), which can alter the nucleic acid's ability to pair with complementary sequences or to resist nucleases. Indeed, the stability or solubility of a nucleic acid can be altered (e.g., improved) by modifying the nucleic acid's base moiety, sugar moiety, or phosphate backbone. For example, the nucleic acids of the invention can be modified as taught by Toulmé [Nature Biotech. 19:17, (2001)] or Faria et al. [Nature Biotech. 19:40-44, (2001)], and the deoxyribose phosphate backbone of nucleic acids can be modified to generate peptide nucleic acids [PNAs; see Hyrup et al., (1996) Bioorganic & Medicinal Chemistry 4:5-23].
0261 PNAs are nucleic acid "mimics": the molecule's natural backbone is replaced by a pseudopeptide backbone and only the four nucleotide bases are retained. This allows specific hybridization to DNA and RNA under conditions of low ionic strength. PNAs can be synthesized using standard solid phase peptide synthesis protocols as described, for example, by Hyrup et al. (supra) and Perry-O'Keefe et al. [Proc. Natl. Acad. Sci. USA (1996) 93:14670-675]. PNAs of the nucleic acids described herein can be used in therapeutic and diagnostic applications.
0262 Moreover, the nucleic acids of the invention include not only protein-encoding nucleic acids per se (e.g., [...] (a) incorporated into a vector (e.g., an autonomously replicating plasmid or virus), (b) incorporated into the genomic DNA of a prokaryote or eukaryote, or (c) part of a hybrid gene that encodes an additional polypeptide sequence (i.e., a sequence that is heterologous to the nucleic acid sequences of the present invention or fragments, other mutants, or variants thereof).
0263 The present invention includes naturally occurring sequences of the nucleic acid sequences described above, allelic variants (same locus; functional or non-functional), homologs (different locus), and orthologs (different organism) as well as degenerate variants of those sequences and fragments thereof. The degeneracy of the genetic code is well known, and one of ordinary skill in the art will be able to make nucleotide sequences that differ from the nucleic acid sequences of the present invention but nevertheless encode the same proteins as those encoded by the nucleic acid sequences of the present invention. The variant sequences (e.g., degenerate variants) can be used in the same manner as naturally occurring sequences. For example, the variant DNA sequences of the invention can be incorporated into a vector, into the genomic DNA of a prokaryote or eukaryote, or made part of a hybrid gene. Moreover, variants (or, where appropriate, the proteins they encode) can be used in the diagnostic assays and therapeutic regimes described below.
0264 The sequence of nucleic acids of the invention can also be varied to maximize expression in a particular expression system. For example, as few as one and as many as about 20% of the codons in a given sequence can be altered to optimize expression in bacterial cells (e.g., E. coli), yeast, human, insect, or other cell types (e.g., CHO cells).
0265 The nucleic acids of the invention can also be shorter or longer than those disclosed on CD-ROMs 1, 2 and 4. Where the nucleic acids of the invention encode proteins, the protein-encoding sequences can differ from those represented by specific sequences of file "Protein.seqs" in CD-ROM 2 and "Proteins.gz" in CD-ROM 4. For example, the encoded proteins can be shorter or longer than those encoded by one of the nucleic acid sequences of the present invention. Nucleotides can be deleted from, or added to, either or both ends of the nucleic acid sequences of the present invention or the novel portions of the sequences that represent new splice variants. Alternatively, the nucleic acids can encode proteins in which one or more amino acid residues have been added to, or deleted from, one or more sequence positions within the nucleic acid sequences.
0266 The nucleic acid fragments can be short (e.g., 15-30 nucleotides). For example, in cases where peptides are to be expressed therefrom such polynucleotides need only contain a sufficient number of nucleotides to encode novel antigenic epitopes. In cases where nucleic acid fragments serve as DNA or RNA probes or PCR primers, fragments are selected of a length sufficient for specific binding to one of the sequences representing a novel gene or a unique portion of a novel splice variant.
0267 Nucleic acids used as probes or primers are often referred to as oligonucleotides, and they can hybridize with a sense or antisense strand of DNA or RNA.
Nucleic acids coding sequences produced by the polymerase chain reac that hybridize to a sense Strand (i.e., a nucleic acid sequence tion (PCR) or following treatment of DNA with an endo that encodes protein, e.g., the coding strand of a double nuclease), but also, for example, recombinant DNA that is: stranded cDNA molecule) or to an mRNA sequence are US 2006/0068405 A1 Mar. 30, 2006 referred to as antisense oligonucleotides. Oligonucleotides 0275 More recently, antisense-mediated suppression of which specifically hybridize with the troponin variants of the human heparanase gene expression has been reported to present invention (SEQ ID NOs: 74, 76, 78. 80, 82, 84 and inhibit pleural dissemination of human cancer cells in a 66) and not with wild-type tropoinin are preferably directed mouse model Uno et al. (2001) Cancer Res 61 (21):7855 at the unique nucleic acid sequence set forth in SEQID NO: 60). 87. Alternatively, such oligonucleotides can be directed at a 0276 Thus, the current consensus is that recent develop nucleic acid sequence which bridges the unique sequence ments, in the field of antisense technology which, as with common upstream or downstream sequences (see FIG. described above...have led to the generation of highly accu 21). rate antisense design algorithms and a wide variety of 0268 Antisense oligonucleotides can be used to specifi oligonucleotide delivery systems, enable an ordinarily cally inhibit transcription of any of the nucleic acid skilled artisan to design and implement antisense approaches sequences of the present invention. Suitable for downregulating expression of known sequences without having, to resort to undue trial and error experimen 0269. Design of antisense molecules must be effected tation. while considering two aspects important to the antisense approach. The first aspect is delivery of the oligonucleotide 0277 Antisense oligonucleotides can also be ceano into the cytoplasm of the appropriate cells, while the second meric. 
nucleic acids, which form specific double-stranded aspect is design of an oligonucleotide which specifically hybrids with complementary RNA in which, contrary to the binds the designated mRNA within cells in a way which usual b-units, the strands run parallel to each other Gaultier inhibits translation thereof. et al., Nucleic Acids Res. 15:6625-6641, (1987). Alterna tively, antisense nucleic acids can comprise a 2'-O-methyl 0270. The prior art teaches of a number of delivery ribonucleotide Inoue et al., Nucleic Acids Res. 15:6131 strategies which can be used to efficiently deliver oligo 6148, (1987) or a chimeric RNA-DNA analogue Inoue et nucleotides into a wide variety of cell types see, for al., FEBS Lett. 215:327-330, (1987). example, Luft (1998) J Mol Med 76(2):-75-6; Kronenwett et 0278. The nucleic acid sequences described above can al. (1998) Blood 91(3): 852-62; Rajur et al. (1997) Biocon also include ribozymes catalytic sequences. Such a jug Chem 8(6): 935-40; Lavigne et al. (1997) Biochem ribozyme will have specificity for a protein encoded by the Biophys Res Commun 237(3): 566-71 and Aokietal. (1997) novel nucleic acids described herein (by virtue of having one Biochem Biophys Res Commun 231(3): 540-5). or more sequences that are complementary to the cDNAs 0271 In addition, algorithms for identifying those that represent novel genes or the novel portions (i.e., the sequences with the highest predicted binding affinity for portions not found in related splice variants) of the their target mRNA based on a thermodynamic cycle that sequences that represent new splice variants. These accounts for the energetics of structural alterations in both ribozymes can include a catalytic sequence encoding a the target mRNA and the oligonucleotide are also available protein that cleaves mRNA see U.S. Pat. No. 5,093,246 or see, for example, Walton et al. (1999) Biotechnol Bioeng Haselhoff and Gerlach, Nature 334:585-591, (1988). For 65(1): 1-9). 
example, a derivative of a tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the 0272 Such algorithms have been successfully used to is complementary to the nucleotide sequence to implement an antisense approach in cells. For example, the be cleaved in an mRNA of the invention (e.g., one of the algorithm developed by Walton et al. enabled scientists to nucleic acid sequences of the present invention; see, U.S. Successfully design antisense oligonucleotides for rabbit Pat. Nos. 4,987,071 and 5,116,742). Alternatively, the beta-globin (RBG) and mouse tumor necrosis factor-alpha mRNA sequences of the present invention can be used to (TNF-C) transcripts. The same research group has more select a catalytic RNA having a specific ribonuclease activ recently reported that the antisense activity of rationally ity from a pool of RNA molecules see, e.g., Bartel and selected oligonucleotides against three model target mRNAS Szostak, Science 261:1411–1418, (1993); see also Krolet al., (human lactate dehydrogenase A and B and rat gp130) in cell Bio-Techniques 6:958-976, (1988). culture as evaluated by a kinetic PCR technique proved effective in almost all cases, including tests against three 0279 Alternatively, small interfering RNA oligonucle different targets in two cell types with phosphodiester and otides can be used to specifically inhibit transcription of any phosphorothioate oligonucleotide chemistries. of the nucleic acid sequences of the present invention. RNA interference is a two step process the first step, which is 0273. In addition, several approaches for designing and termed as the initiation step, input dsRNA is digested into predicting efficiency of specific oligonucleotides using an in 21-23 nucleotide (nt) small interfering RNAs (siRNA), vitro system were also published (Matveeva et al. (1998) probably by the action of Dicer, a member of the RNase III Nature Biotechnology 16, 1374- 1375). 
family of dsRNA-specific ribonucleases, which processes 0274 Several clinical trials have demonstrated safety, (cleaves) dsRNA (introduced directly or via a transgene or feasibility and activity of antisense oligonucleotides. For a virus) in an ATP-dependent manner. Successive cleavage example, antisense oligonucleotides suitable for the treat events degrade the RNA to 19-21 bp duplexes (siRNA), each ment of cancer have been successfully used (Holmund et al. with 2-nucleotide 3' overhangs Hutvagner and Zamore (1999) Curr Opin Mol Ther 1(3):372-85), while treatment of Curr. Opin. Genetics and Development 12:225-232 (2002); hematological malignancies via antisense oligonucleotides and Bernstein Nature 409:363-366 (2001). targeting c-myb gene, p53 and Bcl-2 had entered clinical 0280. In the effector step, the siRNA duplexes bind to a trials and had been shown to be tolerated by patients nuclease complex to from the RNA-induced silencing com Gerwitz (1999) Curr Opin Mol Ther 1(3):297-306). plex (RISC). An ATP-dependent unwinding of the siRNA US 2006/0068405 A1 Mar. 30, 2006 20 duplex is required for activation of the RISC. The active 0286 DNAZymer molecules are capable of specifically RISC then targets the homologous transcript by, base pairing cleaving an mRNA transcript or DNA sequence of interest. interactions and cleaves the mRNA into 12 nucleotide DNAZymes are single-stranded polynucleotides which are fragments from the 3' terminus of the siRNAHutvagner and capable of cleaving both single and double stranded target Zamore Curr. Opin. Genetics and Development 12:225-232 sequences (Breaker, R. R. and Joyce, G. Chemistry and (2002); Hammond et al. (2001) Nat. Rev. Gen. 2:110-119 Biology 1995:2:655; Santoro, S. W. & Joyce, G. F. Proc. (2001); and Sharp Genes. Dev. 15:485-90 (2001). Although Natl. Acad. Sci. USA 1997:943:4262) A general model (the the mechanism of cleavage is still to be elucidated, research “10-23 model) for the DNAZyme has been proposed. 
indicates that each RISC contains a single siRNA and an “10-23' DNAZymes have a catalytic domain of 15 deoxyri RNase Hutvagner and Zamore Curr. Opin. Genetics and bonucleotides, flanked by two substrate-recognition Development 12:225-232 (2002)). domains of seven to nine deoxyribonucleotides each. This 0281 Because of the remarkable potency of RNAi, an type of DNAZyyme can effectively cleave its substrate RNA amplification step within the RNAi pathway has been sug at purine:pyrimidine junctions (Santoro, S. W. & Joyce, G. gested. Amplification could occur by copying of the input F. Proc. Natl, Acad. Sci. USA 199; for rev of DNAZymes see dsRNAs which would generate more siRNAs, or by repli Khachigian, L. M. Curr Opin Mol Ther 4:119-21 (2002). cation of the siRNAs formed. Alternatively or additionally, 0287 Examples of construction and amplification of syn amplification could be effected by multiple turnover events thetic, engineered DNAZymes recognizing single and of the RISC Hammond et al. Nat. Rev. Gen. 2:110-119 double-stranded target cleavage sites have been disclosed in (2001), Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and U.S. Pat. No. 6,326,174 to Joyce et al. DNAZymes of similar Zamore Curr. Opin. Genetics and Development 12:225-232 design directed against the human Urokinase receptor were (2002). For more information on RNAi see the following recently observed to inhibit Urokinase receptor expression, reviews Tuschl ChemBiochem. 2:239-245 (2001); Cullen and Successfully inhibit colon cancer cell metastasis in vivo Nat. Immunol. 3:597-599 (2002); and Brantl Biochem. (Itoh et al., 20002, Abstract 409, Ann Meeting Am Soc Gen Biophys. Act. 1575:15-25 (2002). Ther www.ast.org). In another application, DNAZymes 0282 Synthesis of RNAi molecules suitable for use with complementary to bcr-abl oncogenes were successful in the present invention can be effected as follows. 
First, the an inhibiting, the oncogenes expression in leukemia cells, and mRNA sequence of interest is scanned downstream of the lessening relapse rates in autologous bone marrow transplant AUG start codon for AA dinucleotide sequences. Occur in cases of CML and ALL. rence of each AA and the 3' adjacent 19 nucleotides is recorded as potential siRNA target sites. Preferably, siRNA 0288 Oligonucleotides having as few as 9-10 nucleotides target sites are selected from the open reading frame, as (e.g., 12-14, 15-17, 18-20, 21-23, or 24-27 nucleotides)-can untranslated regions (UTRs) are richer in regulatory protein be useful as probes or expression templates-and are within binding sites. UTR-binding proteins and/or translation ini the scope of the present invention. Indeed, fragments that tiation complexes may interfere with binding of the siRNA contain about 15-20 nucleotides can be used in Southern endonuclease complex Tuschl Chemliochem. 2:239-245). blotting, Northern blotting, dot or slot blotting, PCR ampli It will be appreciated though, that siRNAs directed at fication methods (where naturally occurring or mutant untranslated regions may also be effective, as demonstrated nucleic acids are amplified; e.g., RT-PCR), colony hybrid for GAPDH wherein siRNA directed at the 5' UTR mediated ization methods, in situ hybridization, and the like. about 90% decrease in cellular GAPDH mRNA and com 0289. The present invention also encompasses pairs of pletely abolished protein level (www.ambion.com/techlib/ oligonucleotides (these can be used, for example, to amplify tn/91/912.html). the new genes, or portions thereof, or the novel portions of 0283 Second, potential target sites are compared to an the splice variant in, for example, potentially diseased appropriate genomic database (e.g...human, mouse, rat etc.) tissue) and groups of oligonucleotides (e.g., groups that using any sequence alignment software. 
Such as the BLAST exhibit a certain degree of homology (e.g., nucleic acids that software available from the NCBI server (www.ncbi.nlm are 90% identical to one another) or that share one or more .nih.gov/BLAST/). Putative target sites which exhibit sig functional attributes). nificant homology to other coding sequences are filtered out. 0284 Qualifying target sequences are selected as tem 0290 When used, for example, as probes, the nucleic plate for siRNA synthesis. Preferred sequences are those acids of the invention can be labeled with a radioactive including low G/C content as these have proven to be more isotope (e.g., using polynucleotide kinase to add P-labeled effective in mediating gene silencing as compared to those ATP to the oligonucleotide used as the probe) or an enzyme. with G/C content higher than 55%. Several target sites are Other labels, such as chemiluminescent, fluorescent, or preferably selected along the length of the target gene for colorimetric, labels can be used. evaluation. For better evaluation of the selected siRNAs, a 0291. As noted above, the invention features nucleic negative control is preferably used in conjunction. Negative acids that are complementary to those represented by the control siRNA preferably include the same nucleotide com nucleic acid sequences of the present invention or novel position as the siRNAS but lack significant homology to the portions thereof (i.e., novel fragments) and as such are genome. Thus, a scrambled nucleotide sequence of the capable of hybridizing therewith. In many cases, nucleic siRNA is preferably used, provided it does not display any acids that are used as probes or primers are absolutely or significant homology to any other gene. completely complementary to all, or a portion of the target 0285) DNAZyme molecules can also be used to specifi sequence. However, this is not always necessary. 
The cally inhibit transcription of any of the nucleic acid sequence of a useful probe or primer can differ from that of sequences of the present invention. a target sequence so long as it hybridizes with the target US 2006/0068405 A1 Mar. 30, 2006

under the stringency conditions described herein (or the conditions routinely used to amplify sequences by PCR) to form a stable duplex.

[0292] Hybridization of a nucleic acid probe to sequences in a library or other sample of nucleic acids is typically performed under moderate to high stringency conditions. Nucleic acid duplex or hybrid stability is expressed as the melting temperature (Tm), which is the temperature at which a probe dissociates from a target DNA and, therefore, helps define the required stringency conditions. To identify sequences that are related or substantially identical to that of a probe, it is useful to first establish the lowest temperature at which only homologous hybridization occurs with a particular concentration of salt (e.g., SSC or SSPE). (The terms “identity” or “identical” as used herein are equated with the terms “homology” or “homologous”.) Then, assuming a 1% mismatch requires a 1° C. decrease in the Tm, the temperature of the wash (e.g., the final wash) following the hybridization reaction is reduced accordingly. For example, if sequences having at least 95% identity with the probe are sought, the final wash temperature is decreased by 5° C. In practice, the change in Tm can be between 0.5° C. and 1.5° C. per 1% mismatch.

[0293] The hybridization conditions described here can be employed when the nucleic acids of the invention are used in, for example, diagnostic assays, or when it is desirable to identify, for example, the homologous genes that fall within the scope of the invention (as stated elsewhere, the invention encompasses allelic variants, homologues and orthologues of the sequences that represent new genes). Homologous genes will hybridize with the sequences that represent new genes under a stringency condition described herein.

[0294] The following is an example of “high stringency” hybridization conditions: 68° C. in (a) 5×SSC/5× Denhardt’s solution/1.0% SDS, (b) 0.5 M NaHPO4 (pH 7.2)/1 mM EDTA/7% SDS, or (c) 50% formamide/0.25 M NaHPO4 (pH 7.2)/0.25 M NaCl/1 mM EDTA/7% SDS, and washing is carried out with (a) 0.2×SSC/0.1% SDS at room temperature or at 42° C., (b) 0.1×SSC/0.1% SDS at 68° C., or (c) 40 mM NaHPO4 (pH 7.2)/1 mM EDTA and either 1% or 5% SDS at 50° C.

[0295] “Moderately stringent” hybridization conditions constitute, for example, the hybridization conditions described above and one or more washes in 3×SSC at 42° C. Of course, salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and the target nucleic acid. This is well known in the art, and additional guidance is available in, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

[0296] As mentioned hereinabove, the nucleic acid sequences of the present invention can be modified to encode substitution mutants of the wild type forms. Substitution mutants can include amino acid residues that represent either a conservative or non-conservative change (or, where more than one residue is varied, possibly both). A “conservative” substitution is one in which one amino acid residue is replaced with another having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). The invention includes polypeptides that include one, two, three, five, or more conservative amino acid substitutions, where the resulting mutant polypeptide has at least one biological activity that is the same, or substantially the same, as a biological activity of the wild type polypeptide.

[0297] Fragments or other mutant nucleic acids can be made by mutagenesis techniques well known in the art, including those applied to polynucleotides, cells, or organisms (e.g., mutations can be introduced randomly along all or part of the nucleic acid sequences of the present invention by saturation mutagenesis). The resultant mutant proteins can be screened for biological activity to identify those that retain activity or exhibit altered activity.

[0298] In certain embodiments, nucleic acids of the invention differ from the nucleic acid sequences provided in files “Transcripts nucleotide seqs part1”, “Transcripts nucleotide seqs part2”, “Transcripts nucleotide seqs part3”, “Transcripts nucleotide seqs part4”, “ProDG seqs”, and “Transcripts.gz” (provided in CD-ROM1, CD-ROM2 and CD-ROM4) by at least one, but less than 10, 20, 30, 40, 50, 100, or 200 nucleotides or, alternatively, at less than 1%, 5%, 10% or 20% of the nucleotides in the subject nucleic acid (excluding, of course, splice variants known in the art). Similarly, in certain embodiments, proteins of the invention can differ from those encoded by those included in files “Protein.seqs” and “Proteins.gz” (provided in CD-ROM2 and CD-ROM4) by at least one, but less than 10, 20, 30, 40, 50, 100, or 200 amino acid residues or, alternatively, at less than 1%, 5%, 10% or 20% of the amino acid residues in a subject protein (excluding, of course, proteins encoded by splice variants known in the art (proteins of the invention are described in more detail below)). If necessary for this analysis (or any other test for homology or substantial identity described herein), the sequences should be aligned for maximum homology, as described elsewhere here.

[0299] The present invention also encompasses mutants (e.g., naturally occurring or synthetic nucleic acids that exhibit an identity level of at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, say 95-100% to any of the nucleic acid sequences set forth in the files “Transcripts nucleotide seqs part1”, “Transcripts nucleotide seqs part2”, “Transcripts nucleotide seqs part3”, “Transcripts nucleotide seqs part4”, “ProDG seqs”, and “Transcripts.gz” of the enclosed CD-ROM1, CD-ROM2 and CD-ROM4, as determined using the BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters), which encode proteins that retain substantially at least one, or preferably substantially all, of the biological activities of the referenced protein (i.e., encoding a polypeptide having an amino acid sequence which exhibits a homology level of at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, say 95-100% to any of the amino acid sequences set forth in the files “protein seqs” and “Proteins.gz” of the enclosed CD-ROM2 and CD-ROM4, as determined using the BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters). What constitutes “substantially all” may vary considerably. For example, in some instances, a variant or mutant protein may be about 5% as effective as the protein from which it was derived. But if that level of activity is sufficient to achieve a biologically significant result (e.g., transport of a sufficient number of ions across a cell membrane), the variant or mutant protein is one that retains substantially all of at least one of the biological activities of the protein from which it was derived. A “biologically active” variant or mutant (e.g., fragment) of a protein can participate in an intra- or inter-molecular interaction that can be characterized by specific binding between molecules: two or more identical molecules (in which case, homodimerization could occur) or two or more different molecules (in which case, heterodimerization could occur). Often, a biologically active fragment will be recognizable by virtue of a recognizable domain or motif, and one can confirm biological activity experimentally. More specifically, for example, one can make (by synthesis or recombinant techniques) a nucleic acid fragment that encodes a potentially biologically active portion of a protein of the present invention by inserting the active fragment into an expression vector, and expressing the protein (expression constructs and expression systems are described further below), and finally assessing the ability of the protein to function.

[0300] The present invention also encompasses chimeric nucleic acid sequences that encode fusion proteins. For example, a nucleic acid sequence of the invention can include a sequence that encodes a hexa-histidine tag (to facilitate purification of bacterially-expressed proteins) or a hemagglutinin tag (to facilitate purification of proteins expressed in eukaryotic cells).

[0301] The fused heterologous sequence can also encode a portion of an immunoglobulin (e.g., the constant region (Fc) of an IgG molecule), a detectable marker, or a signal sequence (e.g., a sequence that is recognized and cleaved by a signal peptidase in the host cell in which the fusion protein is expressed). Fusion proteins containing an Fc region can be purified using a protein A column, and they have increased stability (e.g., a greater circulating half-life) in vivo.

[0302] Detectable markers are well known in the art and can be used in the context of the present invention. For example, the expression vector pUR278 (Ruther et al., EMBO J., 2:1791, 1983) can be used to fuse a nucleic acid of the invention to the lacZ gene (which encodes β-galactosidase).

[0303] A nucleic acid sequence of the invention can also be fused to a sequence that, when expressed, improves the quantity or quality (e.g., solubility) of the fusion protein. For example, pGEX vectors can be used to express the proteins of the invention fused to glutathione S-transferase (GST). In general, such fusion proteins are soluble and can be easily purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. The pGEX vectors (Pharmacia Biotech Inc; Smith and Johnson, Gene 67:31-40, 1988) are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target gene product can be released from the GST moiety. Other useful vectors include pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse maltose E binding protein and protein A, respectively, to a protein of the invention.

[0304] A signal sequence, when present, can facilitate secretion of the fusion protein from a cell, and can be cleaved off by the host cell. The nucleic acid sequences of the present invention can also be fused to “inactivating” sequences, which render the fusion protein encoded, as a whole, inactive. Such proteins can be referred to as “preproteins,” and they can be converted into an active form of the protein by removal of the inactivating sequence.

[0305] The present invention also encompasses expression constructs (e.g., plasmids, cosmids, and other vectors that transport nucleic acids) that include a nucleic acid of the invention in a sense or antisense orientation. The nucleic acids can be operably linked to a regulatory sequence (e.g., a promoter, enhancer, or other expression control sequence, such as a polyadenylation signal) that facilitates expression of the nucleic acid. The vector can replicate autonomously or integrate into a host genome, and can be a viral vector, such as a replication defective retrovirus, an adenovirus, or an adeno-associated virus.

[0306] When present, the regulatory sequence can direct constitutive or tissue-specific expression of the nucleic acid. Tissue-specific promoters include, for example, the liver-specific albumin promoter (Pinkert et al., Genes Dev. 1:268-277, 1987), lymphoid-specific promoters (Calame and Eaton, Adv. Immunol. 43:235-275, 1988), such as those of T cell receptors (Winoto and Baltimore, EMBO J. 8:729-733, 1989) and immunoglobulins (Banerji et al., Cell 33:729-740, 1982; Queen and Baltimore, Cell 33:741-748, 1983), the neuron-specific neurofilament promoter (Byrne and Ruddle, Proc. Natl. Acad. Sci. USA 86:5473-5477, 1989), pancreas-specific promoters (Edlund et al., Science 230:912-916, 1985), and mammary gland-specific promoters (e.g., milk whey promoter; see U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters can also be used. Examples of such promoters include the murine hox promoters (Kessel and Gruss, Science 249:374-379, 1990) and the α-fetoprotein promoter (Campes and Tilghman, Genes Dev. 3:537-546, 1989). Moreover, the promoter can be an inducible promoter. For example, the promoter can be regulated by a steroid hormone, a polypeptide hormone, or some other polypeptide (e.g., that used in the tetracycline-inducible system, “Tet-On” and “Tet-Off”; see, e.g., Clontech Inc. (Palo Alto, Calif.); Gossen and Bujard, Proc. Natl. Acad. Sci. USA 89:5547, 1992, and Paillard, Human Gene Therapy 9:983, 1989).

[0307] The expression vector will be selected or designed depending on, for example, the type of host cell to be transformed and the level of protein expression desired. For example, when the host cells are mammalian cells, the expression vector can include viral regulatory elements, such as promoters derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40. The nucleic acid inserted (i.e., the sequence to be expressed) can also be modified to encode residues that are preferentially utilized in E. coli (Wada et al., Nucleic Acids Res. 20:2111-2118, 1992). These modifications can be achieved by standard DNA synthesis techniques.
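
By way of non-limiting illustration, the modification of an inserted sequence toward codons preferentially utilized in a host, as described above, can be sketched in Python. This is a minimal sketch under stated assumptions: the names `recode_for_host`, `translate` and `EXAMPLE_PREFERRED` are hypothetical, and the small preference table is a placeholder chosen only for the example, not a validated E. coli codon-usage table. Only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical, deliberately incomplete host-preference table (an assumption
# for illustration; a real table would come from measured codon usage).
EXAMPLE_PREFERRED = {"L": "CTG", "A": "GCG", "K": "AAA", "E": "GAA"}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def recode_for_host(dna, preferred):
    """Swap each codon for the host-preferred synonymous codon, if one is listed.

    Codons whose amino acid is absent from `preferred` are left unchanged, so
    the encoded protein is never altered.
    """
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        recoded.append(new_codon)
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGCTAGCAAAGGAGTAA"
    print(recode_for_host(original, EXAMPLE_PREFERRED))
```

Because every substitution is synonymous, translating the recoded sequence gives the same protein as translating the original. That is the property relied upon when degenerate variants are used in the same manner as the naturally occurring sequences.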

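As a further non-limiting illustration, the siRNA target-site selection described earlier in this section (scanning the mRNA downstream of the AUG start codon for AA dinucleotides, recording each AA together with the 3' adjacent 19 nucleotides, and preferring sites whose G/C content is not higher than 55%) can be sketched in Python. The function name `find_sirna_sites` is hypothetical. The sketch assumes that the first AUG in the input is the start codon and that G/C content is computed over the 19 nucleotides following the AA; neither assumption is stated in the text. The database homology comparison (e.g., by BLAST) is not performed here and would still be needed.

```python
def find_sirna_sites(mrna, max_gc=0.55):
    """Return candidate siRNA target sites as (position, site, gc_fraction).

    position is the 0-based index of the AA dinucleotide in the transcript,
    site is the 21-nt string (AA plus the 19 nucleotides 3' of it), and
    gc_fraction is the G/C fraction of those 19 nucleotides.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA notation
    start = seq.find("AUG")               # assumption: first AUG is the start codon
    if start < 0:
        return []
    sites = []
    # Scan downstream of the start codon; stop where a full 21-nt site no longer fits.
    for i in range(start + 3, len(seq) - 20):
        if seq[i:i + 2] != "AA":
            continue
        site = seq[i:i + 21]
        gc_fraction = sum(base in "GC" for base in site[2:]) / 19
        if gc_fraction <= max_gc:
            sites.append((i, site, gc_fraction))
    return sites


if __name__ == "__main__":
    example = "AUG" + "AA" + "GCAUGCAUGCAUGCAUGCA"
    for position, site, gc in find_sirna_sites(example):
        print(position, site, round(gc, 3))
```

Sites returned by such a scan are only candidates. Several would ordinarily be selected along the length of the target gene and evaluated alongside a scrambled negative control, as described above.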
0308 Expression vectors can be used to produce the proteins encoded by the nucleic acid sequences of the invention ex vivo (e.g., the expressed proteins can be purified from expression systems such as those described herein) or in vivo (in, for example, whole organisms). Proteins can be expressed in vivo in a way that restores expression to within normal limits and/or restores the temporal or spatial patterns of expression normally observed. Alternatively, proteins can be aberrantly expressed in vivo (i.e., at a time or place, or to an extent, that does not normally occur in vivo). For example, proteins can be over expressed or under expressed with respect to expression in a wild-type state; expressed at a different developmental stage; expressed at a different time during the cell cycle; or expressed in a tissue or cell type where expression does not normally occur.

0309 The present invention also encompasses various engineered cells, including cells that have been engineered to express or over-express a nucleic acid sequence described herein. Accordingly, the cells can be transformed with an expression construct, such as those described above. A “transformed cell” is a cell into which (or into an ancestor of which) one has introduced a nucleic acid that encodes a protein of the invention. The nucleic acid can be introduced by any of the art-recognized techniques for introducing nucleic acids into a host cell (e.g., calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation).

0310 The phrases “transformed cell” or “host cell” refer not only to the particular subject cell, but also to the progeny or potential progeny of such cells. Mutations or environmental influences may modify the cells in succeeding generations and, even though such progeny may not be identical to the parent cell, they are nevertheless within the scope of the invention. The cells of the invention can be “isolated” cells or “purified preparations” of cells (e.g., an in vitro preparation of cells), either of which can be obtained from multicellular organisms such as plants and animals (in which case the purified preparation would constitute a subset of the cells from the organism). In the case of unicellular microorganisms (e.g., microbial cells), the preparation is purified when at least 10% (e.g., 25%, 50%, 75%, 80%, 90%, 95% or more) of the cells within it are the cells of interest (e.g., the cells that express a protein of the invention).

0311 The expression vectors of the invention can be designed to express proteins in prokaryotic or eukaryotic cells. For example, polypeptides of the invention can be expressed in bacterial cells (e.g., E. coli), fungi, yeast, or insect cells (e.g., using baculovirus expression vectors). For example, a baculovirus such as Autographa californica nuclear polyhedrosis virus (AcNPV), which grows in Spodoptera frugiperda cells, can be used as a vector to express foreign genes. A nucleic acid of the invention can be cloned into a non-essential region (for example the polyhedrin gene) of the viral genome and placed under control of a promoter (e.g., the polyhedrin promoter). Successful insertion of the nucleic acid results in inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat encoded by the polyhedrin gene). These recombinant viruses are then typically used to infect insect cells (e.g., Spodoptera frugiperda cells) in which the inserted gene is expressed (see, e.g., Smith et al., J. Virol. 46:584, 1983 and U.S. Pat. No. 4,215,051). If desired, mammalian cells can be used in lieu of insect cells, provided the virus is engineered so that the nucleic acid is placed under the control of a promoter that is active in mammalian cells.

0312 Useful mammalian cells include rodent cells, such as Chinese hamster ovary (CHO) cells or COS cells; primate cells, such as African green monkey kidney cells; rabbit cells; or pig cells. The mammalian cells can also be human cells (e.g., a hematopoietic cell, a fibroblast, or a tumor cell). For example, HeLa cells, 293 cells, 3T3 cells, and WI38 cells are useful. Other suitable host cells are known to those skilled in the art and are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif., (1990).

0313 Proteins can also be produced in plant cells, if desired. For plant cells, viral expression vectors (e.g., cauliflower mosaic virus and tobacco mosaic virus) and plasmid expression vectors (e.g., Ti plasmid) are suitable. These cells and other types are available from a wide range of sources (e.g., the American Type Culture Collection, Manassas, Va.; see also, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, (1994)). The optimal methods of transformation (by, for example, transfection) and, as noted above, the choice of expression vehicle will depend on the host system selected. Transformation and transfection methods are described in, for example, Ausubel et al., supra; expression vehicles can be chosen from those provided in, for example, Pouwels et al., Cloning Vectors: A Laboratory Manual, (1985), Supp. (1987). The host cells harboring the expression vehicle can be cultured in conventional nutrient media, adapted as needed for activation of a chosen nucleic acid, repression of a chosen nucleic acid, selection of transformants, or amplification of a chosen nucleic acid.

0314 Expression systems can be selected based on their ability to produce proteins that are modified (e.g., by phosphorylation, glycosylation, or cleavage) in substantially the same way they would be in a cell in which they are naturally expressed. Alternatively, the system can be one in which naturally occurring modifications do not occur, or occur in a different position, or to a different extent, than they otherwise would.

0315 If desired, the host cells can be those of a stably transfected cell line. Vectors suitable for stable transfection of mammalian cells are available to the public (see, e.g., Pouwels et al. (supra)) as are methods for constructing them (see, e.g., Ausubel et al. (supra)). In one example, a nucleic acid of the invention is cloned into an expression vector that includes the dihydrofolate reductase (DHFR) gene. Integration of the plasmid and, therefore, the nucleic acid it contains, into the host cell chromosome is selected for by including 0.01-300 mM methotrexate in the cell culture medium (as described in Ausubel et al., supra). This dominant selection can be accomplished in most cell types.

0316 Moreover, recombinant protein expression can be increased by DHFR-mediated amplification of the transfected gene. Methods for selecting cell lines bearing gene amplifications are described in Ausubel et al. (supra) and generally involve extended culture in medium containing gradually increasing levels of methotrexate. DHFR-containing expression vectors commonly used for this purpose include pCVSEII-DHFR and pAdD26SV(A) (which are also described in Ausubel et al., supra).
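The numeric criterion that paragraph 0310 gives for a purified preparation of unicellular microorganisms (at least 10% of the cells are the cells of interest) can be written as a small check. The following is a minimal Python sketch, not part of the disclosure: the function name, its parameters, and the cell counts in the usage lines are hypothetical, and it assumes that cell counts are already available from some counting method.

```python
def is_purified_preparation(cells_of_interest, total_cells, threshold=0.10):
    """Return True when the fraction of cells of interest meets the threshold.

    threshold defaults to 0.10, the "at least 10%" floor stated in paragraph
    0310; stricter values (e.g., 0.25, 0.50, 0.95) can be passed instead.
    All names here are illustrative assumptions.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= cells_of_interest <= total_cells:
        raise ValueError("cells_of_interest must be between 0 and total_cells")
    return cells_of_interest / total_cells >= threshold


# Hypothetical counts, for illustration only.
print(is_purified_preparation(150, 1000))                  # 15% of cells
print(is_purified_preparation(50, 1000))                   # 5% of cells
print(is_purified_preparation(950, 1000, threshold=0.90))  # stricter cutoff
```

The threshold is a parameter because the paragraph lists several acceptable levels rather than a single cutoff.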

0317 A number of other selection systems can be used. These include those based on herpes simplex virus thymidine kinase, hypoxanthine-guanine phosphoribosyl-transferase, and adenine phosphoribosyltransferase genes, which can be employed in tk⁻, hgprt⁻, or aprt⁻ cells, respectively. In addition, gpt, which confers resistance to mycophenolic acid (Mulligan et al., Proc. Natl. Acad. Sci. USA, 78:2072, 1981); neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol. 150:1, 1981); and hygro, which confers resistance to hygromycin (Santerre et al., Gene 30:147, 1981), can be used.

0318 In view of the foregoing, it is clear that one can synthesize proteins encoded by the nucleic acid sequences of the present invention (i.e., recombinant proteins). Methods of generating recombinant proteins are well known in the art. Recombinant protein purification can be effected by affinity. Where a protein of the invention has been fused to a heterologous protein (e.g., a maltose binding protein, a β-galactosidase protein, or a trpE protein), antibodies or other agents that specifically bind to the latter can facilitate purification. The recombinant protein can, if desired, be further purified (e.g., by high performance liquid chromatography or other standard techniques; see, Fisher, Laboratory Techniques In Biochemistry And Molecular Biology, Eds. Work and Burdon, Elsevier, (1980)).

0319 Other purification schemes are known as well. For example, non-denatured fusion proteins can be purified from human cell lines as described by Janknecht et al. (Proc. Natl. Acad. Sci. USA, 88:8972, 1981). In this system, a nucleic acid is subcloned into a vaccinia recombination plasmid such that it is translated, in frame, with a sequence encoding an N-terminal tag consisting of six histidine residues. Extracts of cells infected with the recombinant vaccinia virus are loaded onto Ni²⁺ nitriloacetic acid-agarose columns, and histidine-tagged proteins are selectively eluted with imidazole-containing buffers.

0320 Alternatively, chemical synthesis can also be utilized to generate the proteins of the present invention; e.g., proteins can be synthesized by the methods described in Solid Phase Peptide Synthesis, 2nd Ed., The Pierce Chemical Co., Rockford, Ill., (1984).

0321 The invention also features expression vectors that can be transcribed and translated in vitro using, for example, a T7 promoter and T7 polymerase. Thus, the invention encompasses methods of making the proteins described herein in vitro.

0322 Sufficiently purified proteins can be used as described herein. For example, one can administer the protein to a patient, use it in diagnostic or screening assays, or use it to generate antibodies (these methods are described further below).

0323 The cells per se can also be administered to patients in the context of replacement therapies. For example, a nucleic acid of the present invention can be operably linked to an inducible promoter (e.g., a steroid hormone receptor-regulated promoter) and introduced into a human or non-human (e.g., porcine) cell and then into a patient. Optionally, the cell can be cultivated for a time or encapsulated in a biocompatible material, such as poly-lysine alginate. See, e.g., Lanza, Nature Biotechnol. 14:1107, (1996); Joki et al., Nature Biotechnol. 19:35, 2001; and U.S. Pat. No. 5,876,742. When a steroid hormone receptor-regulated promoter is used, protein production can be regulated in the subject by administering a steroid hormone to the subject. Implanted recombinant cells can also express and secrete an antibody that specifically binds to one of the proteins encoded by the nucleic acid sequences of the present invention. The antibody can be any antibody or any antibody derivative described herein. An antibody “specifically binds” to a particular antigen when it binds to that antigen but not, to a detectable level, to other molecules in a sample (e.g., a tissue or cell culture) that naturally includes the antigen.

0324 While the host cells described above express recombinant proteins, the invention also encompasses cells in which gene expression is disrupted (e.g., cells in which a gene has been knocked out). These cells can serve as models of disorders that are related to mutated or mis-expressed alleles and are also useful in drug screening.

0325 Protein expression can also be regulated in cells without using the expression constructs described above. Instead, one can modify the expression of an endogenous gene within a cell (e.g., a cell line or microorganism) by inserting a heterologous DNA regulatory element into the genome of the cell such that the element is operably linked to the endogenous gene. For example, an endogenous gene that is “transcriptionally silent” (i.e., not expressed at detectable levels) can be activated by inserting a regulatory element that promotes the expression of a normally expressed gene product in that cell. Techniques such as targeted homologous recombination can be used to insert the heterologous DNA (see, e.g., U.S. Pat. No. 5,272,071 and WO 91/06667).

0326 The polypeptides of the present invention include the protein sequences contained in the files “Protein.seqs” of CD-ROM2 and “Proteins.gz” of the enclosed CD-ROM4 and those encoded by the nucleic acids described herein (so long as those nucleic acids contain coding sequence and are not wholly limited to an untranslated region of a nucleic acid sequence), regardless of whether they are recombinantly produced (e.g., produced in and isolated from cultured cells), otherwise manufactured (by, for example, chemical synthesis), or isolated from a natural biological source (e.g., a cell or tissue) using standard protein purification techniques.

0327 The terms “peptide,” “polypeptide,” and “protein” are used herein interchangeably to refer to a chain of amino acid residues, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). Proteins (including antibodies that specifically bind to the products of those nucleic acid sequences that encode protein or fragments thereof) and other compounds can be “isolated” or “purified.” The proteins and compounds of the present invention are “isolated” or “purified” when they exist as a composition that is at least 60% (e.g., 70%, 75%, 80%, 85%, 90%, 95%, or 99% or more) by weight the protein or compound of interest. Thus, the proteins of the invention are substantially free from the cellular material (or other biological or cell culture material) with which they may have, at one time, been associated (naturally or otherwise). Purity can be measured by any appropriate standard method (e.g., column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis).

0328 The proteins of the present invention also include those encoded by novel fragments or other mutants (i.e.,

naturally-occurring or synthetic) or variants of the protein encoding sequences of the present invention. Thus, the present invention envisages polypeptide sequences having amino acid sequences which exhibit a homology level of at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, say 95-100% to any of the polypeptide sequences set forth in the files “Protein.seqs” and “Proteins.gz” of the enclosed CD-ROM2 and CD-ROM4, as determined using the BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters. These proteins can retain substantially all (e.g., 76%, 80%, 90%, 95%, or 99%) of the biological activity of the full-length protein from which they were derived and can, therefore, be used as agonists or mimetics of the proteins from which they were derived. The manner in which biological activity can be determined is described generally herein, and specific assays (e.g., assays of enzymatic activity or ligand-binding ability) are known to those of ordinary skill in the art. In some instances, retention of biological activity is not necessary or desirable. For example, fragments that retain little, if any, of the biological activity of a full-length protein can be used as immunogens, which, in turn, can be used as therapeutic agents (e.g., to generate an immune response in a patient), diagnostic agents (e.g., to detect the presence of antibodies or other proteins in a tissue sample obtained from a patient), or to generate or test antibodies that specifically bind the proteins of the invention.

0329 In other instances, the proteins encoded by nucleic acids of the invention can be modified (e.g., fragmented or otherwise mutated) so their activities oppose those of the naturally occurring protein (i.e., the invention encompasses variants of the proteins encoded by nucleic acids of the invention that are antagonistic to a biological process). One of ordinary skill in the art will recognize that the more extensive the mutation, the more likely it is to affect the biological activity of the protein (this is not to say that minor modifications cannot do so as well). Thus, it is likely that mutant proteins that are agonists of those encoded by wild type proteins will differ from those wild type proteins only at non-essential residues or will contain only conservative substitutions. Conversely, antagonists are likely to differ at an essential residue or to contain non-conservative substitutions. Moreover, those of ordinary skill in the art can engineer proteins so that they retain desirable traits (i.e., those that make them efficacious in a particular therapeutic, diagnostic, or screening regime) and lose undesirable traits (i.e., those that produce side effects, or produce false positive results through non-specific binding).

0330 In the event a protein of the invention is encoded by a new gene, the invention encompasses proteins that arise following alternative transcription, RNA splicing, translational- or post-translational events (e.g., the invention encompasses splice variants of the new genes). In the event a protein of the invention is encoded by a novel splice variant, the invention encompasses proteins that arise following alternative translational- or post-translational events (i.e., the invention does not encompass proteins encoded by known splice variants, but does encompass other variants of the novel splice variant). Post-translational modifications are discussed above in the context of expression systems.

0331 The fragmented or otherwise mutant proteins of the invention can differ from those encoded by the nucleic acids of the invention to a limited extent (e.g., by at least one but less than 5, 10 or 15 amino acid residues). As with other, more extensive mutations, the differences can be introduced by adding, deleting, and/or substituting one or more amino acid residues. Alternatively, the mutant proteins can differ from the wild type proteins from which they were derived by at least one residue but less than 5%, 10%, 15% or 20% of the residues when analyzed as described herein. If the mutant and wild type proteins are different lengths, they can be aligned and analyzed using the algorithms described above.

0332 Useful variants, fragments, and other mutants of the proteins encoded by the nucleic acids of the invention can be identified by screening combinatorial libraries of these variants, fragments, and other mutants for agonist or antagonist activity. For example, libraries of fragments (e.g., N-terminal, C-terminal, or internal fragments) of one or more of the proteins of the invention can be used to generate populations of fragments that can be screened and, once identified, isolated. The proteins can include those in which one or more cysteine residues are added or deleted, or in which a glycosylated residue is added or deleted. Methods for screening libraries (e.g., combinatorial libraries of proteins made from point mutants or cDNA libraries) for proteins or genes having a particular property are known in the art. These methods can be adapted for rapid screening. Recursive ensemble mutagenesis (REM), a new technique that enhances the frequency of functional mutants in libraries, can be used in combination with screening assays to identify useful variants of the proteins of the present invention (Arkin and Yourvan, Proc. Natl. Acad. Sci. USA 89:7811-7815, (1992); Delgrave et al., Protein Engineering 6:327-331, (1993)).

0333 Cell-based assays can be exploited to analyze variegated libraries constructed from one or more of the proteins of the invention. For example, cells in a cell line (e.g., a cell line that ordinarily responds to the protein(s) of interest in a substrate-dependent manner) can be transfected with a library of expression vectors. The transfected cells are then contacted with the protein and the effect of the expression of the mutant on signaling by the protein (substrate) can be detected (e.g., by measuring redox activity or protein folding). Plasmid DNA can then be recovered from the cells that score for inhibition, or alternatively, potentiation of signaling by the protein (substrate). Individual clones are then further characterized.

0334 The invention also contemplates antibodies (i.e., immunoglobulin molecules) that specifically bind (see the definition above) to the proteins described herein and antibody fragments (e.g., antigen-binding fragments or other immunologically active portions of the antibody). For example, an antibody which specifically binds the troponin variants of the present invention is preferably directed to the unique amino acid sequence region which is not shared by wild-type troponin (see FIG. 21, SEQ ID NO: 87). Alternatively, such an antibody can be directed to an amino acid sequence which bridges the unique sequence region and common sequence regions. Antibodies are proteins, and those of the invention can have at least one or two heavy chain variable regions (VH), and at least one or two light chain variable regions (VL). The VH and VL regions can be further subdivided into regions of hypervariability, termed “complementarity determining regions” (CDR), which are interspersed with more highly conserved “framework regions” (FR). These regions have been precisely defined (see, Kabat et al., Sequences of Proteins of Immunological Interest, Fifth Edition, U.S. Department of Health and Human Services, NIH Publication No. 91-3242, (1991) and Chothia et al., J. Mol. Biol. 196:901-917, (1987)), and antibodies or antibody fragments containing one or more of them are within the scope of the invention.

0335 The antibodies of the invention can also include a heavy and/or light chain constant region (constant regions typically mediate binding between the antibody and host tissues or factors, including effector cells of the immune system and the first component (C1q) of the classical complement system), and can therefore form heavy and light immunoglobulin chains, respectively. For example, the antibody can be a tetramer (two heavy and two light immunoglobulin chains, which can be connected by, for example, disulfide bonds). The heavy chain constant region contains three domains (CH1, CH2 and CH3), whereas the light chain constant region has one (CL).

0336 An antigen-binding fragment of the invention can be: (i) a Fab fragment (i.e., a monovalent fragment consisting of the VL, VH, CL and CH1 domains); (ii) a F(ab')2 fragment (i.e., a bivalent fragment containing two Fab fragments linked by a disulfide bond at the hinge region); (iii) a Fd fragment consisting of the VH and CH1 domains; (iv) a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., Nature 341:544-546, (1989)), which consists of a VH domain; and (vi) an isolated complementarity determining region (CDR).

0337 F(ab')2 fragments can be produced by pepsin digestion of the antibody molecule, and Fab fragments can be generated by reducing the disulfide bridges of F(ab')2 fragments. Alternatively, Fab expression libraries can be constructed (Huse et al., Science 246:1275, (1989)) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. Methods of making other antibodies and antibody fragments are known in the art. For example, although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods or a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules known as single chain Fv (scFv); see e.g., Bird et al., Science 242:423-426, (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883, (1988); Colchei et al., Ann. NY Acad. Sci. 880:263-80, (1999); and Reiter, Clin. Cancer Res. 2:245-52, (1996). Techniques for producing single chain antibodies are also described in U.S. Pat. Nos. 4,946,778 and 4,704,692. Such single chain antibodies are encompassed within the term “antigen-binding fragment” of an antibody. These antibody fragments are obtained using conventional techniques known to those of ordinary skill in the art, and the fragments are screened for utility in the same manner that intact antibodies are screened. Moreover, a single chain antibody can form dimers or multimers and, thereby, become a multivalent antibody having specificities for different epitopes of the same target protein.

0338 The antibody can be a polyclonal (i.e., part of a heterogeneous population of antibody molecules derived from the sera of the immunized animals) or a monoclonal antibody (i.e., part of a homogeneous population of antibodies to a particular antigen), either of which can be recombinantly produced (e.g., produced by phage display or by combinatorial methods, as described in, e.g., U.S. Pat. No. 5,223,409; WO 92/18619; WO 91/17271; WO 92/20791; WO 92/15679; WO 93/01288; WO 92/01047; WO 92/09690; WO 90/02809; Fuchs et al., Bio/Technology 9:1370-1372, (1991); Hay et al., Human Antibody Hybridomas 3:81-85, (1992); Huse et al., Science 246:1275-1281, (1989); Griffiths et al., EMBO J 12:725-734, (1993); Hawkins et al., J. Mol Biol 226:889-896, (1992); Clackson et al., Nature 352:624-628, (1991); Gram et al., Proc. Natl. Acad. Sci. USA 89:3576-3580, (1992); Garrad et al., Bio/Technology 9:1373-1377, (1991); Hoogenboom et al., Nucl. Acids Res. 19:4133-4137, (1991); and Barbas et al., Proc. Natl. Acad. Sci. USA 88:7978-7982, (1991)). In one embodiment, an antibody is made by immunizing an animal with a protein encoded by a nucleic acid of the invention (one, of course, that contains coding sequence) or a mutant or fragment (e.g., an antigenic peptide fragment) thereof. Alternatively, an animal can be immunized with a tissue sample (e.g., a crude tissue preparation, a whole cell (living, lysed, or fractionated) or a membrane fraction). Thus, antibodies of the invention can specifically bind to a purified antigen or a tissue (e.g., a tissue section, a whole cell (living, lysed, or fractionated) or a membrane fraction).

0339 In the event an antigenic peptide is used, it can include at least eight (e.g., 10, 15, 20, or 30) consecutive amino acid residues found in a protein of the invention. The antibodies generated can specifically bind to one of the proteins in their native form (thus, antibodies with linear or conformational epitopes are within the invention), in a denatured or otherwise non-native form, or both. Conformational epitopes can sometimes be identified by identifying antibodies that bind to a protein in its native form, but not in a denatured form.

0340 The host animal (e.g., a rabbit, mouse, guinea pig, or rat) can be immunized with the antigen, optionally linked to a carrier (i.e., a substance that stabilizes or otherwise improves the immunogenicity of an associated molecule), and optionally administered with an adjuvant (see, e.g., Ausubel et al., supra). An exemplary carrier is keyhole limpet hemocyanin (KLH) and exemplary adjuvants, which will be selected in view of the host animal's species, include Freund's adjuvant (complete or incomplete), adjuvant mineral gels (e.g., aluminum hydroxide), surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, BCG (bacille Calmette-Guerin), and Corynebacterium parvum. KLH is also sometimes referred to as an adjuvant. The antibodies generated in the host can be purified by, for example, affinity chromatography methods in which the polypeptide antigen is immobilized on a resin.

0341 Epitopes encompassed by an antigenic peptide may be located on the surface of the protein (e.g., in hydrophilic regions), or in regions that are highly antigenic (such regions can be selected, initially, by virtue of containing many charged residues). An Emini surface probability analysis of human protein sequences can be used to indicate the regions that have a particularly high probability of being localized to the surface of the protein.

0342 The antibody can be a fully human antibody (e.g., an antibody made in a mouse that has been genetically engineered to produce an antibody from a human immunoglobulin sequence, such as that of a human immunoglobulin gene (the kappa, lambda, alpha (IgA1 and IgA2), gamma (IgG1, IgG2, IgG3, IgG4), delta, epsilon and mu constant region genes or the myriad immunoglobulin variable region genes)). Alternatively, the antibody can be a non-human antibody (e.g., a rodent (e.g., a mouse or rat), goat, or non-human primate (e.g., monkey) antibody).

0343 Methods of producing antibodies are well known in the art. For example, as noted above, human monoclonal antibodies can be generated in transgenic mice carrying the human immunoglobulin genes rather than those of the mouse. Splenocytes obtained from these mice (after immunization with an antigen of interest) can be used to produce hybridomas that secrete human mAbs with specific affinities for epitopes from a human protein (see, e.g., WO 91/00906; WO 91/10741; WO 92/03918; WO 92/03917; Lonberg et al., Nature 368:856-859, 1994; Green et al., Nature Genet. 7:13-21, 1994; Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855, 1994; Bruggeman et al., Immunol. 7:33-40, 1993; Tuaillon et al., Proc. Natl. Acad. Sci. USA 90:3720-3724, 1993; and Bruggeman et al., Eur. J. Immunol 21:1323-1326, 1991).

0344 The antibody can also be one in which the variable region, or a portion thereof (e.g., a CDR), is generated in a non-human organism (e.g., a rat or mouse). Thus, the invention encompasses chimeric, CDR-grafted, and humanized antibodies and antibodies that are generated in a non-human organism and then modified (in, e.g., the variable framework or constant region) to decrease antigenicity in a human. Chimeric antibodies (i.e., antibodies in which different portions are derived from different animal species (e.g., the variable region of a murine mAb and the constant region of a human immunoglobulin)) can be produced by recombinant techniques known in the art. For example, a gene encoding the Fc constant region of a murine (or other species) monoclonal antibody molecule can be digested with restriction enzymes to remove the region encoding the murine Fc, and the equivalent portion of a gene encoding a human Fc constant region can be substituted therefor (see European Patent Application Nos. 125,023; 184,187; 171,496; and 173,494; see also WO 86/01533; U.S. Pat. No. 4,816,567; Better et al., Science 240:1041-1043, (1988); Liu et al., Proc. Natl. Acad. Sci. USA 84:3439-3443, (1987); Liu et al., J. Immunol. 139:3521-3526, (1987); Sun et al., Proc. Natl. Acad. Sci. USA 84:214-218, (1987); Nishimura et al., Cancer Res. 47:999-1005, (1987); Wood et al., Nature 314:446-449, (1985); Shaw et al., J. Natl. Cancer Inst. 80:1553-1559, (1988); Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851, (1984); Neuberger et al., Nature 312:604, (1984); and Takeda et al., Nature 314:452, (1984)).

0345 In a humanized or CDR-grafted antibody, at least one or two, but generally all three of the recipient CDRs (of heavy and/or light immunoglobulin chains) will be replaced with a donor CDR. One need only replace the number of CDRs required for binding of the humanized antibody to a protein described herein or a fragment thereof. The donor can be a rodent antibody, and the recipient can be a human framework or a human consensus framework. Typically, the immunoglobulin providing the CDRs is called the “donor” (and is often that of a rodent) and the immunoglobulin providing the framework is called the “acceptor.” The acceptor framework can be a naturally occurring (e.g., a human) framework, a consensus framework or sequence, or a sequence that is at least 85% (e.g., 90%, 95%, 99%) identical thereto. A “consensus sequence” is one formed from the most frequently occurring amino acids (or nucleotides) in a family of related sequences (see, e.g., Winnaker, From Genes to Clones, Verlagsgesellschaft, Weinheim, Germany, 1987). Each position in the consensus sequence is occupied by the amino acid residue that occurs most frequently at that position in the family (where two occur equally frequently, either can be included). A “consensus framework” refers to the framework region in the consensus immunoglobulin sequence.

0346 An antibody can be humanized by methods known in the art. For example, humanized antibodies can be generated by replacing sequences of the Fv variable region that are not directly involved in antigen binding with equivalent sequences from human Fv variable regions. General methods for generating humanized antibodies are provided by Morrison (Science 229:1202-1207, (1985)), Oi et al. (BioTechniques 4:214, (1986)), and Queen et al. (U.S. Pat. Nos. 5,585,089; 5,693,761 and 5,693,762). Those nucleic acid sequences required by these methods can be obtained from a hybridoma producing an antibody against the polypeptides of the present invention, or fragments thereof. The recombinant DNA encoding the humanized antibody, or fragment thereof, can then be cloned into an appropriate expression vector.

0347 Humanized or CDR-grafted antibodies can be produced such that one, two, or all CDRs of an immunoglobulin chain can be replaced (see, e.g., U.S. Pat. No. 5,225,539; Jones et al., Nature 321:552-525, (1986); Verhoeyan et al., Science 239:1534, (1988); and Beidler et al., J. Immunol. 141:4053-4060, (1988)). Thus, the invention features humanized antibodies in which specific amino acid residues have been substituted, deleted or added (in, e.g., the framework region to improve antigen binding). For example, a humanized antibody will have framework residues identical to those of the donor or to amino acid residues other than those of the recipient framework residue. To generate such antibodies, a selected, small number of acceptor framework residues of the humanized immunoglobulin chain are replaced by the corresponding donor amino acids. The substitutions can occur adjacent to the CDR or in regions that interact with a CDR (U.S. Pat. No. 5,585,089, see especially columns 12-16). Other techniques for humanizing antibodies are described in EP 519596 A1.

0348 In certain embodiments, the antibody has an effector function and can fix complement, while in others it can neither recruit effector cells nor fix complement. The antibody can also have little or no ability to bind an Fc receptor. For example, it can be an isotype or subtype, or a fragment or other mutant that cannot bind to an Fc receptor (e.g., the antibody can have a mutant (e.g., a deleted) Fc receptor binding region). The antibody may or may not alter (e.g., increase or decrease) the activity of a protein to which it binds.

0349 In other embodiments, the antibody can be coupled to a heterologous substance, such as a toxin (e.g., ricin, diphtheria toxin, or active fragments thereof), another type of therapeutic agent (e.g., an antibiotic), or a detectable label. A detectable label can include an enzyme (e.g., horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase), a prosthetic group (e.g., streptavidin/biotin and avidin/biotin), or a fluorescent, luminescent, bioluminescent, or radioactive material (e.g., umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride, or phycoerythrin (which are fluorescent), luminol (which is luminescent), luciferase, luciferin, and aequorin (which are bioluminescent), and ¹²⁵I, ¹³¹I, ³⁵S or ³H (which are radioactive)).

0350 The antibodies of the invention (e.g., a monoclonal antibody) can be used to isolate the proteins of the invention (by, for example, affinity chromatography or immunoprecipitation) or to detect them in, for example, a cell lysate or supernatant (by Western blotting, ELISAs, radioimmune assays, and the like) or a histological section. One can therefore determine the abundance and pattern of expression of a particular protein. This information can be useful in making a diagnosis or in evaluating the efficacy of a clinical test.

0351 The invention also includes the nucleic acids that encode the antibodies described above and vectors and cells (e.g., mammalian cells such as CHO cells or lymphatic cells) that contain them. Similarly, the invention includes cell lines (e.g., hybridomas) that make the antibodies of the invention and methods of making those cell lines.

0352 Non-human transgenic animals are also within the scope of the invention. These animals can be used to study the function or activity of proteins of the invention and to identify or evaluate agents that modulate their activity. A “transgenic animal” can be a mammal (e.g., a mouse, rat, dog, pig, cow, sheep, goat, or non-human primate), an avian (e.g., a chicken), or an amphibian (e.g., a frog) having one or more cells that include a transgene (e.g., an exogenous DNA molecule or a rearrangement (e.g., deletion of) endogenous chromosomal DNA).

The protein can then be purified or recovered from the animal's milk or eggs. Animals suitable for such purpose include pigs, cows, goats, sheep, and chickens.

0355 Biomolecular sequences of the present invention can be classified to functional groups based on known activity of homologous sequences. This functional group classification allows the identification of diseases and conditions, which may be diagnosed and treated based on the novel sequence information and annotations as described in the present invention.

0356 This functional group classification includes the following groups:

0357 Proteins Involved in Drug-Drug Interactions:

0358 The phrase “proteins involved in drug-drug interactions” refers to proteins involved in a biological process which mediates the interaction between at least two consumed drugs.

0359 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to modulate drug-drug interactions. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such drug-drug interactions.

0360 Examples of these conditions include, but are not limited to, the cytochrome P450, which is involved in the metabolism of many drugs.

0361 Examples of proteins involved in drug-drug interactions are listed in Table 16, below.
The transgene can be integrated into or can occur within the genome of the cells of the animal, and 0362 Proteins Involved in the Metabolism of a Pro-Drug it can direct the expression of an encoded gene product in to a Drug: one or more types of cells or tissues. Alternatively, a 0363 The phrase “proteins involved in the metabolism of transgene can "knock out' or reduce gene expression. This a pro-drug to a drug” refers to proteins that activate an can occur when an endogenous gene has been altered by inactive pro-drug by chemically chaining it into a biologi homologous recombination, which occurs between it and an cally active compound. Preferably, the metabolizing enzyme exogenous DNA molecule that was introduced into a cell of is expressed in the target tissue thus reducing systemic side the animal (e.g., an embryonic cell) at a very early stage in effects. the animals development. 0364 Pharmaceutical compositions including such pro 0353 Intronic sequences and polyadenylation signals can teins or protein encoding sequences, antibodies directed be included in the transgene and, when present, can increase against Such proteins or polynucleotides capable of altering expression. One or more tissue-specific regulatory expression of Such proteins, may be used to modulate the sequences can also be operably linked to a transgene of the metabolism of a pro-drug into drug. Antibodies and poly invention to direct expression of protein to particular cells nucleotides such as PCR primers and molecular probes (exemplary regulatory sequences are described above, and designed to identify such proteins or protein encoding many others are known to those of ordinary skill in the art). sequences may be used for diagnosis of Such conditions. 0354) A “founder animal is one that carries a transgene of the invention in its genome or expresses mRNA from the 0365 Examples of these proteins include, but are not transgene in its cells or tissues. 
Founders can be bred to limited to esterases hydrolyzing the cholesterol lowering produce a line of transgenic animals carrying the founders drug simvastatin into its hydroxy acid active form. transgene or bred with founders carrying other transgenes 0366 MDR Proteins: (in which case the progeny would bear the transgenes borne by both founders). Accordingly, the invention features 0367 The phrase “MDR proteins” refers to Multi Drug founder animals, their progeny, cells or populations of cells Resistance proteins that are responsible for the resistance of obtained therefrom, and proteins obtained therefrom. For a cell to a range of drugs, usually by exporting these drugs example, a nucleic acid of the invention can be placed under outside the cell. Preferably, the MDR proteins are ABC the control of a promoter that directs expression of the binding cassette proteins. Preferably, drug resistance is encoded protein in the milk or eggs of the transgenic animal. associated with resistance to chemotherapy. US 2006/0068405 A1 Mar. 30, 2006 29

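By way of non-limiting illustration only, the functional group classification described above (assigning a sequence to a functional group based on the known activity of homologous sequences) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the invention: the homology hits are assumed to come from a prior homology search, and the identifiers, the best-hit rule, and the 40% identity cutoff are hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical annotations: known homolog -> functional group (illustrative names only).
KNOWN_GROUPS = {
    "homolog_A": "MDR proteins",
    "homolog_B": "transaminases",
    "homolog_C": "MDR proteins",
}


def classify_by_homology(hits, known_groups, min_identity=40.0):
    """Assign each query the functional group of its best annotated homolog.

    hits: iterable of (query_id, homolog_id, percent_identity) tuples,
          assumed to be produced by a separate homology search.
    min_identity: hypothetical cutoff below which a hit is ignored.
    """
    best = {}  # query_id -> (percent_identity, functional_group)
    for query_id, homolog_id, identity in hits:
        group = known_groups.get(homolog_id)
        if group is None or identity < min_identity:
            continue  # skip unannotated homologs and weak matches
        if query_id not in best or identity > best[query_id][0]:
            best[query_id] = (identity, group)

    # Collect the classified queries under their functional group.
    groups = defaultdict(list)
    for query_id, (_, group) in best.items():
        groups[group].append(query_id)
    return {group: sorted(members) for group, members in groups.items()}


hits = [
    ("seq1", "homolog_A", 82.5),
    ("seq1", "homolog_B", 45.0),
    ("seq2", "homolog_B", 61.0),
    ("seq3", "homolog_C", 30.0),  # below the cutoff, left unclassified
    ("seq4", "homolog_X", 90.0),  # homolog carries no annotation
]
result = classify_by_homology(hits, KNOWN_GROUPS)
print(result)
```

In this sketch a sequence with no sufficiently similar annotated homolog is simply left out of the result; other embodiments could equally assign a sequence to several groups when it has strong hits in more than one.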
0368 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the transport of molecules and macromolecules such as neurotransmitters, hormones, sugar, etc. is abnormal, leading to various pathologies. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0369 Examples of MDR proteins include, but are not limited to, the multi-drug resistant transporter MDR1/P-glycoprotein, which is the gene product of MDR1, belonging to the ATP-binding cassette (ABC) superfamily of membrane transporters. This protein was shown to increase the resistance of malignant cells to therapy by exporting the therapeutic agent out of the cell.

0370 Hydrolases Acting on Amino Acids:

0371 The phrase “hydrolases acting on amino acids” refers to hydrolases acting on a pair of amino acids.

0372 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the transfer of a glycosyl chemical group from one molecule to another is abnormal; thus, a beneficial effect may be achieved by modulation of such reaction. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0373 Examples of such diseases include, but are not limited to, reperfusion of clotted blood vessels by TPA (Tissue Plasminogen Activator), which converts the abundant, but inactive, zymogen plasminogen to plasmin by hydrolyzing a single ARG-VAL bond in plasminogen.

0374 Transaminases:

0375 The term “transaminases” refers to enzymes transferring an amine group from one compound to another.

0376 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the transfer of an amine group from one molecule to another is abnormal; thus, a beneficial effect may be achieved by modulation of such reaction. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0377 Examples of such transaminases include, but are not limited to, two liver enzymes frequently used as markers for liver function: SGOT (Serum Glutamic-Oxaloacetic Transaminase, AST) and SGPT (Serum Glutamic-Pyruvic Transaminase, ALT).

0378 Immunoglobulins:

0379 The term “immunoglobulins” refers to proteins that are involved in the immune and complement systems, such as antigens and autoantigens, immunoglobulins, MHC and HLA proteins and their associated proteins.

0380 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involving the immune system such as inflammation, autoimmune diseases, infectious diseases, and cancerous processes. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0381 Examples of such diseases and molecules that may be targets for diagnostics include, but are not limited to, members of the complement family such as C3 and C4, whose blood level is used for evaluation of autoimmune diseases and allergy state, and C1 inhibitor, whose absence is associated with angioedema. Thus, new variants of these genes are expected to be markers for similar events. Mutation in variants of the complement family may be associated with other immunological syndromes, such as increased bacterial infection that is associated with mutation in C3. C1 inhibitor was shown to provide safe and effective inhibition of complement activation after reperfused acute myocardial infarction and may reduce myocardial injury (Eur. Heart J. 2002, 23(21):1670-7); thus, its variant may have the same or improved effect.

0382 Transcription Factor Binding:

0383 The phrase “transcription factor binding” refers to proteins involved in the transcription process by binding to nucleic acids, such as transcription factors, RNA and DNA binding proteins, zinc fingers, helicase, histones, and nucleases.

0384 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins may be used to treat diseases involving transcription factor binding proteins. Such treatment may be based on a transcription factor that can be used for modulation of gene expression associated with the disease. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0385 Examples of such diseases include, but are not limited to, breast cancer associated with ErbB-2 expression that was shown to be successfully modulated by a transcription factor (Proc. Natl. Acad. Sci. USA 2000, 97(4):1495-500). Examples of novel transcription factors used for therapeutic protein production include, but are not limited to, those described for Erythropoietin production (J. Biol. Chem. 2000, 275(43):33850-60) and zinc fingers protein transcription factors (ZFP-TF) variants (J. Biol. Chem. 2000, 275(43):33850-60).

0386 Small GTPase Regulatory/Interacting Proteins:

0387 The phrase “Small GTPase regulatory/interacting proteins” refers to proteins capable of regulating or interacting with GTPase, such as escort protein, guanyl-nucleotide exchange factor, guanyl-nucleotide exchange factor adaptor, GDP-dissociation inhibitor, GTPase inhibitor, GTPase activator, guanyl-nucleotide releasing factor,

GDP-dissociation stimulator, regulator of G-protein signaling, RAS interactor, RHO interactor, RAB interactor, and RAL interactor.

0388 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which G-proteases mediated signal-transduction is abnormal, either as a cause, or as a result of the disease. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0389 Examples of such diseases include, but are not limited to, diseases related to prenylation. Modulation of prenylation was shown to affect therapy of diseases such as osteoporosis, ischemic heart disease, and inflammatory processes. Small GTPase regulatory/interacting proteins are a major component in the prenylation post-translational modification, and are required for the normal activity of prenylated proteins. Thus, their variants may be used for therapy of prenylation-associated diseases.

0390 Calcium Binding Proteins:

0391 The phrase “calcium binding proteins” refers to proteins involved in calcium binding, preferably calcium binding proteins, ligand binding or carriers, such as diacylglycerol kinase, calpain, calcium-dependent protein serine/threonine phosphatase, calcium sensing proteins, and calcium storage proteins.

0392 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat calcium-involved diseases. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0393 Examples of such diseases include, but are not limited to, diseases related to hypercalcemia, hypertension, cardiovascular disease, muscle diseases, gastro-intestinal diseases, uterus relaxing, and uterus. An example for therapy use of calcium binding proteins variant may be treatment of emergency cases of hypercalcemia, with secreted variants of calcium storage proteins.

0394 Oxidoreductases:

0395 The term “oxidoreductase” refers to enzymes that catalyze the removal of hydrogen atoms and electrons from the compounds on which they act. Preferably, oxidoreductases acting on the following groups of donors: CH-OH, CH-CH, CH-NH2, CH-NH; oxidoreductases acting on NADH or NADPH, nitrogenous compounds, sulfur group of donors, heme group, hydrogen group, diphenols and related substances as donors; oxidoreductases acting on peroxide as acceptor, superoxide radicals as acceptor, oxidizing metal ions, CH2 groups; oxidoreductases acting on reduced ferredoxin as donor; oxidoreductases acting on reduced flavodoxin as donor; and oxidoreductases acting on the aldehyde or oxo group of donors.

0396 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases caused by abnormal activity of oxidoreductases. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0397 Examples of such diseases include, but are not limited to, malignant and autoimmune diseases in which the enzyme DHFR (DiHydroFolateReductase), which participates in folate metabolism and is essential for de novo glycine and purine synthesis, is the target for the widely used drug Methotrexate (MTX).

0398 Receptors:

0399 The term “receptors” refers to protein-binding sites on a cell's surface or interior that recognize and bind to a specific messenger molecule, leading to a biological response, such as signal transducers, complement receptors, ligand-dependent nuclear receptors, transmembrane receptors, GPI-anchored membrane-bound receptors, various coreceptors, internalization receptors, receptors to neurotransmitters, hormones and various other effectors and ligands.

0400 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases caused by abnormal activity of receptors, preferably receptors to neurotransmitters, hormones and various other effectors and ligands. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0401 Examples of such diseases include, but are not limited to, chronic myelomonocytic leukemia caused by growth factor β receptor deficiency (Rao D. S., et al., (2001) Mol. Cell Biol., 21(22):7796-806), thrombosis associated with protease-activated receptor deficiency (Sambrano G. R., et al., (2001) Nature, 413(6851):26-7), hypercholesterolemia associated with low density lipoprotein receptor deficiency (Koivisto U. M., et al., (2001) Cell, 105(5):575-85), familial Hibernian fever associated with tumor necrosis factor receptor deficiency (Simon A., et al., (2001) Ned. Tijdschr. Geneeskd., 145(2):77-8), colitis associated with immunoglobulin E receptor expression (Dombrowicz D., et al., (2001) J. Exp. Med., 193(1):25-34), Alagille syndrome associated with Jagged1 (Stankiewicz P., et al., (2001) Am. J. Med. Genet., 103(2):166-71), breast cancer associated with mutated BRCA2 and androgen, hypertension associated with adrenergic receptors, and diabetes associated with the insulin receptor. Therapeutic applications of nuclear receptor variants may be based on secreted versions of receptors, such as the thyroid nuclear receptor, which by binding plasma free thyroid hormone to reduce its levels may have a therapeutic effect in cases of thyrotoxicosis. A secreted version of glucocorticoid nuclear receptor, by binding plasma free cortisol and thus reducing its levels, may have a therapeutic effect in cases of Cushing's disease (a disease associated with high cortisol levels in the plasma).

0402 Secreted soluble TNF receptor is an example of a molecule which can be used to treat conditions in which downregulation of TNF levels or activity is beneficial,

including, but not limited to, Rheumatoid Arthritis, Juvenile Rheumatoid Arthritis, Psoriatic Arthritis and Ankylosing Spondylitis.

0403 Protein Serine/Threonine Kinases:

0404 The phrase “protein serine/threonine kinases” refers to proteins which phosphorylate serine/threonine residues, mainly involved in signal transduction, such as transmembrane receptor protein serine/threonine kinase, 3-phosphoinositide-dependent protein kinase, DNA-dependent protein kinase, G-protein-coupled receptor phosphorylating protein kinase, SNF1A/AMP-activated protein kinase, casein kinase, calmodulin regulated protein kinase, cyclic-nucleotide dependent protein kinase, cyclin-dependent protein kinase, eukaryotic translation initiation factor 2α kinase, galactosyltransferase-associated kinase, glycogen synthase kinase 3, protein kinase C, receptor signaling protein serine/threonine kinase, ribosomal protein S6 kinase, and IκB kinase.

0405 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases ameliorated by modulating kinase activity. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0406 Examples of such diseases include, but are not limited to, schizophrenia. 5-HT(2A) serotonin receptor is the principal molecular target for LSD-like hallucinogens and atypical antipsychotic drugs. It has been shown that a major mechanism for the attenuation of this receptor signaling following agonist activation typically involves the phosphorylation of serine and/or threonine residues by various kinases. Therefore, serine/threonine kinases specific for the 5-HT(2A) serotonin receptor may serve as drug targets for a disease such as schizophrenia. Other diseases that may be treated through serine/threonine kinases modulation are Peutz-Jeghers syndrome (PJS, a rare autosomal-dominant disorder characterized by hamartomatous polyposis of the gastrointestinal tract and melanin pigmentation of the skin and mucous membranes; Hum. Mutat. 2000, 16(1):23-30), breast cancer (Oncogene 1999, 18(35):4968-73), Type-2 diabetes insulin resistance (Am. J. Cardiol. 2002, 90(5A):11G-18G), and Fanconi anemia (Blood 2001, 98(13):3650-7).

0407 Channel/Pore Class Transporters:

0408 The phrase “Channel/pore class transporters” refers to proteins that mediate the transport of molecules and macromolecules across membranes, such as α-type channels, porins, and pore-forming toxins.

0409 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the transport of molecules and macromolecules is abnormal, therefore leading to various pathologies. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0410 Examples of such diseases include, but are not limited to, diseases of the nervous system such as Parkinson's disease, diseases of the hormonal system, diabetes and infectious diseases such as bacterial and fungal infections. One specific example is the α-hemolysin, which is produced by S. aureus, creating ion-conductive pores in the cell membrane, thereby diminishing its integrity.

0411 Hydrolases, Acting on Acid Anhydrides:

0412 The phrase “hydrolases, acting on acid anhydrides” refers to hydrolytic enzymes that are acting on acid anhydrides, such as hydrolases acting on acid anhydrides in phosphorus-containing anhydrides or in sulfonyl-containing anhydrides, hydrolases catalyzing transmembrane movement of substances, and involved in cellular and subcellular movement.

0413 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins may be used to treat diseases in which the hydrolase-related activities are abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0414 Examples of such diseases include, but are not limited to, glaucoma treated with carbonic anhydrase inhibitors (e.g., Dorzolamide), and peptic ulcer disease treated with H(+)K(+)ATPase inhibitors that were shown to affect disease by blocking gastric carbonic anhydrase (e.g., Omeprazole).

0415 Transferases, Transferring Phosphorus-Containing Groups:

0416 The phrase “transferases, transferring phosphorus-containing groups” refers to enzymes that catalyze the transfer of phosphate from one molecule to another, such as phosphotransferases using the following groups as acceptors: alcohol group, carboxyl group, nitrogenous group, phosphate; phosphotransferases with regeneration of donors catalyzing intramolecular transfers; diphosphotransferases; nucleotidyltransferase; and phosphotransferases for other substituted phosphate groups.

0417 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins may be used to treat diseases in which the transfer of a phosphorus-containing functional group to a modulated moiety is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0418 Examples of such diseases include, but are not limited to, acute MI (Ann. Emerg. Med. 2003, 42(3):343-50), cancer (Oral Dis. 2003, 9(3):119-28; J. Surg. Res. 2003, 113(1):102-8) and Alzheimer's disease (Am. J. Pathol. 2003, 163(3):845-58). Example possible utilities of such transferases for drug improvement include, but are not limited to, aminoglycosides treatment (antibiotics) to which resistance is mediated by aminoglycoside phosphotransferases (Front. Biosci. 1999, 1:4:D9-21). Using aminoglycoside phosphotransferases variants or inhibiting these enzymes may reduce aminoglycosides resistance. Since aminoglycosides can be toxic to some patients, proving the expression of aminoglycoside phosphotransferases in a patient can deter from treating him with aminoglycosides and risking the patient in vain.

0419 Phosphoric Monoester Hydrolases:

0420 The phrase “phosphoric monoester hydrolases” refers to hydrolytic enzymes that are acting on ester bonds, such as nuclease, sulfuric ester hydrolase, carboxylic ester hydrolase, thiolester hydrolase, phosphoric monoester hydrolase, phosphoric diester hydrolase, triphosphoric monoester hydrolase, diphosphoric monoester hydrolase, and phosphoric triester hydrolase.

0421 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the hydrolytic cleavage of a covalent bond with accompanying addition of water (-H being added to one product of the cleavage and -OH to the other) is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0422 Examples of such diseases include, but are not limited to, diabetes and CNS diseases such as Parkinson's disease and cancer.

0423 Enzyme Inhibitors:

0424 The term “enzyme inhibitors” refers to inhibitors and suppressors of other proteins and enzymes, such as inhibitors of kinases, phosphatases, chaperones, guanylate cyclase, DNA gyrase, ribonuclease, proteasome inhibitors, diazepam-binding inhibitor, ornithine decarboxylase inhibitor, GTPase inhibitors, dUTP pyrophosphatase inhibitor, phospholipase inhibitor, proteinase inhibitor, protein biosynthesis inhibitors, and α-amylase inhibitors.

0425 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which a beneficial effect may be achieved by modulating the activity of inhibitors and suppressors of proteins and enzymes. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0426 Examples of such diseases include, but are not limited to, α-1 antitrypsin (a natural serine protease inhibitor, which protects the lung and liver from proteolysis) deficiency associated with emphysema, COPD and liver cirrhosis. α-1 antitrypsin is also used for diagnostics in cases of unexplained liver and lung disease. A variant of this enzyme may act as a protease inhibitor or a diagnostic target for related diseases.

0427 Electron Transporters:

0428 The term “electron transporters” refers to ligand binding or carrier proteins involved in electron transport, such as flavin-containing electron transporter, cytochromes, electron donors, electron acceptors, electron carriers, and cytochrome-c oxidases.

0429 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which a beneficial effect may be achieved by modulating the activity of electron transporters. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0430 Examples of such diseases include, but are not limited to, cyanide toxicity, resulting from cyanide binding to ubiquitous metalloenzymes, rendering them inactive and interfering with the electron transport. Novel electron transporters to which cyanide can bind may serve as drug targets for new cyanide antidotes.

0431 Transferases, Transferring Glycosyl Groups:

0432 The phrase “transferases, transferring glycosyl groups” refers to enzymes that catalyze the transfer of a glycosyl chemical group from one molecule to another, such as murein lytic endotransglycosylase E and sialyltransferase.

0433 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the transfer of a glycosyl chemical group is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0434 Ligases, Forming Carbon-Oxygen Bonds:

0435 The phrase “ligases, forming carbon-oxygen bonds” refers to enzymes that catalyze the linkage between carbon and oxygen, such as ligases forming aminoacyl-tRNA and related compounds.

0436 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the linkage between carbon and oxygen in an energy dependent process is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0437 Ligases:

0438 The term “ligases” refers to enzymes that catalyze the linkage of two molecules, generally utilizing ATP as the energy donor, also called synthetases. Examples for ligases are enzymes such as β-alanyl-dopamine hydrolase, carbon-oxygen bonds forming ligase, carbon-sulfur bonds forming ligase, carbon-nitrogen bonds forming ligase, carbon-carbon bonds forming ligase, and phosphoric ester bonds forming ligase.

0439 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the joining together of two molecules in an energy dependent process is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0440 Examples of such diseases include, but are not limited to, neurological disorders such as Parkinson's disease (Science 2003, 302(5646):819-22; J. Neurol. 2003, 250 Suppl. 3:III25-III29) or epilepsy (Nat. Genet. 2003, 35(2):125-7), cancerous diseases (Cancer Res. 2003, 63(17):5428-37; Lab. Invest. 2003, 83(9):1255-65), renal diseases (Am. J. Pathol. 2003, 163(4):1645-52), infectious diseases (Arch. Virol. 2003, 148(9):1851-62) and Fanconi anemia (Nat. Genet. 2003, 35(2):165-70).

0441 Hydrolases, Acting on Glycosyl Bonds:

0442 The phrase “hydrolases, acting on glycosyl bonds” refers to hydrolytic enzymes that are acting on glycosyl bonds, such as hydrolases hydrolyzing N-glycosyl compounds, S-glycosyl compounds, and O-glycosyl compounds.

0443 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the hydrolase-related activities are abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0444 Examples of such diseases include cancerous diseases (J. Natl. Cancer Inst. 2003, 95(17):1263-5; Carcinogenesis 2003, 24(7):1281-2; author reply 1283), vascular diseases (J. Thorac. Cardiovasc. Surg. 2003, 126(2):344-57), and gastrointestinal diseases such as colitis (J. Immunol. 2003, 171(3):1556-63) or liver fibrosis (World J. Gastroenterol. 2002, 8(5):901-7).

0445 Kinases:

0446 The term “kinases” refers to enzymes which phosphorylate serine/threonine or tyrosine residues, mainly involved in signal transduction. Examples for kinases include enzymes such as 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase, NAD(+) kinase, acetylglutamate kinase, adenosine kinase, adenylate kinase, adenylsulfate kinase, arginine kinase, aspartate kinase, choline kinase, creatine kinase, cytidylate kinase, deoxyadenosine kinase, deoxycytidine kinase, deoxyguanosine kinase, dephospho-CoA kinase, diacylglycerol kinase, dolichol

pyrophosphokinase, L-fuculokinase, L-ribulokinase, L-xylulokinase, isocitrate dehydrogenase (NADP) kinase, acetate kinase, allose kinase, carbamate kinase, cobinamide kinase, diphosphate-purine nucleoside kinase, fructokinase, glycerate kinase, hydroxymethylpyrimidine kinase, hygromycin-B kinase, inosine kinase, kanamycin kinase, phosphomethylpyrimidine kinase, phosphoribulokinase, polyphosphate kinase, propionate kinase, pyruvate,water dikinase, rhamnulokinase, tagatose-6-phosphate kinase, tetraacyldisaccharide 4'-kinase, thiamine-phosphate kinase, undecaprenol kinase, uridylate kinase, N-acylmannosamine kinase, D-erythro-sphingosine kinase.

0447 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which may be ameliorated by modulating kinase activity. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

0448 Examples of such diseases include, but are not limited to, acute lymphoblastic leukemia associated with spleen tyrosine kinase deficiency (Goodman P. A., et al., (2001) Oncogene, 20(30):3969-78), ataxia telangiectasia associated with ATM kinase deficiency (Boultwood J., (2001) J. Clin. Pathol., 54(7):512-6), congenital haemolytic anaemia associated with erythrocyte pyruvate kinase deficiency (Zanella A., et al., (2001) Br. J. Haematol., 113(1):43-8), mevalonic aciduria caused by mevalonate kinase deficiency (Houten S. M., et al., (2001) Eur. J. Hum. Genet., 9(4):253-9), and acute myelogenous leukemia associated with over-expressed death-associated protein kinase (Guzman M. L., et al., (2001) Blood, 97(7):2177-9).

0449 Nucleotide Binding:

0450 The term “nucleotide binding” refers to ligand binding or carrier proteins involved in physical interaction with a nucleotide, preferably any compound consisting of a nucleoside that is esterified with orthophosphate or an oligophosphate at any hydroxyl group on the glycose moiety, such as purine nucleotide binding proteins.

0451 Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases that are associated with abnormal nucleotide binding.
Anti kinase, ethanolamine kinase, galactokinase, glucokinase, bodies and polynucleotides such as PCR primers and glutamate 5-kinase, glycerol kinase, glycerone kinase, gua molecular probes designed to identify such proteins or nylate kinase, hexokinase, homoserine kinase, hydroxyeth protein encoding sequences may be used for diagnosis of ylthiazole kinase, inositol/phosphatidylinositol kinase, keto Such diseases. hexokinase, mevalonate kinase, nucleoside-diphosphate kinase, pantothenate kinase, phosphoenolpyruvate carbox 04.52 Examples of such diseases include, but are not ykinase, phosphoglycerate kinase, phosphomevalonate limited to Gout (a syndrome characterized by high urate kinase, protein kinase, pyruvate dehydrogenase (lipoamide) level in the blood). Since urate is a breakdown metabolite of kinase, pyruvate kinase, ribokinase, ribose-phosphate pyro purines, reducing purines serum levels could have a thera phosphokinase, selenide, water dikinase, shikimate kinase, peutic effect in Gout disease. thiamine pyrophosphokinase, thymidine kinase, thymidylate 0453 Binding: kinase, uridine kinase, Xylulokinase, 1D-myo-inositol-tris phosphate 3-kinase, phosphofructokinase, pyridoxal kinase, 0454. The term “tubulin binding refers to binding pro sphinganine kinase, riboflavin kinase, 2-dehydro-3-deox teins that bind tubulin such as microtubule binding proteins. ygalactonokinase, 2-dehydro-3-deoxygluconokinase, 0455 Pharmaceutical compositions including such -pro 4-diphosphocytidyl-2C-methyl-D-erythritol kinase, GTP teins or protein encoding sequences, antibodies directed US 2006/0068405 A1 Mar. 30, 2006 34 against Such proteins or polynucleotides capable of altering hormone-releasing hormone receptor deficiency aheshwari expression of Such proteins, may be used to treat diseases H. G., et al., (1998) J. Clin. Endocrinol. Metab., which are associated with abnormal tubulin activity or 83(11):4065-74). structure. 
Binding the products of the genes of this family, or antibodies reactive therewith, can modulate a plurality of 0461) Molecular function Unknown: tubulin activities as well as change microtubulin structure. 0462. The phrase “molecular function unknown refers Antibodies and polynucleotides such as PCR primers and to various proteins with unknown molecular function, Such molecular probes designed to identify such proteins or as cell Surface antigens. protein encoding sequences. may be used for diagnosis of 0463 Pharmaceutical compositions including such pro Such diseases. teins or protein encoding sequences, antibodies directed 0456. Examples of such diseases include, but are not against Such proteins or polynucleotides capable of altering limited to, Alzheimer's disease associated with t-complex expression of Such proteins, may be used to treat diseases in polypeptide 1 deficiency Schuller E., et al., (2001) Life Sci., which regulation of the recognition, or participation or bind 69(3):263-70, neurodegeneration associated with apoE of cell Surface antigens to other moieties may have thera deficiency Masliah E., et al., (1995) Exp. Neurol. peutic effect. Antibodies and polynucleotides such as PCR 136(2):107-22), progressive axonopathy associated with primers and molecular probes designed to identify Such disfuctional neurofilaments Griffiths I. R. et al., (1989) proteins or protein encoding sequences may be used for Neuropathol. Appl. Neurobiol., 15(1):63-74), familial fron diagnosis of Such diseases. totemporal dementia associated with tau deficiency astor P. et al., (2001) Ann. Neurol. 49(2):263-7), and colon cancer 0464) Examples of such diseases include, but are not suppressed by APC White R. L., (1997) Pathol. Biol. limited to, autoimmune diseases, various infectious diseases, (Paris), 45(3):240-4). En example for a drug whose target is cancer diseases which involve non cell Surface antigens tubulin is the anticancer drug Taxol. 
Drugs having similar recognition and activity. mechanism of action (interfering with tubulin polymeriza 0465 Enzyme Activators: tion) may be developed based on tubulin binding proteins. 0466. The term “enzyme activators' refers to enzyme 0457 Receptor Signaling Proteins: regulators such as activators of kinases, phosphatases, sph 0458. The phrase “receptor signaling proteins’ refers to ingolipids, chaperones, guanylate cyclase, tryptophan receptor proteins involved in signal transduction such as hydroxylase, proteases, phospholipases, caspases, propro receptor signaling protein serine/threonine kinase, receptor tein convertase 2 activator, cyclin-dependent protein kinase signaling protein tyrosine kinase, receptor signaling protein 5 activator, superoxide-generating NADPH oxidase activa tyrosine phosphatase, aryl hydrocarbon receptor nuclear tor, sphingomyelin phosphodiesterase activator, monophe translocator, hematopoeitin/interferon-class (D200-domain) nol monooxygenase activator, proteasome activator, and cytokine receptor signal transducer, transmembrane receptor GTPase activator. protein tyrosine kinase signaling protein, transmembrane 0467 Pharmaceutical compositions including such pro receptor protein serine/threonine kinase signaling protein, teins or protein encoding sequences, antibodies directed receptor signaling protein serine/threonine kinase signaling against Such proteins or polynucleotides capable of altering protein, receptor signaling protein serine/threonine phos expression of Such proteins, may be used to treat diseases in phatase signaling protein, Small GTPase regulatory/interact which beneficial effect may be achieved by modulating the ing protein, receptor signaling protein tyrosine kinase sig activity of activators of proteins and enzymes. Antibodies naling protein, and receptor signaling protein serine/ and polynucleotides such as PCR primers and molecular threonine phosphatase. 
probes designed to identify such proteins or protein encod 0459 Pharmaceutical compositions including such pro ing sequences may be used for diagnosis of Such diseases. teins or protein encoding sequences, antibodies directed 0468 Examples of such diseases include, but are not against Such proteins or polynucleotides capable of altering limited to all complement related diseases, as most comple expression of Such proteins, may be used to treat diseases in ment proteins activate by cleavage other complement pro which the signal-transduction is abnormal, either as a cause, teins. or as a result of the disease. Antibodies and polynucleotides such as PCR primers and molecular probes designed to 0469 Transferases, Transferring One-Carbon Groups: identify Such proteins or protein encoding sequences may be 0470 The phrase “transferases, transferring one-carbon used for diagnosis of Such diseases. groups' refers enzymes that catalyze the transfer of a one-carbon chemical group from one molecule to another 0460) Examples of such diseases include, but are not Such as methyltransferase, amidinotransferase, hydroxym limited to, complete hypogonadotropic hypogonadism asso ethyl-, formyl- and related transferase, carboxyl- and car ciated with GnRH receptor deficiency Kottler M. L., et a. bamoyltransferase. (2000) J. Clin. Endocrinol. Metab., 85(9):3002-8), severe combined immunodeficiency disease associated with IL-7 0471 Pharmaceutical compositions including such pro receptor deficiency Puel A., and Leonard W. J., (2000) Curr. teins or protein encoding sequences, antibodies directed Opin. Immunol. 12(4):468-7), schizophrenia associated against Such proteins or polynucleotides capable of altering N-methyl-D-aspartate receptor deficiency Mohn A. 
R., et expression of Such proteins, may be used to treat diseases in al., (1999) Cell, 98(4):427-36), Yesinia-associated arthritis which the transfer of a one-carbon chemical group from one associated with tumor necrosis factor receptor p55 defi molecule to another is abnormal so that a beneficial effect ciency Zhao Y. X., et al., (1999) Arthritis Rheum. may be achieved by modulation of such reaction. Antibodies 42(8): 1662-72), and Dwarfism of Sindh caused by growth and polynucleotides such as PCR primers and molecular US 2006/0068405 A1 Mar. 30, 2006 probes designed to identify such proteins or protein encod 250 Suppl. 3:III25-III29ataxia J. Hum. Genet. ing sequences may be used for diagnosis of Such diseases. 2003:48(8):415-9 or Alzheimer diseases J. Mol. Neurosci. 2003, 20(3):283-6: J. Alzheimers Dis. 2003, 5(3):171-7), 0472 Transferases: cancerous diseases Semin. Oncol. 2003, 30(5):709-16). 0473. The term “transferases’ refers to enzymes that prostate cancer Semin. Oncol. 2003, 30(5):709-16 meta catalyze the transfer of a chemical group, preferably, a bolic diseases J Neurochem. 2003, 87(1):248-56), infec phosphate or amine from one molecule to another. It tious diseases, such as prion infection EMBO J. 2003, includes enzymes such as transferases, transferring one 22(20):5435-5445). Chaperones may be also used for carbon groups, aldehyde or ketonic groups, acyl groups, manipulating therapeutic proteins binding to their receptors glycosyl groups, alkyl or aryl (other than methyl) groups, therefore, improving their therapeutic effect. nitrogenous, phosphorus-containing groups, Sulfur-contain ing groups, lipoyltransferase, deoxycytidyl transferases. 0480 Cell Adhesion Molecule: 0474 Pharmaceutical compositions including such pro 0481. 
The phrase “cell adhesion molecule' refers to teins or protein encoding sequences, antibodies directed proteins that serve as adhesion molecules between adjoining against Such proteins or polynucleotides capable of altering cells such as membrane-associated protein with gnanylate expression of Such proteins, may be used to treat diseases in kinase activity, cell adhesion receptor, neuroligin, calcium which the transfer of a chemical group from one molecule to dependent cell adhesion molecule, selectin, calcium-inde another is abnormal. Antibodies and polynucleotides such as pendent cell adhesion molecule, and extracellular matrix PCR primers and molecular probes designed to identify such protein. proteins or protein encoding sequences may be used for 0482 Pharmaceutical compositions including such pro diagnosis of Such diseases. teins or protein encoding sequences, antibodies directed 0475 Examples of such diseases include, but are not against Such proteins or polynucleotides capable of altering limited to cancerous diseases such as prostate cancer Urol expression of Such proteins, may be used to treat diseases in ogy. 2003, 62(5 Suppl 1):55-62 or lung cancer Invest. New which adhesion between adjoining cells is involved, typi Drugs. 2003, 21 (4):435-43: JAMA. 2003, 22:290(16):2149 cally conditions in which the adhesion is abnormal. Anti 58), psychiatric disorders Am. J. Med. Genet. 2003, bodies and polynucleotides such as PCR primers and 15:123B(1):64-9), colorectal disease such as Crohnis dis molecular probes designed to identify such proteins or ease Dis. Colon Rectum. 2003, 46(11): 1498–507 or celiac protein encoding sequences may be used for diagnosis of diseases N-Engl. J. Med. 2003, 349(17): 1673-4; author such diseases. reply 1673-4), neurological diseases such as Prkinson's 0483 Examples of such diseases include, but are not disease J. Chem Neuroanat. 2003, 26(2): 143-51), Alzhe limited to cancer in which abnormal adhesion may cause and imer disease Hum. 
Mol. Genet. 2003 21 or Charcot-Marie enhance the process of metastasis and abnormal growth and Tooth Disease Mol. Biol. Evol. 2003 31). development of various tissues in which modulation adhe 0476 Chaperones: sion among adjoining cells can improve the condition. Leucocyte-endothlial interactions characterized by adhesion 0477 The term “chaperones' refers to functional classes molecules involved in interactions between cells lead to a of unrelated families of proteins that assist the correct tissue injury and ischemia reperfiision disorders in which non-covalent assembly of other polypeptide-containing activated signals generated during ischemia may trigger an structures in Vivo, but are not components of these exuberant inflammatory response during reperfusion, pro assembled structures when they a performing their normal Voking greater tissue damage than initial ischemic insult biological function. The group of chaperones include pro Crit. Care Med. 2002, 30(5 Suppl):S214-9). The blockade teins such as ribosomal chaperone, peptidylprolyl of leucocyte-endothelial adhesive interactions has the poten isomerase, -lectin-binding chaperone, nucleosome assembly tial to reduce vascular and tissue injury. This blockade may chaperone, ATPase, cochaperone, heat shock be achieved using a soluble variant of the adhesion mol protein, HSP70/HSP90 organizing protein, fimbrial chaper ecule. one, metallochaperone, tubulin folding, and HSC70-inter acting protein. 0484 States of septic shock and ARDS involve large recruitment of neutrophil cells to the damaged tissues. 0478 Pharmaceutical compositions including such pro Neutrophil cells bind to the endothelial cells in the target teins or protein encoding sequences, antibodies directed tissues-through adhesion molecules. 
Neutrophils possess against Such proteins or polynucleotides capable of altering multiple effector mechanisms that can produce endothelial expression of Such proteins, may be used to treat diseases and lung tissue injury, and interfere with pulmonary gas which are associated with abnormal protein activity, struc transfer by disruption of surfactant activity Eur. J. Surg. ture, degradation or accumulation of proteins. Antibodies 2002, 168(4):204-14). In such cases, the use of soluble and polynucleotides such as PCR primers and molecular variant of the adhesion molecule may decrease the adhesion probes designed to identify such proteins or protein encod of neutrophils to the damaged tissues. ing sequences may be used for diagnosis of Such diseases. 0485 Examples of such diseases include, but are not 0479. Examples of such diseases include, but are not limited to, Wiskott-Aldrich syndrome associated with WAS limited to neurological syndromes J. Neuropathol. Exp. deficiency Westerberg L., et al., (2001) Blood, 98(4): 1086 Neurol. 2003, 62(7):751-64; Antioxid Redox Signal. 2003, 94), asthma associated with intercellular adhesion mol 5(3):337-48; J. Neurochem. 2003, 86(2):394-404), neuro ecule-1 deficiency Tang M. L. and Fiscus L. C., (2001) logical diseases such as Parkinson's disease Hum. Genet. Pulm. Pharmacol. Ther., 14(3):203-10), intra-atrial throm 2003, 6: Neurol Sci. 2003, 24(3):159-60; J. Neurol. 2003, bogenesis associated with increased von Willebrand factor US 2006/0068405 A1 Mar. 30, 2006 36 activity Fukuchi M., et al., (2001) J. Am. Coll. Cardiol. lymphoproliferative disease inhibited by combined GM 37(5):1436-42, junctional epidermolysis bullosa associated CSF and IL-2 therapy Baiocchi R. A., et al., (2001) J. Clin. with laminin 5-3-3 deficiency Robbins P. B., et al., (2001) Invest. 108(6):887–94), multiple sclerosis in which recom Proc. Natl. Acad. 
Sci., 98(9):5193-8), and hydrocephalus binant proteins from the interferons family are the treatment caused by neural adhesion molecule L1 deficiency-Rolf B., of choice and sepsis in which activated protein C is a et al., (2001) Brain Res., 891 (1-2):247-52). therapeutic protein itself. 0486 Motor Proteins: 0494 Intracellular Transporters: 0487. The term “motor proteins’ refers to proteins that 0495. The term “intracellular transporters' refers to pro generate force or energy by the hydrolysis of ATP and that teins that mediate the transport of molecules and macromol function in the production of intracellular movement or ecules inside the cell. Such as intracellular nucleoside trans transportation. Examples, of Such proteins include porter, vacuolar assembly proteins, Vesicle transporters, microfilament motor, axonemal motor, microtubule motor, vesicle fusion proteins, type II protein secretors. and kinetochore motor (, , or ). 0496 Pharmaceutical compositions including such pro 0488 Pharmaceutical compositions including such pro teins or protein encoding sequences, antibodies directed teins or protein encoding sequences, antibodies directed against Such proteins or polynucleotides capable of altering against Such proteins or polynucleotides capable of altering expression of Such proteins, may be used to treat diseases in expression of Such proteins, may be used to treat diseases in which the transport of molecules and macromolecules is which force or energy generation is impaired. Antibodies abnormal leading to various pathologies. Antibodies and and polynucleotides such as PCR primers and molecular polynucleotides such as PCR primers and molecular probes probes designed to identify such proteins or protein encod designed to identify such proteins or protein encoding ing sequences may be used for diagnosis of Such diseases. sequences may be used for diagnosis of Such diseases. 
0489 Examples of such diseases include, but are not 0497 Transporters: limited to, malignant diseases where microtubules are drug 0498. The term “transporters' refers to proteins that targets for a family of anticancer drugs such as myodystro mediate the transport of molecules and macromolecules, phies and myopathies Trends Cell Biol. 2002, 12(12):585 Such as channels, exchangers, and pumps. Transporters 91), neurological disorders Neuron. 2003, 25:40(1):25-40; include proteins such as: amine/polyamine transporter, lipid Trends Biochem. Sci. 2003, 28(10):558-65; Med. Genet. transporter, neurotransmitter transporter, organic acid trans 2003, 40(9):671-5), and hearing impairment Trends Bio porter, oxygen transporter, water transporter, carriers, intra chem. Sci. 2003, 28(10):558-65). cellular transports, protein transporters, ion transporters, 0490 Defense/Immunity Proteins: carbohydrate transporter, polyol transporter, amino acid transporters, vitamin/ transporters, siderophore 0491. The term “defense/immunity proteins’ refers to transporter, drug transporter, channel/pore class transporter, proteins that are involved in the immune and complement group translocator, auxiliary transport proteins, permeases, systems such as acute-phase response proteins, antimicro murein transporter, organic alcohol transporter, nucleobase, bial peptides, antiviral response proteins, blood coagulation nucleoside, and nucleotide and nucleic acid transporters. factors, complement components, immunoglobulins, major histocompatibility complex antigens and opsonins. 
0499 Pharmaceutical compositions including such pro teins or protein encoding sequences, antibodies directed 0492 Pharmaceutical compositions including such pro against Such proteins or polynucleotides capable of altering teins or protein encoding sequences, antibodies directed expression of Such proteins, may be used to treat diseases in against Such proteins or polynucleotides capable of altering which the transport of molecules and macromolecules Such expression of Such proteins, may be used to treat diseases as neurotransmitters, hormones, Sugar etc. is impaired lead involving the immunological system including inflamma ing to various pathologies. Antibodies and polynucleotides tion, autoimmune diseases, infectious diseases, as well as such as PCR primers and molecular probes designed to cancerous processes or diseases which are manifested by identify Such proteins or protein encoding sequences may be abnormal coagulation processes, which may include abnor used for diagnosis of Such diseases. mal bleeding or excessive coagulation. Antibodies and poly nucleotides such as PCR primers and molecular probes 0500 Examples of such diseases include, but are not designed to identify Such proteins or protein encoding limited to, glycogen storage disease caused by glucose-6- sequences may be used for diagnosis of Such diseases. phosphate transporter deficiency Hiraiwa H., and Chou J. Y. (2001) DNA Cell Biol., 2008):447-53), tangier disease 0493 Examples of such diseases include, but are not associated with ATP-binding cassette transporter-1 defi limited to, late (C5-9) complement component deficiency ciency McNeish J., et al., (2000) Proc. Natl. Acad. Sci., associated with opsonin receptor allotypes Fijeen C. A., et 97(8):4245-50), systemic primary carmitine deficiency asso al., (2000) Clin. Exp. Immunol. 120(2):338-45), combined ciated with organic cation transporter deficiency Tang N. immunodeficiency associated with defective expression of L., et al., (1999) Hum. Mol. 
Genet., 8(4):655-60), Wilson MHC class II genes Griscelli C., et al., (1989) Immuno disease associated with copper-transporting defi defic. Rev. 1(2):135-53), loss of antiviral activity of CD4 T ciency Payne A. S., et al., (1998) Proc. Natl. Acad. Sci. cells caused by neutralization of endogenous TNFC Pavic I. 95(18): 10854-9), and atelosteogenesis associated with et al., (1993) J. Gen. Virol. 74 (Pt 10):2215-23), autoim diastrophic dysplasia Sulphate transporter deficiency New mune diseases associated with natural resistance-associated bury-Ecob R., (1998) J. Med. Genet., 35(1):49-53), Central macrophage protein deficiency Evans C. A., et al., (2001) Nervous system diseases treated by inhibiting neurotrans Neurogenetics, 3(2): 69-78), Epstein-Barr virus-associated mitter transporter (e.g. Depression, treated with serotonin US 2006/0068405 A1 Mar. 30, 2006 37 transporters inhibitors—Prozac), and Cystic fibrosis medi against Such proteins or polynucleotides capable of altering ated by the chloride channel CFTR. Other transporter related expression of Such proteins, may be used to treat diseases in diseases are cancer Oncogene. 2003, 22(38):6005-12 and which actin binding is impaired. Antibodies and polynucle especially cancer resistant to treatment Oncologist. 2003, otides such as PCR primers and molecular probes designed 8(5):411-24; J. Med. Invest. 2003, 50(3–4): 126-35), infec to identify Such proteins or protein encoding sequences may tious diseases, especially fingal infections Annu. Rev. Phy be used for diagnosis of Such diseases. topathol. 2003, 41:641-67), neurological diseases, such as 05.09 Examples of such diseases include, but are not Parkinson FASEB J. 2003, September 4 Epub ahead of limited to, neuromuscular diseases such as muscular dys print, diabetes where ATP-sensitive potassium channel in trophy Neurology. 2003, 61 (3):404-6), Cancerous diseases beta cells is the target for insulin secretagogues, hyperten Urology. 2003, 61(4):845-50; J. Cutan. 
Pathol. 2002, sion where calcium channels are the target for calcium 29(7):430; Cancer. 2002, 94(6):1777-86: Clin. Cancer Res. blockers, and cardiovascular diseases, including hypercho 2001, 7(8):2415-24; Breast Cancer Res. Treat. 2001, lesterolemia Am. J. Cardiol. 2003, 92(4B):10K-16K). 65(1):11-21), renal diseases such as glomerulonephritis J. 0501) There are about 30 membrane transporter genes Am. Soc. Nephrol. 2002, 13(2):322-31; Eur. J. Immunol. linked to a known genetic clinical syndrome. Secreted 2001, 31(4): 1221-7), and gastrointestinal diseases such as versions of splice variants of transporters may be therapeutic Crohn's disease J. Cell Physiol. 2000, 182(2):303-9). as the case with soluble receptors. These transporters may 0510) Protein Binding Proteins: have the capability to bind the compound in the serum they would normally bind on the membrane. For example, a 0511. The phrase “protein, binding proteins’ refers to secreted form ATP7B, a transporter involved in Wilson's proteins involved in diverse biological functions through disease, is expected to bind plasma Copper, therefore have binding other proteins. Examples of such biological function include intermediate filament binding, LIN4-domain bind a desired therapeutic effect in Wilson's disease. ing, LLR-domain binding, clathrin binding, ARF binding, 0502 : Vinculin binding, KU70 binding, troponin C binding PDZ domain binding, SH3-domain binding, fibroblast growth 0503. The term “lyases” refers to enzymes that catalyze factor binding, membrane-associated protein with guanylate the formation of double bonds by removing chemical groups kinase activity interacting, Wnt-protein binding, DEAD/H- from a substrate without hydrolysis or catalyze the addition box RNA helicase binding, B-amyloid binding, myosin of chemical groups to double bonds. 
It includes enzymes binding, TATA-binding protein binding DNA topoisomerase such as carbon-carbon , carbon-oxygen lyase, carbon I binding, polypeptide hormone binding, RHO binding, nitrogen lyase, carbon-sulfur lyase, carbon-halide lyase, and FH1-domain binding, syntaxin-1 binding, HSC70-interact phosphorus-oxygen lyase. ing, transcription factor binding, metarhodopsin binding, 0504 Pharmaceutical compositions including such pro tubulin binding, JUN kinase binding, protein binding, teins or protein encoding sequences, antibodies directed protein signal sequence binding, importin C. export receptor, against Such proteins or polynucleotides capable of altering poly-glutamine tract binding, protein carrier, B-catenin bind expression of Such proteins, may be used to treat diseases in ing, protein C-terminus binding, lipoprotein binding, cytosk which the double bonds formation catalyzed by these eletal protein binding protein, nuclear localization sequence enzymes is impaired. Antibodies and polynucleotides Such binding, protein phosphatase 1 binding, adenylate cyclase as PCR primers and molecular probes designed to identify binding, eukaryotic initiation factor 4E binding, calmodulin Such proteins or protein encoding sequences may be used for binding, collagen binding, insulin-like growth factor bind diagnosis of Such diseases. ing, lamin binding, profilin binding, tropomyosin binding, actin binding, peroxisome targeting sequence binding, 0505 Examples of such diseases include, but are not SNARE binding, and cyclin binding. limited to, autoimmune diseases JAMA. 2003, 290(13): 1721-8: JAMA. 2003, 290(13): 1713-20), diabetes 0512 Pharmaceutical compositions including such pro Diabetes. 2003, 52(9):2274-8), neurological disorders such teins or protein encoding sequences, antibodies directed as epilepsy J. Neurosci. 2003, 23(24):8471-9). Parkinson J. against Such proteins or polynucleotides capable of altering Neurosci. 2003, 23(23):8302-9: Lancet. 
2003, expression of Such proteins, may be used to treat diseases 362(9385):712 or Creutzfeldt-Jakob disease Clin. Neuro which are associated with impaired protein binding. Anti physiol. 2003, 114(9): 1724-8), and cancerous diseases J. bodies and polynucleotides such as PCR primers and Pathol. 2003, 201(1):37-45: J. Pathol. 2003, 201(1):37-45: molecular probes designed to identify such proteins or Cancer Res. 2003, 63(16):4952-9: Eur. J. Cancer. 2003, protein encoding sequences may be used for diagnosis of 39(13): 1899-903). Such diseases. 0513. Examples of such diseases include, but are not 0506 Actin Binding Proteins: limited to, neurological and psychiatric diseases J. Neuro 0507 The phrase “actin binding proteins’ refers to pro sci. 2003, 23(25):8788-99; Neurobiol. Dis. 2003, 14(1): 146 teins binding actin as actin cross-linking, actin bundling, 56; J. Neurosci. 2003, 23(17):6956-64; Am. J. Pathol. 2003, F-actin capping, actin monomer binding, actin lateral bind 163(2):609-19), and cancerous diseases Cancer Res. 2003, ing, actin depolymerizing, actin monomer sequestering, 63(15):4299-304; Semin. Thromb. Hemosf. 2003, actin filament severing, actin modulating, membrane asso 29(3):247-58; Proc. Natl. Acad. Sci. U S A. 2003, ciated actin binding, actin thin filament length regulation, 100(16):9506-11). and actin polymerizing proteins. 0514 Ligand Binding or Carrier Proteins: 0508 Pharmaceutical compositions including such pro 0515. The phrase “ligand binding or carrier proteins' teins or protein encoding sequences, antibodies directed refers to proteins involved in diverse biological functions US 2006/0068405 A1 Mar. 30, 2006

such as pyridoxal phosphate binding, carbohydrate binding, magnesium binding, amino acid binding, cyclosporin A binding, nickel binding, chlorophyll binding, biotin binding, penicillin binding, selenium binding, tocopherol binding, lipid binding, drug binding, oxygen transporter, electron transporter, steroid binding, juvenile hormone binding, retinoid binding, heavy metal binding, calcium binding, protein binding, glycosaminoglycan binding, folate binding, odorant binding, lipopolysaccharide binding and nucleotide binding.

[0516] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which are associated with impaired function of these proteins. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0517] Examples of such diseases include, but are not limited to, neurological disorders (J. Med. Genet. 2003, 40(10):733-40; J. Neuropathol. Exp. Neurol. 2003, 62(9):968-75; J. Neurochem. 2003, 87(2):427-36), autoimmune diseases (N. Engl. J. Med. 2003, 349(16):1526-33; JAMA. 2003, 290(13):1721-8), gastroesophageal reflux disease (Dig. Dis. Sci. 2003, 48(9):1832-8), cardiovascular diseases (J. Vasc. Surg. 2003, 38(4):827-32), cancerous diseases (Oncogene. 2003, 22(43):6699-703; Br. J. Haematol. 2003, 123(2):288-96), respiratory diseases (Circulation. 2003, 108(15):1839-44), and ophthalmic diseases (Ophthalmology. 2003, 110(10):2040-4; Am. J. Ophthalmol. 2003, 136(4):729-32).

[0518] ATPases:

[0519] The term “ATPases” refers to enzymes that catalyze the hydrolysis of ATP to ADP, releasing energy that is used in the cell. This group includes enzymes such as plasma membrane cation-transporting ATPase, ATP-binding cassette (ABC) transporter, magnesium-ATPase, hydrogen-/sodium-translocating ATPase or ATPase translocating any other elements, arsenite-transporting ATPase, protein-transporting ATPase, DNA , P-type ATPase, and hydrolase, acting on acid anhydrides involved in cellular and subcellular movement.

[0520] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which are associated with impaired conversion of the hydrolysis of ATP to ADP or resulting energy use. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0521] Examples of such diseases include, but are not limited to, infectious diseases such as Helicobacter pylori ulcers (BMC Gastroenterology 2003, 3:31 (published 6 Nov. 2003)), neurological, muscular and psychiatric diseases (Int. J. Neurosci. 2003, 13(12):1705-1717; Int. J. Neurosci. 2003, 113(11):1579-1591; Ann. Neurol. 2003, 54(4):494-500; Amyotrophic Lateral Sclerosis. Other Motor Neuron Disord. 2003 4(2):96-9), cardiovascular diseases (J. Nippon. Med. Sch. 2003, 70(5):384-92; Endocrinology. 2003, 144(10):4478-83), metabolic diseases (Mol. Pathol. 2003, 56(5):302-4; Neurosci. Lett. 2003, 350(2):105-8), and peptic ulcer disease treated with inhibitors of the gastric H+-K+ ATPase (e.g. Omeprazole) responsible for acid secretion in the gastric mucosa.

[0522] Carboxylic Ester Hydrolases:

[0523] The phrase “carboxylic ester hydrolases” refers to hydrolytic enzymes acting on carboxylic ester bonds such as N-acetylglucosaminylphosphatidylinositol deacetylase, 2-acetyl-1-alkylglycerophosphocholine esterase, aminoacyl-tRNA hydrolase, arylesterase, carboxylesterase, cholinesterase, gluconolactonase, sterol esterase, acetylesterase, carboxymethylenebutenolidase, protein-glutamate methylesterase, lipase, and 6-phosphogluconolactonase.

[0524] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the hydrolytic cleavage of a covalent bond with accompanying addition of water (-H being added to one product of the cleavage and -OH to the other) is abnormal so that a beneficial effect may be achieved by modulation of such reaction. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0525] Examples of such diseases include, but are not limited to, autoimmune neuromuscular disease Myasthenia Gravis, treated with cholinesterase inhibitors.

[0526] Hydrolase, Acting on Ester Bonds:

[0527] The phrase “hydrolase, acting on ester bonds” refers to hydrolytic enzymes acting on ester bonds such as nucleases, sulfuric ester hydrolase, carboxylic ester hydrolases, thiolester hydrolase, phosphoric monoester hydrolase, phosphoric diester hydrolase, triphosphoric monoester hydrolase, diphosphoric monoester hydrolase, and phosphoric triester hydrolase.

[0528] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the hydrolytic cleavage of a covalent bond with accompanying addition of water (-H being added to one product of the cleavage and -OH to the other), is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0529] Hydrolases:

[0530] The term “hydrolases” refers to hydrolytic enzymes such as GPI-anchor transamidase, peptidases, hydrolases, acting on ester bonds, glycosyl bonds, ether bonds, carbon-nitrogen (but not peptide) bonds, acid anhydrides, acid carbon-carbon bonds, acid halide bonds, acid phosphorus-nitrogen bonds, acid sulfur-nitrogen bonds, acid carbon-phosphorus bonds, acid sulfur-sulfur bonds.

[0531] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the hydrolytic cleavage of a covalent bond with accompanying addition of water (-H being added to one product of the cleavage and -OH to the other) is abnormal. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0532] Examples of such diseases include, but are not limited to, cancerous diseases (Cancer. 2003, 98(9):1842-8; Cancer. 2003, 98(9):1822-9), neurological diseases such as Parkinson diseases (J. Neurol. 2003, 250 Suppl 3:III5-III24; J. Neurol. 2003, 250 Suppl 3:III2-III10), endocrinological diseases such as pancreatitis (Pancreas. 2003, 27(4):291-6) or childhood genetic diseases (Eur. J. Pediatr. 1997, 156(12):935-8), coagulation diseases (BMJ. 2003, 327(7421):974-7), cardiovascular diseases (Ann. Intern. Med. October 2003, 139(8):670-82), autoimmunity diseases (J. Med. Genet. 2003, 40(10):761-6), and metabolic diseases (Am. J. Hum. Genet. 2001, 69(5):1002-12).

[0533] Enzymes:

[0534] The term “enzymes” refers to a naturally occurring or synthetic macromolecular substance composed mostly of protein that catalyzes, to various degrees of specificity, at least one (bio)chemical reaction at relatively low temperatures. The action of RNA that has catalytic activity (ribozyme) is often also regarded as enzymatic. Nevertheless, enzymes are mainly proteinaceous and are often easily inactivated by heating or by protein-denaturing agents. The substances upon which they act are known as substrates, for which the enzyme possesses a specific binding or active site.

[0535] The group of enzymes includes various proteins possessing enzymatic activities such as mannosylphosphate transferase, para-hydroxybenzoate:polyprenyltransferase, Rieske iron-sulfur protein, imidazoleglycerol-phosphate synthase, sphingosine hydroxylase, tRNA 2'-phosphotransferase, sterol C-24(28) reductase, C-8 sterol isomerase, C-22 sterol desaturase, C-14 sterol reductase, C-3 sterol dehydrogenase (C-4 sterol decarboxylase), 3-keto sterol reductase, C-4 methyl sterol oxidase, dihydronicotinamide riboside quinone reductase, glutamate phosphate reductase, DNA repair enzyme, telomerase, α-ketoacid dehydrogenase, β-alanyl-dopamine synthase, RNA editase, aldo-keto reductase, alkylbase DNA glycosidase, glycogen debranching enzyme, dihydropterin deaminase, dihydropterin oxidase, dimethylnitrosamine demethylase, ecdysteroid UDP-glucosyl/UDP-glucuronosyl transferase, glycine cleavage system, helicase, histone deacetylase, mevaldate reductase, monooxygenase, poly(ADP-ribose) glycohydrolase, pyruvate dehydrogenase, serine esterase, sterol carrier protein X-related thiolase, transposase, tyramine-β-hydroxylase, para-aminobenzoic acid (PABA) synthase, glu-tRNA(gln) amidotransferase, molybdopterin cofactor sulfurase, lanosterol 14-α-demethylase, aromatase, 4-hydroxybenzoate octaprenyltransferase, 7,8-dihydro-8-oxoguanine-triphosphatase, CDP-alcohol phosphotransferase, 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidonone 5'-phosphate deaminase, diphosphoinositol polyphosphate phosphohydrolase, γ-glutamyl carboxylase, small protein conjugating enzyme, small protein activating enzyme, 1-deoxyxylulose-5-phosphate synthase, 2'-phosphotransferase, 2-octoprenyl-3-methyl-6-methoxy-1,4-benzoquinone hydroxylase, 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, 3,4-dihydroxy-2-butanone-4-phosphate synthase, 4-amino-4-deoxychorismate lyase, 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, ADP-L-glycero-D-manno-heptose synthase, D-erythro-7,8-dihydroneopterin triphosphate 2-epimerase, N-ethylmaleimide reductase, O-antigen ligase, O-antigen polymerase, UDP-2,3-diacylglucosamine hydrolase, arsenate reductase, carnitine racemase, cobalamin 5'-phosphate synthase, cobinamide phosphate guanylyltransferase, enterobactin synthetase, enterochelin esterase, enterochelin synthetase, glycolate oxidase, integrase, lauroyl transferase, peptidoglycan synthetase, phosphopantetheinyltransferase, phosphoglucosamine mutase, phosphoheptose isomerase, quinolinate synthase, siroheme synthase, N-acylmannosamine-6-phosphate 2-epimerase, N-acetyl-anhydromuramoyl-L-alanine amidase, carbon-phosphorous lyase, heme-copper terminal oxidase, disulfide oxidoreductase, phthalate dioxygenase reductase, sphingosine-1-phosphate lyase, molybdopterin oxidoreductase, dehydrogenase, NADPH oxidase, naringenin-chalcone synthase, N-ethylammeline chlorohydrolase, polyketide synthase, aldolase, kinase, phosphatase, CoA-ligase, oxidoreductase, transferase, hydrolase, lyase, isomerase, ligase, ATPase, sulfhydryl oxidase, lipoate-protein ligase, δ-1-pyrroline-5-carboxylate synthetase, lipoic acid synthase, and tRNA dihydrouridine synthase.

[0536] Pharmaceutical compositions including such proteins, or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which can be ameliorated by modulating the activity of various enzymes which are involved both in enzymatic processes inside cells as well as in cell signaling. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0537] Examples of such diseases include, but are not limited to, diabetes where alpha-glucosidase is the target for drugs which delay glucose absorption, osteoporosis where farnesyl diphosphate synthase is the target for bisphosphonates, thyroid autoimmune disease associated with thyroid peroxidase, mucopolysaccharidoses associated with defects in lysosomal enzymes, Tay-Sachs Disease associated with defects in β-hexosaminidase and hypertension where Angiotensin Converting Enzyme is the target for the common hypertension drugs, ACE inhibitors.

[0538] Cytoskeletal Proteins:

[0539] The term “cytoskeletal proteins” refers to proteins involved in the structure formation of the cytoskeleton.

[0540] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which are caused by or due to abnormalities in cytoskeleton, including cancerous cells, and diseased cells such as cells that do not propagate, grow or function normally. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0541] Examples of such diseases include, but are not limited to, liver diseases such as cholestatic diseases (Lancet. 2003, 362(9390):1112-9), vascular diseases (J. Cell Biol. 2003, 162(6):1111-22), endocrinological diseases (Cancer Res. 2003, 63(16):4836-41), neuromuscular disorders such as muscular dystrophy (Neuromuscul. Disord. 2003, 13(7-8):579-88), or myopathy (Neuromuscul. Disord. 2003, 13(6):456-67), neurological disorders such as Alzheimer's disease (J. Alzheimers Dis. 2003, 5(3):209-28), cardiac disorders (J. Am. Coll. Cardiol. 2003, 42(2):319-27), skin disorders (J. Am. Coll. Cardiol. 2003, 42(2):319-27), and cancer (Proteomics. 2003, 3(6):979-90).

[0542] Structural Proteins:

[0543] The term “structural proteins” refers to proteins involved in the structure formation of the cell, such as structural proteins of ribosome, cell wall structural proteins, structural proteins of cytoskeleton, extracellular matrix structural proteins, extracellular matrix glycoproteins, amyloid proteins, plasma proteins, structural proteins of eye lens, structural protein of chorion (sensu Insecta), structural protein of cuticle (sensu Insecta), puparial glue protein (sensu Diptera), structural proteins of bone, yolk proteins, structural proteins of muscle, structural protein of vitelline membrane (sensu Insecta), structural proteins of peritrophic membrane (sensu Insecta), and structural proteins of nuclear pores.

[0544] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases which are caused by abnormalities in cytoskeleton, including cancerous cells, and diseased cells such as cells that do not propagate, grow or function normally. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0545] Examples of such diseases include, but are not limited to, blood vessels diseases such as aneurysms (Cardiovasc. Res. 2003, 60(1):205-13), joint diseases (Rheum. Dis. Clin. North Am. 2003, 29(3):631-45), muscular diseases such as muscular dystrophies (Curr. Opin. Clin. Nutr. Metab. Care. 2003, 6(4):435-9), neuronal diseases such as encephalitis (Neurovirol. 2003, 9(2):274-83), retinitis pigmentosa (Dev. Ophthalmol. 2003, 37:109-25), and infectious diseases (J. Virol. Methods. 2003, 109(1):75-83; FEMS Immunol. Med. Microbiol. 2003, 35(2):125-30; J. Exp. Med. 2003, 197(5):633-42).

[0546] Ligands:

[0547] The term “ligands” refers to proteins that bind to another chemical entity to form a larger complex, involved in various biological processes, such as signal transduction, metabolism, growth and differentiation, etc. This group of proteins includes opioid peptides, baboon receptor ligand, branchless receptor ligand, breathless receptor ligand, ephrin, frizzled receptor ligand, frizzled-2 receptor ligand, heartless receptor ligand, Notch receptor ligand, patched receptor ligand, punt receptor ligand, Ror receptor ligand, saxophone receptor ligand, SE20 receptor ligand, sevenless receptor ligand, smooth receptor ligand, thickveins receptor ligand, Toll receptor ligand, Torso receptor ligand, death receptor ligand, scavenger receptor ligand, neuroligin, integrin ligand, hormones, pheromones, growth factors, and sulfonylurea receptor ligand.

[0548] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involved in impaired hormone function or diseases which involve abnormal secretion of proteins which may be due to abnormal presence, absence or impaired normal response to normal levels of secreted proteins. Those secreted proteins include hormones, neurotransmitters, and various other proteins secreted by cells to the extracellular environment. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0549] Examples of such diseases include, but are not limited to, analgesia inhibited by orphanin FQ/nociceptin (Shane R., et al., (2001) Brain Res., 907(1-2):109-16), stroke protected by estrogen (Alkayed N. J., et al., (2001) J. Neurosci., 21(19):7543-50), atherosclerosis associated with growth hormone deficiency (Elhadd T. A., et al., (2001) J. Clin. Endocrinol. Metab., 86(9):4223-32), diabetes inhibited by α-galactosylceramide (Hong S., et al., (2001) Nat. Med., 7(9):1052-6), and Huntington's disease (Rao D. S., et al., (2001) Mol. Cell Biol., 21(22):7796-806).

[0550] Signal Transducer:

[0551] The term “signal transducers” refers to proteins such as activin inhibitors, receptor-associated proteins, α-2 macroglobulin receptors, morphogens, quorum sensing signal generators, quorum sensing response regulators, receptor signaling proteins, ligands, receptors, two-component sensor molecules, and two-component response regulators.

[0552] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases in which the signal-transduction is impaired, either as a cause, or as a result of the disease. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0553] Examples of such diseases include, but are not limited to, altered sexual dimorphism associated with signal transducer and activator of transcription 5b (Udy G. B., et al., (1997) Proc. Natl. Acad. Sci. U S A. 94(14):7239-44), multiple sclerosis associated with sgp130 deficiency (Padberg F., et al., (1999) J. Neuroimmunol. 99(2):218-23), intestinal inflammation associated with elevated signal transducer and activator of transcription 3 activity (Suzuki A., et al., (2001) J Exp Med, 193(4):471-81), carcinoid tumor inhibited by increased signal transducer and activators of transcription 1 and 2 (Zhou Y., et al. (2001) Oncology, 60(4):330-8), and esophageal cancer associated with loss of EGF-STAT1 pathway (Watanabe G. et al., (2001) Cancer J., 7(2):132-9).

[0554] RNA polymerase II Transcription Factors:

[0555] The phrase “RNA polymerase II transcription factors” refers to proteins such as specific and non-specific RNA polymerase II transcription factors, enhancer binding, ligand-regulated transcription factor, and general RNA polymerase II transcription factors.

[0556] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involving impaired function of RNA polymerase II transcription factors. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0557] Examples of such diseases include, but are not limited to, cardiac diseases (Cell Cycle. 2003, 2(2):99-104), xeroderma pigmentosum (Bioessays. 2001, 23(8):671-3; Biochim. Biophys. Acta. 1997, 1354(3):241-51), muscular atrophy (J. Cell Biol. 2001, 152(1):75-85), neurological diseases such as Alzheimer's disease (Front Biosci. 2000, 5:D244-57), cancerous diseases such as breast cancer (Biol. Chem. 1999, 380(2):117-28), and autoimmune disorders (Clin. Exp. Immunol. 1997, 109(3):488-94).

[0558] RNA Binding Proteins:

[0559] The phrase “RNA binding proteins” refers to RNA binding proteins involved in splicing and translation regulation such as tRNA binding proteins, RNA helicases, double-stranded RNA and single-stranded RNA binding proteins, mRNA binding proteins, snRNA cap binding proteins, 5S RNA and 7S RNA binding proteins, poly-pyrimidine tract binding proteins, snRNA binding proteins, and AU-specific RNA binding proteins.

[0560] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involving transcription and translation factors such as helicases, isomerases, histones and nucleases, diseases where there is impaired transcription, splicing, post-transcriptional processing, translation or stability of the RNA. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0561] Examples of such diseases include, but are not limited to, cancerous diseases such as lymphomas (Tumori. 2003, 89(3):278-84), prostate cancer (Prostate. 2003, 57(1):80-92) or lung cancer (J. Pathol. 2003, 200(5):640-6), blood diseases, such as Fanconi anemia (Curr. Hematol. Rep. 2003, 2(4):335-40), cardiovascular diseases such as atherosclerosis (J. Thromb. Haemost. 2003, 1(7):1381-90), muscle diseases (Trends Cardiovasc. Med. 2003, 13(5):188-95) and brain and neuronal diseases (Trends Cardiovasc. Med. 2003, 13(5):188-95; Neurosci. Lett. 2003, 342(1-2):41-4).

[0562] Nucleic Acid Binding Proteins:

[0563] The phrase “nucleic acid binding proteins” refers to proteins involved in RNA and DNA synthesis and expression regulation such as transcription factors, RNA and DNA binding proteins, zinc fingers, helicase, isomerase, histones, nucleases, ribonucleoproteins, and transcription and translation factors.

[0564] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involving DNA or RNA binding proteins such as: helicases, isomerases, histones and nucleases, for example diseases where there is abnormal replication or transcription of DNA and RNA respectively. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0565] Examples of such diseases include, but are not limited to, neurological diseases such as retinitis pigmentosa (Am. J. Ophthalmol. 2003, 136(4):678-87), parkinsonism (Proc. Natl. Acad. Sci. U S A. 2003, 100(18):10347-52), Alzheimer (J. Neurosci. 2003, 23(17):6914-27) and Canavan diseases (Brain Res Bull. 2003, 61(4):427-35), cancerous diseases such as leukemia (Anticancer Res. 2003, 23(4):3419-26) or lung cancer (J. Pathol. 2003, 200(5):640-6), myopathy (Neuromuscul Disord. 2003, 13(7-8):559-67) and liver diseases (J. Pathol. 2003, 200(5):553-60).

[0566] Proteins Involved in Metabolism:

[0567] The phrase “proteins involved in metabolism” refers to proteins involved in the totality of the chemical reactions and physical changes that occur in living organisms, comprising anabolism and catabolism; may be qualified to mean the chemical reactions and physical processes undergone by a particular substance, or class of substances, in a living organism. This group includes proteins involved in the reactions of cell growth and maintenance such as: metabolism resulting in cell growth, carbohydrate metabolism, energy pathways, electron transport, nucleobase, nucleoside, nucleotide and nucleic acid metabolism, protein metabolism and modification, amino acid and derivative metabolism, protein targeting, lipid metabolism, aromatic compound metabolism, one-carbon compound metabolism, coenzymes and prosthetic group metabolism, sulfur metabolism, phosphorus metabolism, phosphate metabolism, oxygen and radical metabolism, xenobiotic metabolism, nitrogen metabolism, fat body metabolism (sensu Insecta), protein localization, catabolism, biosynthesis, toxin metabolism, methylglyoxal metabolism, cyanate metabolism, glycolate metabolism, carbon utilization and antibiotic metabolism.

[0568] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat diseases involving cell metabolism. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases.

[0569] Examples of such metabolism-related diseases include, but are not limited to, multisystem mitochondrial disorder caused by mitochondrial DNA cytochrome C oxidase II deficiency (Campos Y., et al., (2001) Ann. Neurol. 50(3):409-13), conduction defects and ventricular dysfunction in the heart associated with heterogeneous connexin43 expression (Gutstein D. E., et al., (2001) Circulation, 104(10):1194-9), atherosclerosis associated with growth suppressor p27 deficiency (Diez-Juan A., and Andres V. (2001) FASEB J., 15(11):1989-95), colitis associated with glutathione peroxidase deficiency (Esworthy R. S., et al., (2001) Am. J. Physiol. Gastrointest. Liver Physiol. 281(3):G848-55), systemic lupus erythematosus associated with deoxyribonuclease I deficiency (Yasutomo K., et al., (2001) Nat. Genet., 28(4):313-4), alcoholic pancreatitis

(Pancreas. 2003, 27(4):281-5), amyloidosis and diseases that are related to amyloid metabolism, such as FMF, atherosclerosis, diabetes, and especially diabetes long term consequences, neurological diseases such as Creutzfeldt-Jakob disease, and Parkinson or Rasmussen's encephalitis.

[0570] Cell Growth and/or Maintenance Proteins:

[0571] The phrase “cell growth and/or maintenance proteins” refers to proteins involved in any biological process required for cell survival, growth and maintenance, including proteins involved in biological processes such as cell organization and biogenesis, cell growth, cell proliferation, metabolism, cell cycle, budding, cell shape and cell size control, sporulation (sensu Saccharomyces), transport, ion homeostasis, autophagy, cell motility, chemi-mechanical coupling, membrane fusion, cell-cell fusion, and stress response.

[0572] Pharmaceutical compositions including such proteins or protein encoding sequences, antibodies directed against such proteins or polynucleotides capable of altering expression of such proteins, may be used to treat or prevent diseases such as cancer, degenerative diseases, for example neurodegenerative diseases or conditions associated with aging, or alternatively, diseases wherein apoptosis which should have taken place does not take place. Antibodies and polynucleotides such as PCR primers and molecular probes designed to identify such proteins or protein encoding sequences may be used for diagnosis of such diseases, detection of pre-disposition to a disease, and determination of the stage of a disease.

[0573] Examples of such diseases include, but are not limited to, ataxia-telangiectasia associated with ataxia-telangiectasia mutated deficiency (Hande et al., (2001) Hum. Mol. Genet., 10(5):519-28), osteoporosis associated with osteonectin deficiency (Delany et al., (2000) J. Clin. Invest. 105(7):915-23), arthritis caused by membrane-bound matrix metalloproteinase deficiency (Holmbeck et al., (1999) Cell, 99(1):81-92), defective stratum corneum and early neonatal death associated with transglutaminase 1 deficiency (Matsuki et al., (1998) Proc. Natl. Acad. Sci. USA, 95(3):1044-9), and Alzheimer's disease associated with estrogen (Simpkins et al., (1997) Am. J. Med., 103(3A):19S-25S).

[0574] Variants of Proteins Which Accumulate an Element/Compound

[0575] Variant proteins whose wild type version naturally binds a certain compound or element inside the cell, such as for storage, may have therapeutic effect as secreted variants. For example, ferritin accumulates iron inside the cells. A secreted variant of this protein is expected to bind plasma iron and reduce its levels, to thereby have therapeutic effects in hemodisorders which are characterized by high levels of free iron in the blood.

[0576] Autoantigens

[0577] Autoantigens refer to “self” proteins which evoke autoimmune response. Examples of autoantigens are listed in Table 15, below. Secreted splice variants of such autoantigens can be used to treat such autoimmune disorders. Since autoimmune disorders are occasionally accompanied by different autoimmune manifestations (including but not limited to multiple endocrine syndromes, i.e., syndrome), the secreted variants of the present invention may treat these multiple symptoms. Therapeutic mechanisms of such variants may include: (i) sequestration of auto-antibodies to thereby reduce their circulating levels; (ii) antigen specific immunotherapy, based on the observation that prior systemic administration of a protein antigen could inhibit the subsequent generation of the immune response to the same antigen (has been proved in mice models for Myasthenia Gravis and type I Diabetes).

[0578] In addition, any novel variant of autoantigens (not necessarily secreted) may be used for “specific immunoadsorption” leading to a specific immunodepletion of an antibody when used in immunoadsorption columns.

[0579] Variants of autoantigens are also of a diagnostic value. The diagnosis of many autoimmune disorders is based on looking for specific autoantibodies to autoantigens known to be associated with an autoimmune condition. Most of the diagnostic techniques are based on having a recombinant form of the autoantigen and using it to screen for serum autoantibodies. However these antibodies may bind the variants of the present invention with a similar or augmented affinity. For example, TPO is a known autoantigen in thyroid autoimmunity. It has been shown that its variant TPOzanelli also takes part in the autoimmune process and can bind the same antibodies as TPO (Biochemistry. 2001 February 27; 40(8):2572-9).

[0580] The nucleic acid sequences of the present invention, the proteins encoded thereby and the cells and antibodies described hereinabove can be used in screening assays, therapeutic or prophylactic methods of treatment, or predictive medicine (e.g., diagnostic and prognostic assays, including those used to monitor clinical trials, and pharmacogenetics).

[0581] More specifically, the nucleic acids of the present invention can be used to: (i) express a protein of the invention in a host cell in culture or in an intact multicellular organism following, e.g., gene therapy; (ii) detect an mRNA; or (iii) detect an alteration in a gene to which a nucleic acid of the invention specifically binds; or to modulate such a gene's activity.

[0582] The nucleic acids and proteins of the present invention can also be used to treat disorders characterized by either insufficient or excessive production of those nucleic acids or proteins, a failure in a biochemical pathway in which they normally participate in a cell, or other aberrant or unwanted activity relative to the wild type protein (e.g., inappropriate enzymatic activity or unproductive protein folding). The proteins of the invention are useful in screening for naturally occurring protein substrates or other compounds (e.g., drugs) that modulate protein activity. The antibodies of the invention can also be used to detect and isolate the proteins of the invention, to regulate their bioavailability, or otherwise modulate their activity. Exemplary uses, and the methods by which they can be achieved, are described in detail below.

[0583] Possible utilities for Variants of Drug Targets

[0584] Finding a variant of a known drug target can be advantageous in cases where the known drug has a major side effect, the therapeutic efficacy of the known drug is medium, or a known drug has failed clinical trials due to one of the above. A drug which is specific to a new protein variant of the target or to the target only (without affecting

the novel variant) is likely to have lower side effects as compared to the original drug, higher therapeutic efficacy, and broader or different range of activities.

[0585] For example, COX3, which is a variant of COX1, is known to bind COX inhibitors with a different affinity than COX1. This molecule is also associated with different physiological processes than COX1. Therefore, a compound specific to COX1 or compounds specific to COX3 would have lower side effects (by not affecting the other variants), and higher therapeutic efficacy to larger populations.

[0586] Diseases That May be Treated/Diagnosed Using the Teaching of the Present Invention

[0587] Inflammatory Diseases

[0588] Examples of inflammatory diseases include, but are not limited to, chronic inflammatory diseases and acute inflammatory diseases.

[0589] Inflammatory Diseases Associated with Hypersensitivity

[0590] Examples of hypersensitivity include, but are not limited to, Types I-IV hypersensitivity, immediate hypersensitivity, antibody mediated hypersensitivity, immune complex mediated hypersensitivity, T lymphocyte mediated hypersensitivity and DTH. An example of type I or immediate hypersensitivity is asthma. Examples of type II hypersensitivity include, but are not limited to, rheumatoid diseases, rheumatoid autoimmune diseases, rheumatoid arthritis (Krenn V. et al., Histol Histopathol July 2000;15 (3):791), spondylitis, ankylosing spondylitis (Jan Voswinkel et al., Arthritis Res 2001;3 (3):189), systemic diseases, systemic autoimmune diseases, systemic lupus erythematosus (Erikson J. et al., Immunol Res 1998;17 (1-2):49), sclerosis, systemic sclerosis (Renaudineau Y. et al., Clin Diagn Lab Immunol. March 1999;6 (2):156; Chan OT. et al., Immunol Rev June 1999;169:107), glandular diseases, glandular autoimmune diseases, pancreatic autoimmune diseases, diabetes, Type I diabetes (Zimmet P. Diabetes Res Clin Pract October 1996;34 Suppl:S125), thyroid diseases, autoimmune thyroid diseases, Graves disease (Orgiazzi J. Endocrinol Metab Clin North Am June 2000;29 (2):339), thyroiditis, spontaneous autoimmune thyroiditis (Braley-Mullen H. and Yu S, J Immunol Dec. 15, 2001;165 (12):7262), Hashimoto's thyroiditis (Toyoda N. et al., Nippon Rinsho August 1999;57 (8):1810), myxedema, idiopathic myxedema (Mitsuma T. Nippon Rinsho. August 1999;57 (8):1759), autoimmune reproductive diseases, ovarian diseases, ovarian autoimmunity (Garza K. M. et al., J Reprod Immunol February 1998;37 (2):87), autoimmune anti-sperm infertility (Diekman A. B. et al., Am J Reprod Immunol. March 2000;43 (3):134), repeated fetal loss (Tincani A. et al., Lupus 1998;7 Suppl 2:S107-9), neurodegenerative diseases, neurological diseases, neurological autoimmune diseases, multiple sclerosis (Cross A H. et al., J Neuroimmunol Jan. 1, 2001;112 (1-2):1), Alzheimer's disease (Oron L. et al., J Neural Transm Suppl. 1997;49:77), myasthenia gravis (Infante A. J. and Kraig E. Int Rev Immunol 1999;18 (1-2):83), motor neuropathies (Kornberg A J. J Clin Neurosci. May 2000;7 (3):191), Guillain-Barre syndrome, neuropathies and autoimmune neuropathies (Kusunoki S. Am J Med Sci. April 2000;319 (4):234), myasthenic diseases, Lambert-Eaton myasthenic syndrome (Takamori M. Am J Med Sci. April 2000;319 (4):204), paraneoplastic neurological diseases, cerebellar atrophy, paraneoplastic cerebellar atrophy, non-paraneoplastic stiff man syndrome, cerebellar atrophies, progressive cerebellar atrophies, encephalitis, Rasmussen's encephalitis, amyotrophic lateral sclerosis, Sydenham chorea, Gilles de la Tourette syndrome, polyendocrinopathies, autoimmune polyendocrinopathies (Antoine J C. and Honnorat J. Rev Neurol (Paris) January 2000;156 (1):23), neuropathies, dysimmune neuropathies (Nobile-Orazio E. et al., Electroencephalogr. Clin Neurophysiol Suppl 1999;50:419), neuromyotonia, acquired neuromyotonia, arthrogryposis multiplex congenita (Vincent A. et al., Ann NY Acad Sci. May 13, 1998;841:482), cardiovascular diseases, cardiovascular autoimmune diseases, atherosclerosis (Matsuura E. et al., Lupus. 1998;7 Suppl 2:S135), myocardial infarction (Vaarala O. Lupus. 1998;7 Suppl 2:S132), thrombosis (Tincani A. et al., Lupus 1998;7 Suppl 2:S107-9), granulomatosis, Wegener's granulomatosis, arteritis, Takayasu's arteritis and Kawasaki syndrome (Praprotnik S. et al., Wien Klin Wochenschr Aug. 25, 2000;112 (15-16):660), anti-factor VIII autoimmune disease (Lacroix-Desmazes S. et al., Semin Thromb Hemost 2000;26 (2):157), vasculitises, necrotizing small vessel vasculitises, microscopic polyangiitis, Churg and Strauss syndrome, glomerulonephritis, pauci-immune focal necrotizing glomerulonephritis, crescentic glomerulonephritis (Noel L. H. Ann Med Interne (Paris). May 2000;151 (3):178), antiphospholipid syndrome (Flamholz R. et al., J Clin Apheresis 1999;14 (4):171), heart failure, agonist-like β-adrenoceptor antibodies in heart failure (Wallukat G. et al., Am J Cardiol. Jun. 17, 1999;83 (12A):75H), thrombocytopenic purpura (Moccia F. Ann Ital Med Int. April-June 1999;14 (2):114), hemolytic anemia, autoimmune hemolytic anemia (Efremov D G. et al., Leuk Lymphoma. January 1998;28 (3-4):285), gastrointestinal diseases, autoimmune diseases of the gastrointestinal tract, intestinal diseases, chronic inflammatory intestinal disease (Garcia Herola A. et al., Gastroenterol Hepatol. January 2000;23 (1):16), celiac disease (Landau Y E. and Shoenfeld Y. Harefuah Jan. 16, 2000;138 (2):122), autoimmune diseases of the musculature, myositis, autoimmune myositis, Sjogren's syndrome (Feist E. et al., Int Arch Allergy Immunol September 2000;123 (1):92), smooth muscle autoimmune disease (Zauli D. et al., Biomed Pharmacother June 1999;53 (5-6):234), hepatic diseases, hepatic autoimmune diseases, autoimmune hepatitis (Manns M P. J Hepatol August 2000;33 (2):326) and primary biliary cirrhosis (Strassburg C P. et al., Eur J Gastroenterol Hepatol. June 1999;11 (6):595).

[0591] Examples of type IV or T cell mediated hypersensitivity include, but are not limited to, rheumatoid diseases, rheumatoid arthritis (Tisch R, McDevitt H O. Proc Natl Acad Sci U S A Jan. 18, 1994;91 (2):437), systemic diseases, systemic autoimmune diseases, systemic lupus erythematosus (Datta S K. Lupus 1998;7 (9):591), glandular diseases, glandular autoimmune diseases, pancreatic diseases, pancreatic autoimmune diseases, Type 1 diabetes (Castano L. and Eisenbarth G. S. Ann. Rev. Immunol. 8:647), thyroid diseases, autoimmune thyroid diseases, Graves disease (Sakata S. et al., Mol Cell Endocrinol March 1993;92 (1):77), ovarian diseases (Garza K. M. et al., J Reprod Immunol February 1998;37 (2):87), prostatitis, autoimmune prostatitis (Alexander R B. et al., Urology December 1997;50 (6):893), polyglandular syndrome, autoimmune polyglandular syndrome, Type I autoimmune polyglandular syndrome (Hara T. et al., Blood. Mar. 1, 1991;77 (5):1127), neurological diseases, autoimmune neurological diseases, multiple sclerosis, neuritis, optic neuritis (Soderstrom M. et al., J Neurol Neurosurg Psychiatry May 1994;57 (5):544), myasthenia gravis (Oshima M. et al., Eur J Immunol December 1990;20 (12):2563), stiff-man syndrome (Hiemstra H S. et al., Proc Natl Acad Sci U S A Mar. 27, 2001;98 (7):3988), cardiovascular diseases, cardiac autoimmunity in Chagas disease (Cunha-Neto E. et al., J Clin Invest Oct. 15, 1996;98 (8):1709), autoimmune thrombocytopenic purpura (Semple J. W. et al., Blood May 15, 1996;87 (10):4245), anti-helper T lymphocyte autoimmunity (Caporossi A P. et al., Viral Immunol 1998;11 (1):9), hemolytic anemia (Sallah S. et al., Ann Hematol March 1997;74 (3):139), hepatic diseases, hepatic autoimmune diseases, hepatitis, chronic active hepatitis (Franco A. et al., Clin Immunol Immunopathol March 1990;54 (3):382), biliary cirrhosis, primary biliary cirrhosis (Jones D. E. Clin Sci (Colch) November 1996;91 (5):551), nephric diseases, nephric autoimmune diseases, nephritis, interstitial nephritis (Kelly C J. J Am Soc Nephrol August 1990;1 (2):140), connective tissue diseases, ear diseases, autoimmune connective tissue diseases, autoimmune ear disease (Yoo T J. et al., Cell Immunol August 1994;157 (1):249), disease of the inner ear (Gloddek B. et al., Ann NY Acad Sci Dec. 29, 1997;830:266), skin diseases, cutaneous diseases, dermal diseases, bullous skin diseases, pemphigus vulgaris, bullous pemphigoid and pemphigus foliaceus.

[0592] Examples of delayed type hypersensitivity include, but are not limited to, contact dermatitis and drug eruption.

[0593] Autoimmune Diseases

[0594] Examples of autoimmune diseases include, but are not limited to, cardiovascular diseases, rheumatoid diseases, glandular diseases, gastrointestinal diseases, cutaneous diseases, hepatic diseases, neurological diseases, muscular diseases, nephric diseases, diseases related to reproduction, connective tissue diseases and systemic diseases.

[0595] Examples of autoimmune cardiovascular and blood diseases include, but are not limited to, atherosclerosis (Matsuura E. et al., Lupus. 1998;7 Suppl 2:S135), myocardial infarction (Vaarala O. Lupus. 1998;7 Suppl 2:S132), thrombosis (Tincani A. et al., Lupus 1998;7 Suppl 2:S107-9), Wegener's granulomatosis, Takayasu's arteritis, Kawasaki syndrome (Praprotnik S. et al., Wien Klin Wochenschr Aug. 25, 2000;112 (15-16):660), anti-factor VIII autoimmune disease (Lacroix-Desmazes S. et al., Semin Thromb Hemost 2000;26 (2):157), necrotizing small vessel vasculitis, microscopic polyangiitis, Churg and Strauss syndrome, pauci-immune focal necrotizing and crescentic glomerulonephritis (Noel L. H. Ann Med Interne (Paris). May 2000;151 (3):178), antiphospholipid syndrome (Flamholz R. et al., J Clin Apheresis 1999;14 (4):171), antibody-induced heart failure (Wallukat G. et al., Am J Cardiol. Jun. 17, 1999;83 (12A):75H), thrombocytopenic purpura (Moccia F. Ann Ital Med Int. April-June 1999;14 (2):114; Semple J. W. et al., Blood May 15, 1996;87 (10):4245), autoimmune hemolytic anemia (Efremov D G. et al., Leuk Lymphoma January 1998;28 (3-4):285; Sallah S. et al., Ann Hematol March 1997;74 (3):139), cardiac autoimmunity in Chagas disease (Cunha-Neto E. et al., J Clin Invest Oct. 15, 1996;98 (8):1709) and anti-helper T lymphocyte autoimmunity (Caporossi A P. et al., Viral Immunol 1998;11 (1):9).

[0596] Examples of autoimmune rheumatoid diseases include, but are not limited to, rheumatoid arthritis (Krenn V. et al., Histol Histopathol July 2000;15 (3):791; Tisch R, McDevitt H O. Proc Natl Acad Sci U S A Jan. 18, 1994;91 (2):437) and ankylosing spondylitis (Jan Voswinkel et al., Arthritis Res 2001;3 (3):189).

[0597] Examples of autoimmune glandular diseases include, but are not limited to, autoimmune diseases of the pancreas, Type 1 diabetes (Castano L. and Eisenbarth G. S. Ann. Rev. Immunol. 8:647; Zimmet P. Diabetes Res Clin Pract October 1996;34 Suppl:S125), autoimmune thyroid diseases, Graves disease (Orgiazzi J. Endocrinol Metab Clin North Am June 2000;29 (2):339; Sakata S. et al., Mol Cell Endocrinol March 1993;92 (1):77), spontaneous autoimmune thyroiditis (Braley-Mullen H. and Yu S, J Immunol Dec. 15, 2000;165 (12):7262), Hashimoto's thyroiditis (Toyoda N. et al., Nippon Rinsho August 1999;57 (8):1810), idiopathic myxedema (Mitsuma T. Nippon Rinsho. August 1999;57 (8):1759), ovarian autoimmunity (Garza K. M. et al., J Reprod Immunol February 1998;37 (2):87), autoimmune anti-sperm infertility, autoimmune prostatitis and Type I autoimmune polyglandular syndrome.

[0598] Examples of autoimmune gastrointestinal diseases include, but are not limited to, chronic inflammatory intestinal diseases (Garcia Herola A. et al., Gastroenterol Hepatol. January 2000;23 (1):16), celiac disease (Landau Y E. and Shoenfeld Y. Harefuah Jan. 16, 2000;138 (2):122), colitis, ileitis and Crohn's disease and ulcerative colitis.

[0599] Examples of autoimmune cutaneous diseases include, but are not limited to, autoimmune bullous skin diseases, such as, but not limited to, pemphigus vulgaris, bullous pemphigoid and pemphigus foliaceus.

[0600] Examples of autoimmune hepatic diseases include, but are not limited to, hepatitis, autoimmune chronic active hepatitis (Franco A. et al., Clin Immunol Immunopathol March 1990;54 (3):382), primary biliary cirrhosis (Jones D E. Clin Sci (Colch) November 1996;91 (5):551; Strassburg C P. et al., Eur J Gastroenterol Hepatol. June 1999;11 (6):595) and autoimmune hepatitis (Manns M P. J Hepatol August 2000;33 (2):326).

[0601] Examples of autoimmune neurological diseases include, but are not limited to, multiple sclerosis (Cross A H. et al., J Neuroimmunol Jan. 1, 2001;112 (1-2):1), Alzheimer's disease (Oron L. et al., J Neural Transm Suppl. 1997;49:77), myasthenia gravis (Infante A. J. and Kraig E. Int Rev Immunol 1999;18 (1-2):83; Oshima M. et al., Eur J Immunol December 1990;20 (12):2563), neuropathies, motor neuropathies (Kornberg A J. J Clin Neurosci. May 2000;7 (3):191), Guillain-Barre syndrome and autoimmune neuropathies (Kusunoki S. Am J Med Sci. April 2000;319 (4):234), myasthenia, Lambert-Eaton myasthenic syndrome (Takamori M. Am J Med Sci. April 2000;319 (4):204), paraneoplastic neurological diseases, cerebellar atrophy, paraneoplastic cerebellar atrophy and stiff-man syndrome (Hiemstra H S. et al., Proc Natl Acad Sci U S A Mar. 27, 2001;98 (7):3988), non-paraneoplastic stiff man syndrome, progressive cerebellar atrophies, encephalitis, Rasmussen's encephalitis, amyotrophic lateral sclerosis, Sydenham chorea, Gilles de la Tourette syndrome and autoimmune polyendocrinopathies (Antoine J C. and Honnorat J. Rev Neurol (Paris) January 2000;156 (1):23), dysimmune neuropathies
(Nobile-Orazio E. et al., Electroencephalogr. Clin Neurophysiol Suppl 1999;50:419), acquired neuromyotonia, arthrogryposis multiplex congenita (Vincent A. et al., Ann NY Acad Sci. May 13, 1998;841:482), neuritis, optic neuritis (Soderstrom M. et al., J Neurol Neurosurg Psychiatry May 1994;57 (5):544), multiple sclerosis and neurodegenerative diseases.

[0602] Examples of autoimmune muscular diseases include, but are not limited to, myositis, autoimmune myositis and primary Sjogren's syndrome (Feist E. et al., Int Arch Allergy Immunol September 2000;123 (1):92) and smooth muscle autoimmune disease (Zauli D. et al., Biomed Pharmacother June 1999;53 (5-6):234).

[0603] Examples of autoimmune nephric diseases include, but are not limited to, nephritis and autoimmune interstitial nephritis (Kelly C J. J Am Soc Nephrol August 1990;1 (2):140), glomerular nephritis.

[0604] Examples of autoimmune diseases related to reproduction include, but are not limited to, repeated fetal loss (Tincani A. et al., Lupus 1998;7 Suppl 2:S107-9).

[0605] Examples of autoimmune connective tissue diseases include, but are not limited to, ear diseases, autoimmune ear diseases (Yoo T. J. et al., Cell Immunol August 1994;157 (1):249) and autoimmune diseases of the inner ear (Gloddek B. et al., Ann NY Acad Sci Dec. 29, 1997;830:266).

[0606] Examples of autoimmune systemic diseases include, but are not limited to, systemic lupus erythematosus (Erikson J. et al., Immunol Res 1998;17 (1-2):49) and systemic sclerosis (Renaudineau Y. et al., Clin Diagn Lab Immunol. March 1999;6 (2):156; Chan OT. et al., Immunol Rev June 1999;169:107).

[0607] Infectious Diseases

[0608] Examples of infectious diseases include, but are not limited to, chronic infectious diseases, subacute infectious diseases, acute infectious diseases, viral diseases, bacterial diseases, protozoan diseases, parasitic diseases, fungal diseases, mycoplasma diseases, and prion diseases.

[0609] Graft Rejection Diseases

[0610] Examples of diseases associated with transplantation of a graft include, but are not limited to, graft rejection, chronic graft rejection, subacute graft rejection, hyperacute graft rejection, acute graft rejection, and graft versus host disease.

[0611] Allergic Diseases

[0612] Examples of allergic diseases include, but are not limited to, asthma, hives, urticaria, pollen allergy, dust mite allergy, venom allergy, cosmetics allergy, latex allergy, chemical allergy, drug allergy, insect bite allergy, animal dander allergy, stinging plant allergy, poison ivy allergy and food allergy.

[0613] Cancerous Diseases

[0614] Examples of cancer include, but are not limited to, carcinoma, lymphoma, blastoma, sarcoma, and leukemia. Particular examples of cancerous diseases include, but are not limited to: Myeloid leukemia, such as Chronic myelogenous leukemia, Acute myelogenous leukemia with maturation, Acute promyelocytic leukemia, Acute nonlymphocytic leukemia with increased basophils, Acute monocytic leukemia, Acute myelomonocytic leukemia with eosinophilia; malignant lymphoma, such as Burkitt's, Non-Hodgkin's; Lymphocytic leukemia, such as acute lymphoblastic leukemia, Chronic lymphocytic leukemia; Myeloproliferative diseases, such as Solid tumors, Benign Meningioma, Mixed tumors of salivary gland, Colonic adenomas; Adenocarcinomas, such as Small cell lung cancer, Kidney, Uterus, Prostate, Bladder, Ovary, Colon, Sarcomas, Liposarcoma, myxoid, Synovial sarcoma, Rhabdomyosarcoma (alveolar), Extraskeletal myxoid chondrosarcoma, Ewing's tumor; others include Testicular and ovarian dysgerminoma, Retinoblastoma, Wilms' tumor, Neuroblastoma, Malignant melanoma, Mesothelioma, breast, skin, prostate, and ovarian.

[0615] Thus, the nucleic acid sequences of the present invention and the proteins encoded thereby and the cells and antibodies described hereinabove can be used in, for example, screening assays, therapeutic or prophylactic methods of treatment, or predictive medicine (e.g., diagnostic and prognostic assays, including those used to monitor clinical trials, and pharmacogenetics).

[0616] More specifically, the nucleic acids of the invention can be used to: (i) express a protein of the invention in a host cell (in culture or in an intact multicellular organism following, e.g., gene therapy, given, of course, that the transcript in question contains more than untranslated sequence); (ii) detect an mRNA; or (iii) detect an alteration in a gene to which a nucleic acid of the invention specifically binds; or to modulate such a gene's activity.

[0617] The nucleic acids and proteins of the invention can also be used to treat disorders characterized by either insufficient or excessive production of those nucleic acids or proteins, a failure in a biochemical pathway in which they normally participate in a cell, or other aberrant or unwanted activity relative to the wild type protein (e.g., inappropriate enzymatic activity or unproductive protein folding). The proteins of the invention are especially useful in screening for naturally occurring protein substrates or other compounds (e.g., drugs) that modulate protein activity. The antibodies of the invention can also be used to detect and isolate the proteins of the invention, to regulate their bioavailability, or otherwise modulate their activity. These uses, and the methods by which they can be achieved, are described in detail below.

Screening Assays

[0618] The present invention provides methods (or “screening assays”) for identifying agents (or “test compounds”) that bind to or otherwise modulate (i.e., stimulate or inhibit) the expression or activity of a nucleic acid of the present invention or the protein it encodes. An agent may be, for example, a small molecule such as a peptide, peptidomimetic (e.g., a peptoid), an amino acid or an analog thereof, a polynucleotide or an analog thereof, a nucleotide or an analog thereof, or an organic or inorganic compound (e.g., a heteroorganic or organometallic compound) having a molecular weight less than about 10,000 (e.g., about 5,000, 1,000, or 500) grams per mole and salts, esters, and other pharmaceutically acceptable forms of such compounds.

[0619] Agents identified in the screening assays can be used, for example, to modulate the expression or activity of the nucleic acids or proteins of the invention in a therapeutic protocol, or to discover more about the biological functions of the proteins.
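As an illustration only, the molecular-weight ceilings named above for an organic or inorganic test compound (about 10,000, 5,000, 1,000, or 500 grams per mole) could be applied to a candidate list as in the following minimal sketch. The `Candidate` type, the `within_cutoff` helper, and the compound names and weights are hypothetical and are not part of the disclosure; the strict comparison is a simplification of "less than about".

```python
from dataclasses import dataclass

# Ceilings named in the text, in grams per mole.
CUTOFFS_G_PER_MOL = (10_000, 5_000, 1_000, 500)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one test compound."""
    name: str          # illustrative identifier, not from the source
    mol_weight: float  # grams per mole


def within_cutoff(candidates, cutoff=10_000):
    """Return the candidates whose molecular weight is below the cutoff."""
    return [c for c in candidates if c.mol_weight < cutoff]


# Invented example library; the weights are placeholders.
library = [
    Candidate("agent_A", 320.4),
    Candidate("agent_B", 1_450.0),
    Candidate("agent_C", 7_800.0),
    Candidate("agent_D", 14_200.0),
]

if __name__ == "__main__":
    for cutoff in CUTOFFS_G_PER_MOL:
        names = [c.name for c in within_cutoff(library, cutoff)]
        print(f"below {cutoff} g/mol: {names}")
```

A tighter ceiling simply narrows the candidate list; which ceiling is appropriate for a given screen is left open by the text.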

[0620] The assays can be constructed to screen for agents that modulate the expression or activity of a protein of the invention or another cellular component with which it interacts. For example, where the protein of the invention is an enzyme, the screening assay can be constructed to detect agents that modulate either the enzyme's expression or activity or that of its substrate. The agents tested can be those obtained from combinatorial libraries. Methods known in the art allow the production and screening of biological libraries; peptoid libraries [i.e., libraries of molecules that function as peptides even though they have a non-peptide backbone that confers resistance to enzymatic degradation; see, e.g., Zuckermann et al., J. Med. Chem. 37:2678-85, (1994)]; spatially addressable parallel solid phase or solution phase libraries; synthetic libraries requiring deconvolution; “one-bead one-compound” libraries; and synthetic libraries. The biological and peptoid libraries can be used to test only peptides, but the other four are applicable to testing peptides, non-peptide oligomers or libraries of small molecules [Lam, Anticancer Drug Des. 12:145, (1997)]. Molecular libraries can be synthesized as described by DeWitt et al. [Proc. Natl. Acad. Sci. USA 90:6909, (1993)], Erb et al. [Proc. Natl. Acad. Sci. USA 91:11422, (1994)], Zuckermann et al. [J. Med. Chem. 37:2678, (1994)], Cho et al. [Science 261:1303, (1993)] and Gallop et al. [J. Med. Chem. 37:1233, (1994)].

[0621] Libraries of compounds may be presented in solution [see, e.g., Houghten, Biotechniques 13:412-421, (1992)], or on beads [Lam, Nature 354:82-84, (1991)], chips [Fodor, Nature 364:555-556, (1993)], bacteria or spores (U.S. Pat. No. 5,223,409), plasmids [Cull et al., Proc Natl Acad Sci USA 89:1865-1869, (1992)] or on phage [Scott and Smith, Science 249:386-390, (1990); Devlin, Science 249:404-406, (1990); Cwirla et al., Proc. Natl. Acad. Sci. USA 87:6378-6382, (1990); Felici, J. Mol. Biol. 222:301-310, (1991); and U.S. Pat. No. 5,223,409].

[0622] The screening assay can be a cell-based assay, in which case the screening method includes contacting a cell that expresses a protein of the invention with a test compound and determining the ability of the test compound to modulate the protein's activity. The cell used can be a mammalian cell, including a cell obtained from a human or from a human cell line.

[0623] Alternatively, or in addition to examining the ability of an agent to modulate expression or activity generally, one can examine the ability of an agent to interact with, for example, to specifically bind to, a nucleic acid or protein of the invention. For example, one can couple an agent (e.g., a substrate) to a label (those described above, including radioactive or enzymatically active substances, are suitable), contact the nucleic acid or protein of the invention with the labeled agent, and determine whether they bind one another (by detecting, for example, a complex containing the nucleic acid or protein and the labeled agent). Labels are not, however, always required. For example, one can use a microphysiometer to detect interaction between an agent and a protein of the invention, neither of which was previously labeled [McConnell et al., Science 257:1906-1912, (1992)]. A microphysiometer (also known as a cytosensor) is an analytical instrument that measures the rate at which a cell acidifies its environment. The instrument uses a light-addressable potentiometric sensor (LAPS), and changes in the acidification rate indicate interaction between an agent and a protein of the invention. Molecular interactions can also be detected using fluorescence energy transfer (FET; see, e.g., U.S. Pat. Nos. 5,631,169 and 4,868,103). An FET binding event can be conveniently measured through fluorometric detection means well known in the art (e.g., by means of a fluorimeter). Where analysis in real time is desirable, one can examine the interaction (e.g., binding) between an agent and a protein of the invention with Biomolecular Interaction Analysis [BIA; see, e.g., Sjolander and Urbaniczky, Anal. Chem. 63:2338-2345, (1991) and Szabo et al., Curr. Opin. Struct. Biol. 5:699-705, (1995)]. BIA allows one to detect biospecific interactions in real time without labeling any of the interactants (e.g., BIAcore).

[0624] The screening assays can also be cell-free assays (i.e., soluble or membrane-bound forms of the proteins of the invention, including the variants, mutants, and other fragments described above, can be used to identify agents that bind those proteins or otherwise modulate their expression or activity). The basic protocol is the same as that for a cell-based assay in that, in either case, one must contact the protein of the invention with an agent of interest for a sufficient time and under appropriate (e.g., physiological) conditions to allow any potential interaction to occur and then determine whether the agent binds the protein or otherwise modulates its expression or activity.

[0625] Those of ordinary skill in the art will, however, appreciate that there are differences between cell-based and cell-free assays. For example, when membrane-bound forms of the protein are used, it may be desirable to utilize a solubilizing agent (e.g., non-ionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, Isotridecypoly(ethylene glycol ether), 3-(3-cholamidopropyl)dimethylammonio-1-propane sulfonate (CHAPS), 3-(3-cholamidopropyl)dimethylammonio-2-hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate).

[0626] In the assays of the invention, any of the proteins described herein or the agents being tested can be anchored to a solid phase or otherwise immobilized (assays in which one of two substances that interact with one another is anchored to a solid phase are sometimes referred to as “heterogeneous” assays). For example, a protein of the present invention can be anchored to a microtiter plate, a test tube, a microcentrifuge tube, a column, or the like before it is exposed to an agent. Any complex that forms on the solid phase is detected at the end of the period of exposure. For example, a protein of the present invention can be anchored to a solid surface, and the test compound (which is not anchored and can be labeled, directly or indirectly) is added to the surface bearing the anchored protein. Unreacted (e.g., unbound) components can be removed (by, e.g., washing) under conditions that allow any complexes formed to remain immobilized on the solid surface, where they can be detected (e.g., by virtue of a label attached to the protein or the agent or with a labeled antibody that specifically binds an immobilized component and may, itself, be directly or indirectly labeled).

[0627] One can immobilize either a protein of the present invention or an antibody to which it specifically binds to facilitate separation of complexed (or bound) protein from uncomplexed (or unbound) protein. Such immobilization
can also make it easier to automate the assay, and fusing the proteins of the invention to heterologous proteins can facilitate their immobilization. For example, proteins fused to glutathione-S-transferase can be adsorbed onto glutathione sepharose beads (Sigma Chemical Co., St. Louis, Mo.) or glutathione derivatized microtiter plates, then combined with the agent and incubated under conditions conducive to complex formation (e.g., conditions in which the salt and pH levels are within physiological levels). Following incubation, the solid phase is washed to remove any unbound components (where the solid phase includes beads, the matrix can be immobilized), and the presence or absence of a complex is determined. Alternatively, complexes can be dissociated from a matrix, and the level of protein binding or activity can be determined using standard techniques.

[0628] Immobilization can be achieved with methods known in the art. For example, biotinylated protein can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques known in the art (e.g., the biotinylation kit from Pierce Chemicals, Rockford, Ill.) and immobilized in the wells of streptavidin-coated tissue culture plates (also from Pierce Chemical).

[0629] The screening assays of the invention can employ antibodies that react with the proteins of the invention but do not interfere with their activity. These antibodies can be derivatized to a solid surface, where they will trap a protein of the invention. Any interaction between a protein of the invention and an agent can then be detected using a second antibody that specifically binds the complex formed between the protein of the invention and the agent to which it is bound.

[0630] Cell-free assays can also be conducted in a liquid phase, in which case any reaction product can be separated (and thereby detected) by, for example: differential centrifugation (Rivas and Minton, Trends Biochem Sci 18:284-7, 1993); chromatography (e.g., gel filtration or ion-exchange chromatography); electrophoresis [see, e.g., Ausubel et al., Eds. Current Protocols in Molecular Biology, J. Wiley & Sons, New York, N.Y., (1999)]; or immunoprecipitation [see, e.g., Ausubel et al. (supra); see also Heegaard, J. Mol. Recognit. 11:141-148, (1998) and Hage and Tweed, J. Chromatogr. Biomed. Sci. Appl. 699:499-525, (1997)]. Fluorescence energy transfer (see above) can also be used, and is convenient because binding can be detected without purifying the complex from solution. Assays in which the entire reaction of interest is carried out in a liquid phase are sometimes referred to as homogeneous assays.

[0631] The screening methods of the invention can also be designed as competition assays in which an agent and a substance that is known to bind a protein of the present invention compete to bind that protein. Depending upon the order of addition of reaction components and the reaction conditions (e.g., whether the reaction is allowed to reach equilibrium), agents that inhibit complex formation can be distinguished from those that disrupt preformed complexes.

[0632] In either approach, the order in which reactants are added can be varied to obtain different information about the agents being tested. For example, agents that interfere with the interaction between a gene product and one or more of its binding partners (by, e.g., competing with the binding partner) can be identified by adding the binding partner and the agent to the reaction at about the same time. Agents that disrupt preformed complexes (by, e.g., displacing one of the components from the complex) can be added after a complex containing the gene product and its binding partner has formed.

[0633] The proteins of the invention can also be used as “bait proteins” in a two- or three-hybrid assay [see, e.g., U.S. Pat. No. 5,283,317; Zervos et al., Cell 72:223-232, (1993); Madura et al., J. Biol. Chem. 268:12046-12054, (1993); Bartel et al., Biotechniques 14:920-924, (1993); Iwabuchi et al., Oncogene 8:1693-1696, (1993); and WO 94/10300] to identify other proteins that bind to (e.g., specifically bind to) or otherwise interact with a protein of the invention. Such binding proteins can activate or inhibit the proteins of the invention (and thereby influence the biochemical pathways and events in which those proteins are active).

[0634] As noted above, the screening assays of the invention can be used to identify an agent that inhibits the expression of a protein of the invention by, for example, inhibiting the transcription or translation of a nucleic acid that encodes it. In these assays, one can contact a cell or cell free mixture with the agent and then evaluate mRNA or protein expression relative to the levels that are observed in the absence of the agent (a statistically significant increase in expression indicating that the agent stimulates mRNA or protein expression and a decrease (again, one that is statistically significant) indicating that the agent inhibits mRNA or protein expression). Methods for determining levels of mRNA or protein expression are known in the art and, here, would employ the nucleic acids, proteins, and antibodies of the present invention.

[0635] It should be noted that, if desired, two or more of the methods described herein can be practiced together. For example, one can evaluate an agent that was first identified in a cell-based assay in a cell free assay. Similarly, the ability of the agent to modulate the activity of a protein of the invention can be confirmed in vivo (e.g., in a transgenic animal).

[0636] The screening methods of the present invention can also be used to identify proteins (in the event transcripts of the present invention encode proteins) that are associated (e.g., causally) with drug resistance. One can then block the activity of these proteins (with, e.g., an antibody of the invention) and thereby improve the ability of a therapeutic agent to exert a desirable effect on a cell or tissue in a subject (e.g., a human patient).

[0637] Monitoring the influence of therapeutic agents (e.g., drugs) or other events (e.g., radiation therapy) on the expression or activity of a biomolecular sequence of the present invention can be useful in clinical trials (a desired extension of the screening assays described above). For example, agents that exert an effect by, in part, altering the expression or activity of a protein of the invention ex vivo can be tested for their ability to do so as the treatment progresses in a subject. Moreover, in animal or clinical trials, the expression or activity of a nucleic acid can be used, optionally in conjunction with that of other genes, as a “read out” or marker of the phenotype of a particular cell.

Detection Assays

[0638] The nucleic acid sequences of the invention can serve as polynucleotide reagents that are useful in detecting a specific nucleic acid sequence. For example, one can use the nucleic acid sequences of the present invention to map the corresponding genes on a chromosome (and thereby discover which proteins of the invention are associated with genetic disease) or to identify an individual from a biological sample (i.e., to carry out tissue typing, which is useful in criminal investigations and forensic science). The novel transcripts of the present invention can be used to identify those tissues or cells affected by a disease (e.g., the nucleic acids of the invention can be used as markers to identify cells, tissues, and specific pathologies, such as cancer), and to identify individuals who may have or be at risk for a particular cancer. Specific methods of detection are described herein and are known to those of ordinary skill in the art.

[0639] The nucleic acids of the present invention can be used to determine whether a particular individual is the source of a biological sample (e.g., a blood sample). This is presently achieved by examining restriction fragment length polymorphisms (RFLPs; U.S. Pat. No. 5,272,957), and the sequences disclosed here are useful as additional DNA markers for RFLP. For example, one can digest a sample of an individual's genomic DNA, separate the fragments (e.g., by Southern blotting), and expose the fragments to probes generated from the nucleic acids of the present invention (methods employing restriction endonucleases are discussed further below). If the pattern of binding matches that obtained from a tissue of an unknown source, then the individual is the source of the tissue.

[0640] The nucleic acids of the present invention can also be used to determine the sequence of selected portions of an individual's genome. For example, the sequences that represent new genes can be used to prepare primers that can be used to amplify an individual's DNA and subsequently sequence it. Panels of DNA sequences (each amplified with a different set of primers) can uniquely identify individuals (as every person will have unique sequences due to allelic differences).

[0641] Allelic variation occurs to some degree in the coding regions of these sequences, and to a greater degree in the noncoding regions. Each of the sequences described

(such as blood, saliva, or semen) found at a crime scene can be compared to a standard (e.g., sequences obtained and amplified from a suspect), thereby allowing one to determine whether the suspect is the source of the tissue or bodily fluid.

[0644] The nucleic acids of the invention, when used as probes or primers, can target specific loci in the human genome. This will improve the reliability of DNA-based forensic identifications because the more identifying markers examined, the less likely it is that one individual will be mistaken for another. Moreover, tests that rely on obtaining actual genomic sequence (which is possible here) are more accurate than those in which identification is based on the patterns formed by restriction enzyme generated fragments.

[0645] The nucleic acids of the invention can also be used to study the expression of the mRNAs in histological sections (i.e., they can be used in in situ hybridization). This approach can be useful when forensic pathologists are presented with tissues of unknown origin or when the purity of a population of cells (e.g., a cell line) is in question. The nucleic acids can also be used in diagnosing a particular condition and in monitoring a treatment regime.

Predictive Medicine

[0646] The nucleic acids, proteins, antibodies, and cells described hereinabove are generally useful in the field of predictive medicine and, more specifically, are useful in diagnostic and prognostic assays and in monitoring clinical trials. For example, one can determine whether a subject is at risk of developing a disorder associated with a lesion in, or the misexpression of, a nucleic acid of the invention (e.g., a cancer such as pancreatic cancer, breast cancer, or a cancer within the urinary system). In addition, the nucleic acids expressed in tumor tissues and not in normal tissues are markers that can be used to determine whether a subject has or is likely to develop a particular type of cancer.

[0647] The “subject” referred to in the context of any of the methods of the present invention is a vertebrate animal (e.g., a mammal such as an animal commonly used in experimental studies (e.g.
rats, mice, rabbits and guinea herein can, to Some degree, be used as a standard against pigs); a domesticated animal (e.g., a dog or cat); an animal which DNA from an individual can be compared for iden kept as livestock (e.g., a pig, cow, sheep, goat, or horse); a tification purposes. Because greater numbers of polymor non-human primate (e.g. an ape, monkey, or chimpanzee); a phisms occur in the noncoding regions, fewer sequences are human primate; an avian (e.g., a chicken); an amphibian necessary to differentiate individuals. The noncoding (e.g., a frog); or a reptile. The animal can be an unborn sequences disclosed herein can provide positive individual animal (accordingly, the methods of the invention can be identification with a panel of perhaps 10 to 1,000 primers used to carry out genetic screening or to make prenatal which each yield a noncoding amplified sequence of 100 diagnoses). The Subject can also be a human. bases. If predicted coding sequences are used, a more 0648. The methods related to predictive medicine can appropriate number of primers for positive individual iden also be carried out by using a nucleic acid of the invention tification would be 500-2,000. to, for example detect, in a tissue of a Subject: (i) the presence or absence of a mutation that affects the expression 0642) If a panel of reagents from the nucleic acids of the corresponding gene (e.g., a mutation in the 5' regu described herein is used to generate a unique identification latory region of the gene); (ii) the presence or absence of a database for an individual, those same reagents can later be mutation that alters the structure of the corresponding gene; used to identify tissue from that individual. Using the (iii) an altered level (i.e., a non-wild type level) of mRNA of database, the individual, whether still living or dead, can the corresponding gene (the proteins of the invention can be Subsequently be linked to even very small tissue samples. 
similarly used to detect an altered level of protein expres 0643 DNA-based identification techniques, including sion); (iv) a deletion or addition of one or more nucleotides those in which small samples of DNA are amplified (e.g., by from the nucleic acid sequences of the present invention; (v) PCR) can also be used in forensic biology. Sequences a Substitution of one or more nucleotides in the nucleic acid amplified from tissues (such as hair or skin) or body fluids sequences of the present invention (e.g., a point mutation); US 2006/0068405 A1 Mar. 30, 2006 49

(vi) a gross chromosomal rearrangement (e.g., a translocation, inversion, or deletion); or (vii) aberrant modification of a gene corresponding to the nucleic acid sequences of the present invention (e.g., modification of the methylation pattern of the genomic DNA). Similarly, one can test for inappropriate post-translational modification of any protein encoded. Abnormal expression or abnormal gene or protein structures indicate that the subject is at risk for the associated disorder.

[0649] A genetic lesion can be detected by, for example, providing an oligonucleotide probe or primer having a sequence that hybridizes to a sense or antisense strand of a nucleic acid sequence of the present invention, a naturally occurring mutant thereof, or the 5' or 3' sequences that are naturally associated with the corresponding gene, and exposing the probe or primer to a nucleic acid within a tissue of interest (e.g., a tumor). One can detect hybridization between the probe or primer and the nucleic acid of the tissue by standard methods (e.g., in situ hybridization) and thereby detect the presence or absence of the genetic lesion. Where the probe or primer specifically hybridizes with a new splice variant, the probe or primer can be used to detect a non-wild type splicing pattern of the mRNA. The antibodies of the invention can be similarly used to detect the presence or absence of a protein encoded by a mutant, mis-expressed, or otherwise deficient gene. Diagnostic and prognostic assays are described further below.

[0650] Qualitative or quantitative analyses (which reveal the presence or absence of a substance or its level of expression or activity, respectively) can be carried out for any one of the nucleic acid sequences of the present invention, or (where the nucleic acid encodes a protein) the proteins they encode, by obtaining a biological sample from a subject and contacting the sample with an agent capable of specifically binding a nucleic acid represented by the nucleic acid sequences of the present invention or a protein those nucleic acids encode. The conditions in which contacting is performed should allow for specific binding. Suitable conditions are known to those of ordinary skill in the art. The biological sample can be a tissue, a cell, or a bodily fluid (e.g., blood or serum), which may or may not be extracted from the subject (i.e., expression can be monitored in vivo).

[0651] More specifically, the expression of a nucleic acid sequence can be examined by, for example, Southern or Northern analyses, polymerase chain reaction analyses, or with probe arrays. For example, one can diagnose a condition associated with expression or mis-expression of a gene by isolating mRNA from a cell and contacting the mRNA with a nucleic acid probe with which it can hybridize under stringent conditions (the characteristics of useful probes are known to those of ordinary skill in the art and are discussed elsewhere herein). The mRNA can be immobilized on a surface (e.g., a membrane, such as nitrocellulose or other commercially available membrane) following gel electrophoresis.

[0652] Alternatively, one or more nucleic acids (the target sequence or the probe) can be distributed on a two-dimensional array (e.g., a gene chip). Arrays are useful in detecting mutations because a probe positioned on the array can have one or more mismatches to a nucleic acid of the invention (e.g., a destabilizing mismatch). For example, genetic mutations in any of nucleic acid sequences of the present invention can be identified in two-dimensional arrays containing light-generated DNA probes (Cronin et al., Human Mutation 7:244-255, (1996)). Briefly, when a light-generated DNA probe is used, a first array of probes is used to scan through long stretches of DNA in a sample and a control to identify base changes between the sequences by making linear arrays of sequential overlapping probes. This step allows the identification of point mutations, and it can be followed by use of a second array that allows the characterization of specific mutations by using smaller, specialized probe arrays complementary to all variants or mutations detected. Each mutation array is composed of parallel probe sets, one complementary to the wild-type gene and the other complementary to the mutant gene. Arrays are discussed further below; see also: Kozal et al., Nature Medicine 2:753-759, (1996).

[0653] The level of an mRNA in a sample can also be evaluated with a nucleic acid amplification technique, e.g., RT-PCR (U.S. Pat. No. 4,683,202), ligase chain reaction (LCR; Barany, Proc. Natl. Acad. Sci. USA 88:189-193, (1991); LCR can be particularly useful for detecting point mutations), self sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-1878, (1990)), transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177, (1989)), Q-Beta Replicase (Lizardi et al., Bio/Technology 6:1197, (1988)), or rolling circle replication (U.S. Pat. No. 5,854,033). Following amplification, the nucleic acid can be detected using techniques known in the art. Amplification primers are a pair of nucleic acids that anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice-versa) at some distance (possibly a short distance) from one another. For example, each primer can consist of about 10 to 30 nucleotides and bind to sequences that are about 50 to 200 nucleotides apart. Serial analysis of gene expression can be used to detect transcript levels (U.S. Pat. No. 5,695,937). Other useful amplification techniques (useful in, for example, detecting an alteration in a gene) include anchor PCR, real-time PCR or RACE PCR.

[0654] Mutations in the gene sequences of the invention can also be identified by examining alterations in restriction enzyme cleavage patterns. For example, one can isolate DNA from a sample cell or tissue and a control, amplify it (if necessary), digest it with one or more restriction endonucleases, and determine the length(s) of the fragment(s) produced (e.g., by gel electrophoresis). If the size of the fragment obtained from the sample is different from the size of the fragment obtained from the control, there is a mutation in the DNA in the sample tissue. Sequence specific ribozymes (see, for example, U.S. Pat. No. 5,498,531) can be used to detect specific mutations by development or loss of a ribozyme cleavage site.

[0655] Any sequencing reaction known in the art (including those that are automated) can also be used to determine whether there is a mutation, and, if so, how the mutant differs from the wild type sequence. Mutations can also be identified by using cleavage agents to detect mismatched bases in RNA/RNA or RNA/DNA duplexes (Myers et al., Science 230:1242, (1985); Cotton et al., Proc. Natl. Acad. Sci. USA 85:4397, (1988); Saleeba et al., Methods Enzymol. 217:286-295, (1992)). Mismatch cleavage reactions employ one or more proteins that recognize mismatched base pairs in double-stranded DNA (so called “DNA mismatch repair