US 2011 0091454A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2011/0091454 A1 Diber et al. (43) Pub. Date: Apr. 21, 2011

(54) METHODS AND SYSTEMS FOR Beck, Kfar Saba (IL); Wei-Yong ANNOTATING BOMOLECULAR Zhu, Plainsboro, NJ (US); Alon SEQUENCES Wasserman, New York, NY (US); Chen Chermesh, Mishmar (76) Inventors: Alex Diber, Rishon-LeZion (IL); HaShiva (IL); Idit Azar, Tel-Aviv Sarah Pollock, Tel-Aviv (IL); Zurit (IL); Rotem Sorek, Rechovot (IL); Levine, Herzlia (IL); Sergey Jeanne Bernstein, Kfar Yona (IL) Nemzer, RaAnana (IL); Vladimir Grebinskiy, Highland Park, NJ (21) Appl. No.: 12/701,651 (US); Brian Meloon, Plainsboro, NJ (US); Andrew Olson, (22) Filed: Feb. 8, 2010 Northport, NY (US): Avi Related U.S. Application Data Rosenberg, Kfar Saba (IL); Ami Haviv, Hod-HaSharon (IL); Shaul (63) Continuation of application No. 11/043,860, filed on Zevin, Mevaseret Zion (IL); Tomer Jan. 27, 2005. Zekharia, Givataim (IL); Zipi (60) Provisional application No. 60/539,129, filed on Jan. Shaked, Tel-Aviv (IL); Moshe 27, 2004. Olshansky, Haifa (IL); Ariel Farkash, Haifa (IL); Eyal Publication Classification Privman, Tel-Aviv (IL); Amit (51) Int. Cl. Novik, Beit-YeHoshua (IL); Naomi A 6LX 39/395 (2006.01) Keren, Givat Shmuel (IL); Gad S. C07K 6/00 (2006.01) Cojocaru, Ramat-HaSharon (IL); GOIN 33/53 (2006.01) Pinch as Akiva, Ramat-Gan (IL); Yossi Cohen, Woking (GB); Ronen A6IP35/00 (2006.01) Shemesh, Modiin (IL); Osnat (52) U.S. Cl...... 424/133.1: 530/387.9; 424/139.1; Sella-Tavor, Kfar-Kish (IL); Liat 530/387.3; 436/501 Mintz, East Brunswick, NJ (US); (57) ABSTRACT Hanging Xie, Lambertville, NJ (US); Dvir Dahary, Tel-Aviv (IL); Polypeptide sequences and polynucleotide sequences are Erez. Levanon, Petach-Tikva (IL); provided. Also provided are annotative information concern Shiri Freilich, Haifa (IL); Nili ing Such sequences and uses for these sequences. O

Externa ser 4. Database interface

Processing it.

Application VOdie Database Patent Application Publication Apr. 21, 2011 Sheet 1 of 37 US 2011/0091454 A1

4.

n Processing it.

Application / 18 Module

4.

/ Processing init

8 Patent Application Publication Apr. 21, 2011 Sheet 2 of 37 US 2011/0091454 A1

~NN

A | ~N

~__)~~ Z‘61–

\|/ N |× \/ Patent Application Publication Apr. 21, 2011 Sheet 3 of 37 US 2011/0091454 A1

of 1.1 epithelial ceili tumors ... Carciorra ... alien O'CarClona ... 1.2 obiliar Carcinona 1.2 Fiese cfia ceili tutors 2. SarCOf a .2.1.1 iposarcoma 1.2.1.2 rhabdomyosarconna 1.2.1.3 pnet 1.2.1.4 ewing sarcoma 1.3 food it fors 1.3.3 lymphoma .3.2 leukemia 1.3.3 myeloma 1.4 endocrine fiors 1.4.1 pheocromocytoma 4.2 Carci Cic

2 endocrine syster 2. a dreaf 2.1.1 pheocroinocytona 2.2 pancreas 2.2.1 islets of langerhans 2.3 regroeidocrine 2.3.1 hypothalamus 23.2 Ca"Croid 2.4 thyroid

3 Wascar tissue 3. f arteries 3. ... acia 3.2 Veii. Fig. 4. Patent Application Publication Apr. 21, 2011 Sheet 4 of 37 US 2011/0091454 A1

4 genitorinary syster 4. urinary systern 4.1 biadier 4.1.2 kidney 4.2 genital systern 4.2.1 women genital system 4.2.1 Cervix 4.2.1.2. Oway 4.2.1.3 liters 4.2.1.3.1 erado retriuirr 4.2.2 men gentile system 4.2.2.1 prostate 4.2.2.2 testis 4.2.2.2.1 epididymis

5 scies 5.1 rhabdomyosarconna 5.2 tongue 5.3 after .4 fearf 5. Liters

Occi 6.1 peripheral blood 6.1.1 erythroid ine 6, i.2 leukocyte 6.1.2.1 lymphoid system 6.1.2.1. 1 lymphoma 6.1.2.1.2 spleen 6.1.2.1.3 thaiaritis 6.2 step ces 6.2.1 myeloid 6.2.2 yeoma 6.3 Borge arrow Fig. 4 (Cont.) 6.4 feifreria Patent Application Publication Apr. 21, 2011 Sheet 5 of 37 US 2011/0091454 A1

7 nerve system 7.1 CNS, central nervous system 7 brai 7... Ceer 7.1.2 Cerebe in 7.1.1.3 pitituary gland 7.14 hypothalamus 7.1.1.5 ha anus 7.1.16 olfactory 7.1.1.7 hippocampus 7.1.1.8 amygdala 7. 9 for a de 7.1.1.10 pnet 7.2 Embryonal nerve system 7.2.1 primitive neuroectodern 7.3 retiria 7.3.1 retiricasionna

8 reast 8. du cital breas: 8.1.1. dicta Carcircra 8.2 ou lar carcia Opra 8.3 mammary

skeeter 9. One 9. i. ewing sarcoma 9.1.2 craniofacia 9, .2. Cavariri 9.2 corrective tiss Lie 9.2.1 trabeculae 9.2.2 cartilage Fig. 4 (Cont.) Patent Application Publication Apr. 21, 2011 Sheet 6 of 37 US 2011/0091454 A1 embryo O. arrior .2 cric 10.3 primitive neuroectodern 10.4 piacenta

exocrine system 11.1 pancreas 11.1.1 islets of langerhans 11.2 prostate 11.3 saivary gland

12 face organs 2. rose 12.2 ear 2.2. COcea 12.3 eye 2.3. retira 2.3. retiricastoria 2.3.2 ens 12.4 riotti 12.5 tongue

3 gastrointestina syster 13. 1 frucOSa 13.2 Stoira Ci 13.3 iriesiiie 3.3. COO recta 3.3. CCC 3.4 hepatobiliary systern 3.4 live 13.4.2 biliary system 13.4.2.1 gai bladder 13.5 pancreas 13.5.1 islets of langerhans Fig. 4 (Cont.) Patent Application Publication Apr. 21, 2011 Sheet 7 of 37 US 2011/0091454 A1

4 respiratory system 14.1 nasopharynx 14.2 lung 14.2.1 sma cellung carcinoma

Sk 1.1 certis 15.1. melanocyte

6 fat tissue 16.1 iposarcona Fig. 4 (Cont.) Patent Application Publication Apr. 21, 2011 Sheet 8 of 37 US 2011/0091454 A1

|09 |0+ Foz e6eueoja u Oopeiae now Patent Application Publication Apr. 21, 2011 Sheet 9 of 37 US 2011/0091454 A1

80eds Jeyneoexe Jenieoexe auleoidetie-dueli je feu

ujeoideuerqueueused effieu

use doejonu

euejqueu jeepnu menu auOSIXOjed t 3ueiguelii et u e pu030.

8Leigueueipt Otool

XLei eLu(OOL UOLuO3Ou needde fog

unno.8. Oil Seidopte

OSOAQ JO)883;SCAO u08 (

JoelexSOA) LSedoA) 9. l

u:80i/suo go equir N

Patent Application Publication Apr. 21, 2011 Sheet 12 of 37 US 2011/0091454 A1

Œ.È $DukeD?ö(5£.DukeC {}Tõ§7,5 §82- -º* $DukeB genomicDNA ºB ?ë,,DukeA adenocarcinoma pancreas breas? ?jver testis lung heart ovary thymus sp?een kidney p?acen?a stornach brain

puuduI04800uðþýtrººfds ? Patent Application Publication Apr. 21, 2011 Sheet 13 of 37 US 2011/0091454 A1

(s s C S2 . -- S; C s 2 O Ewing Ost San is2 i 2 < : Saf Corra Cei-lines si CO ag C. c. is Celinetai I S 5 () a

: &

8 & 8 & S. 8 & & &SS &: S SS Fig. 9

. Patent Application Publication Apr. 21, 2011 Sheet 14 of 37 US 2011/0091454 A1

~~~~~~<^^^

Ieulou?oleosoulenbsol#

Fig. 11 Patent Application Publication Apr. 21, 2011 Sheet 15 of 37 US 2011/0091454 A1

%

NAOZ-9C(tudoN6unºiC# (Og)etuou?oueosnotu?enbS){

Fig. 12 Patent Application Publication Apr. 21, 2011 Sheet 16 of 37 US 2011/0091454 A1

A?etuou?ojeo6un";SN?# (Og)etuou?ojeosnotu?enbS,{{ e?ou?ojeoso?uenbS6#

Fig. 13

Patent Application Publication Apr. 21, 2011 Sheet 18 of 37 US 2011/0091454 A1

eg|(61-) OGI‘61–qGL'61)

Patent Application Publication Apr. 21, 2011 Sheet 19 of 37 US 2011/0091454 A1

Patent Application Publication Apr. 21, 2011 Sheet 20 of 37 US 2011/0091454 A1

S.--

sss s x * x

s'

s Patent Application Publication US 2011/0091454 A1

Patent Application Publication Apr. 21, 2011 Sheet 23 of 37 US 2011/0091454 A1

Transcript 1 -- -2- Transcript 2-1-

Fig. 17

GCSF Stricture

Patent Application Publication Apr. 21, 2011 Sheet 24 of 37 US 2011/0091454 A1

-3..>---,----

EfnòIN?AON-(Z.T}+SO3)) Patent Application Publication Apr. 21, 2011 Sheet 25 of 37 US 2011/0091454 A1

Patent Application Publication Apr. 21, 2011 Sheet 26 of 37 US 2011/0091454 A1

s :

s i

SSY SYY - Patent Application Publication Apr. 21, 2011 Sheet 27 of 37 US 2011/0091454 A1

SS% %

.aes,el

S SN

Patent Application Publication US 2011/0091454 A1

Patent Application Publication Apr. 21, 2011 Sheet 30 of 37 US 2011/0091454 A1

?|(WA ÅS 6???ioddnssiSE (£?)que?ueANA?U eOz(61-)

S Patent Application Publication Apr. 21, 2011 Sheet 31 of 37 US 2011/0091454 A1

O N qOz(61-) g i

Patent Application Publication Apr. 21, 2011 Sheet 33 of 37 US 2011/0091454 A1 SNs%2%

?OZ(61-)

SSSSSSSSS ww. s r (s e C s na %S Patent Application Publication Apr. 21, 2011 Sheet 34 of 37 US 2011/0091454 A1

US 2011/0091454 A1 Apr. 21, 2011

METHODS AND SYSTEMIS FOR 0010. To generate meaningful amounts of data, however, ANNOTATING BIOMOLECULAR high throughput template preparation, sequencing and analy SEQUENCES sis protocols must be applied. As such, the number of new genes identified as well as the statistical significance of the RELATIONSHIP TO EXISTING APPLICATIONS data is proportional to the number of clones sequenced as well as the complexity of the tissue being analyzed Adams et al. 0001. The present application claims priority as a Con (1995) Nature 377:3-173; Hillier et al. (1996) Genome Res. tinuation application from U.S. patent application Ser. No. 6:807-828. 11/043,860, filed on Jan. 27, 2005, which claims priority from 0011 Subtractive cloning Subtractive cloning offers an U.S. Provisional Patent Application No. 60/539,129 filed Jan. inexpensive and flexible alternative to EST sequencing and 27, 2004, the contents of all of which are hereby incorporated cDNA array hybridization. In this approach, double-stranded by reference. cDNA is created from the two-cell or tissue populations of interest, linkers are ligated to the ends of the cDNA fragments FIELD AND BACKGROUND OF THE and the cDNA pools are then amplified by PCR. The cDNA INVENTION pool from which unique clones are desired is designated the 0002 The present invention relates to systems and meth “tester, and the cDNA pool that is used to subtract away ods useful for annotating biomolecular sequences. More par shared sequences is designated the “driver'. Following initial ticularly, the present invention relates to computational PCR amplification, the linkers are removed from both cDNA approaches, which enable systemic characterization of bio pools and unique linkers are ligated to the tester sample. The molecular sequences and identification of differentially tester is then hybridized to a vast excess of driver DNA and expressed biomolecular sequences Such as sequences associ sequences that are unique to the tester cDNA pool are ampli ated with a pathology. fied by PCR. 0003. In the post-genomic era, data analysis rather than 0012. The primary limitation of subtractive methods is data collection presents the biggest challenge to biologists. that they are not always comprehensive. The cDNAs identi Efforts to ascribe biological meaning to genomic data, fied are typically those, which differ significantly in expres whether by identification of function, structure or expression sion level between cell-populations and Subtle quantitative pattern are lagging behind sequencing efforts Boguski MS differences are often missed. In addition each experiment is a (1999) Science 286:453-455. pair wise comparison and since Subtractions are based on a 0004. It is well recognized that elucidation of spatial and series of sensitive biochemical reactions it is difficult to temporal patterns of gene expression in healthy and diseased directly compare a series of RNA samples. states may contribute immensely to further understanding of 0013 Differential display—Differential display is another disease mechanisms. PCR-based differential cloning method Liang and Pardee 0005. Therefore, any observational method that can rap (1992) Science 257:967-70; Welsh et al. (1992) Nucleic idly, accurately and economically observe and measure the Acids Res. 20:4965-70. In classical differential display, pattern of expression of selected individual genes or of whole reverse transcription is primed with either oligo-dT or an genomes is of great value to scientists. arbitrary primer. Thereafter an arbitrary primer is used in 0006. In recent years, a variety of techniques have been conjunction with the reverse transcription primer to amplify developed to analyze differential gene expression. However, cDNA fragments and the cDNA fragments are separated on a current observation and measurement methods are inaccu polyacrylamide gel. Differences in gene expression are visu rate, time consuming, labor intensive or expensive, often alized by the presence or absence of bands on the gel and times requiring complex molecular and biochemical analysis quantitative differences in gene expression are identified by of numerous gene sequences. differences in the intensity of bands. Adaptation of differen 0007 For example, observation methods for individual tial display methods for fluorescent DNA sequencing mRNA or cDNA molecules such as Northern blot analysis, machines has enhanced the ability to quantify differences in RNase protection, or selective hybridization to arrayed cDNA gene expression Kato (1995) Nucleic Acids Res. 18:3685 libraries see Sambrook et al. (1989) Molecular cloning. A 90. laboratory manual, Cold Spring Harbor press, NY depend on 0014. A limitation of the classical differential display specific hybridization of a single oligonucleotide probe approach is that false positive results are often generated complementary to the known sequence of an individual mol during PCR or in the process of cloning the differentially ecule. Since a single human cell is estimated to express expressed PCR products. Although a variety of methods have 10,000-30,000 genes Liang et al. (1992) Science 257:967 been developed to discriminate true from false positives, 971, single probe methods to identify all sequences in a these typically rely on the availability of relatively large complex sample are ineffective and laborious. amounts of RNA. 0008. Other approaches for high throughput analysis of 0015 Serial analysis of gene expression (SAGE)—this differential gene expression are Summarized infra. DNA sequence based method is essentially an accelerated 0009 EST sequencing The basic idea is to create cDNA version of EST sequencing Valculescu et al. (1995) Science libraries from tissues of interest, pick clones randomly from 270:484-8. In this method a digestible unique sequence tag these libraries and then perform a single sequencing reaction of 13 or more bases is generated for each transcript in the cell from a large number of clones. Each sequencing reaction or tissue of interest, thereby generating a SAGE library. generates 300 base pairs or so of sequence that represents a 0016 Sequencing each SAGE library creates transcript unique sequence tag for a particular transcript. An EST profiles. Since each sequencing reaction yields information sequencing project is technically simple to execute since it for twenty or more genes, it is possible to generate data points requires only a cDNA library, automated DNA sequencing for tens of thousands of transcripts in modest sequencing capabilities and standard bioinformatics protocols. efforts. The relative abundance of each gene is determined by US 2011/0091454 A1 Apr. 21, 2011 counting or clustering sequence tags. The advantages of interest, the contig being assembled from a plurality of SAGE over many other methods include the high throughput expressed sequences; and (b) comparing a number of the that can be achieved and the ability to accumulate and com plurality of expressed sequences corresponding to the contig pare SAGE tag data from a variety of samples, however the which are expressed in each of the at least two tissue types, to technical difficulties concerning the generation of good thereby compare the expression level of the gene of interest in SAGE libraries and data analysis are significant. the at least two tissue types. 0017 Altogether, it is clear from the above that laboratory 0026. According to further features in preferred embodi bench approaches are ineffective, time consuming, expensive ments of the invention described below, the method further and often times inaccurate in handling and processing the vast comprises computationally aligning sequences expressed in amount of genomic information which is now available. each of the at least two types of tissue with the contig to 0018. It is appreciated, that much of the analysis can be thereby identify the expressed sequences corresponding to effected by developing computational algorithms, which can the contig prior to (b). be applied to mining data from existing databases, thereby 0027. According to yet another aspect of the present retrieving and integrating valuable biological information. invention there is provided a method of comparing an expres 0019. To date, there are more than a hundred major bio sion level of at least two splice variants of a gene of interest in molecule databases and application servers on the Internet a tissue, the method comprising: (a) obtaining a contig having and new sites are being introduced at an ever-increasing rates exonal sequences of the at least two splice variants of the gene Ashburner and Goodman (1997) Curr. Opin. Genet. Dev. of interest, the contig being assembled from a plurality of 7:750-756; Karp (1998) Trends Biochem. Sci. 23:114-116. expressed sequences; (b) identifying at least one contig 0020. However, these databases are organized in sequence region unique to one of the at least two splice extremely heterogeneous formats. These reflect the inherent variants of the gene of interest; and (c) comparing a number of complexity of biological data, ranging from plain-text nucleic the plurality of expressed sequences in the tissue having the at acid and protein sequences, through the three dimensional least one contig sequence region with a number of the plural structures of therapeutic drugs and macromolecules and high ity of expressed sequences not-having the at least one contig resolution images of cells and tissues, to microarray-chip sequence region, to thereby compare the expression level of outputs. Moreover data structures are constantly evolving to the at least two splice variants of the gene of interest in the reflect new research and technology development. tissue. 0021. The heterogeneous and dynamic nature of these bio 0028. According to still further features in the described logical databases present major obstacles in mining data rel preferred embodiments the plurality of expressed sequences evant to specific biological queries. Clearly, simple retrieval present complete exonal coverage of the gene of interest. of data is not sufficient for data mining; efficient data retrieval 0029. According to still further features in the described requires flexible data manipulation and Sophisticated data preferred embodiments the plurality of expressed sequences integration. Efficient data retrieval requires the use of com present partial exonal coverage of the gene of interest. plex queries across multiple heterogeneous data sources; data 0030. According to still further features in the described warehousing by merging data derived from multiple public preferred embodiments the obtaining the contig is effected by Sources and local (i.e., private) sources; and multiple data a sequence assembly software. analysis procedures that require feeding Subsets of data 0031. According to still further features in the described derived from different sources into various application pro preferred embodiments the method further comprising scor grams for gene finding, protein-structure prediction, func ing each of the plurality of the expressed sequences prior to tional domain or motif identification, phylogenetic tree con (c), wherein the scoring is effected according to: struction, graphic presentation and so forth. 0032 (i) expression level of each of the plurality of the 0022. Current biological data retrieval systems are not expressed sequences; and fully up to the demand of smooth and flexible data integration 0033 (ii) a quality of each of the plurality of the expressed Etzold et al. (1996) Methods Enzymol266:t14-t28; Schuler Sequences: et al. (1996) Methods Enzymol. 266:141-162; Chung and 0034. According to still further features in the described Wong (1999) Trends Biotech. 17:351-355). preferred embodiments comparing is effected using statisti 0023 There is thus a widely recognized need for, and it cal pairing analysis. would be highly advantageous to have, systems and methods 0035. According to still further features in the described which can be used for efficient retrieval and processing of preferred embodiments the statistical pairing analysis is data from biological databases thereby enabling annotation of Fisher exact test. previously un-annotated biomolecular sequences. 0036. According to still further features in the described preferred embodiments the tissue is selected from the group SUMMARY OF THE INVENTION consisting of a tissue of a pathological origin of interest, a 0024. According to one aspect of the present invention tissue of a cellular composition of interest. there is provided a computer readable storage medium, com 0037 According to still further features in the described prising a database stored in a retrievable manner, the database preferred embodiments the method further comprising com including biomolecular sequence information as set forth in paring the number of the plurality of expressed sequences in files “Transcripts.gz’, and/or “Proteins.gz' of enclosed CD the tissue having the at least one contig sequence region with ROM4, and biomolecular sequence annotations, as set forth a number of the plurality of expressed sequences of the con in file Annotations.gz' of enclosed CD-ROM4. tig. 0025. According to another aspect of the present invention 0038 According to still another aspect of the present there is provided a method of comparing an expression level invention there is provided a computer readable storage of a gene of interest in at least two types of tissues, the method medium comprising data stored in a retrievable manner, the comprising: (a) obtaining a contig representing the gene of data including sequence information of differentially US 2011/0091454 A1 Apr. 21, 2011 expressed mRNA sequences as set forth in files “Transcripts. expressed sequences in each of the at least two tissue types, to gz’, and/or “Proteins.gz' of enclosed CD-ROM4, and thereby generate the database of differentially expressed sequence annotations as set forth in annotation categories genes. “HTS”, “HTAA’ and/or “HTAAT', in the file “Annotations. 0050. According to still an additional aspect of the present gz” of enclosed CD-ROM4. invention there is provided an isolated polynucleotide com prising a nucleic acid sequence being at least 80% identical to 0039. According to still further features in the described a nucleic acid sequence of the sequences set forth in file preferred embodiments the database further includes infor “Transcripts.gz' of the enclosed CD-ROM4. mation pertaining to generation of the data and potential uses 0051. According to still further features in the described of the data. preferred embodiments the nucleic acid sequence is set forth 0040. According to still further features in the described in the file “Transcripts.gz' of the enclosed CD-ROM4. preferred embodiments the medium is selected from the 0.052 According to a further aspect of the present inven group consisting of a magnetic storage medium, an optical tion there is provided an isolated polynucleotide comprising storage medium and an optico-magnetic storage medium. a nucleic acid sequence encoding a polypeptide having an 0041 According to still further features in the described amino acid sequence at least 80% homologous to a sequence preferred embodiments the database further includes infor set forth in the file “Proteins.gz” of the enclosed CD-ROM4. mation pertaining to gain and/or loss of function of the dif 0053 According to yet a further aspect of the present ferentially expressed mRNA splice variants or polypeptides invention there is provided an isolated polynucleotide com encoded thereby. prising a nucleic acid sequence at least 80% identical to a 0042. According to an additional aspect of the present sequence set forth in the file “Transcripts.gz' of the enclosed invention there is provided a kit useful for detecting differen CD-ROM4. tially expressed polynucleotide sequences, the kit comprising 0054 According to still a further aspect of the present at least one oligonucleotide being designed and configured to invention there is provided an isolated polypeptide having an be specifically hybridizable with a polynucleotide sequence amino acid sequence at least 80% homologous to a sequence selected from the group consisting of sequence files “Tran set forth in the file “Proteins.gz” of the enclosed CD-ROM4. scripts.gz' of enclosed CD-ROM4 under moderate to strin 0055 According to still a further aspect of the present gent hybridization conditions. invention there is provided use of a polynucleotide or 0043. According to still further features in the described polypeptide set forth in the file “Transcripts.gz' or “Proteins. preferred embodiments the at least one oligonucleotide is gz of the enclosed CD-ROM4 for the diagnosis and/or treat labeled. ment of the diseases listed in herein. 0044 According to still further features in the described 0056. The present invention successfully addresses the preferred embodiments the at least one oligonucleotide is shortcomings of the presently known configurations by pro attached to a solid substrate. viding methods and systems useful for systematically uncov 0045. According to still further features in the described ering and annotating biomolecular sequences. preferred embodiments the solid substrate is configured as a 0057. Unless otherwise defined, all technical and scien microarray and whereas the at least one oligonucleotide tific terms used herein have the same meaning as commonly includes a plurality of oligonucleotides each being capable of understood by one of ordinary skill in the art to which this hybridizing with a specific polynucleotide sequence of the invention belongs. Although methods and materials similar or polynucleotide sequences set forth in the files “Transcripts. equivalent to those described herein can be used in the prac gz' of enclosed CD-ROM4 under moderate to stringent tice or testing of the present invention, Suitable methods and hybridization conditions. materials are described below. In addition, the materials, 0046 According to still further features in the described methods, and examples are illustrative only and not intended preferred embodiments each of the plurality of oligonucle to be limiting. otides is being attached to the microarray in a regio-specific a. BRIEF DESCRIPTION OF THE DRAWINGS 0047 According to still further features in the described 0058. The invention is herein described, by way of preferred embodiments the at least one oligonucleotide is example only, with reference to the accompanying drawings. designed and configured for DNA hybridization. With specific reference now to the drawings in detail, it is 0048. According to still further features in the described stressed that the particulars shown are by way of example and preferred embodiments the at least one oligonucleotide is for purposes of illustrative discussion of the preferred designed and configured for RNA hybridization. embodiments of the present invention only, and are presented 0049 According to yet an additional aspect of the present in the cause of providing what is believed to be the most invention there is provided a system for generating a database useful and readily understood description of the principles of differentially expressed genes, the system comprising a and conceptual aspects of the invention. In this regard, no processing unit, the processing unit executing a Software attempt is made to show structural details of the invention in application configured for: (a) obtaining contigs representing more detail than is necessary for a fundamental understand genes of interest, each of the contigs being assembled from a ing of the invention, the description taken with the drawings plurality of expressed sequences; (b) comparing a number of making apparent to those skilled in the art how the several the plurality of expressed sequences corresponding to each of forms of the invention may be embodied in practice. the contigs, which are expressed in each of at least two tissue 0059. In the drawings: types, to thereby compare the expression level of the genes of 0060 FIG.1a illustrates a system designed and configured interest in the at least two tissue types; and (c) storing contigs for generating a database of annotated biomolecular which are supported by different numbers of the plurality of sequences according to the teachings of the present invention. US 2011/0091454 A1 Apr. 21, 2011

0061 FIG. 1b illustrates a remote configuration of the (SEQID NO: 40) colorectal specific transcript. Colon normal system described in FIG. 1a. was obtained from Biochain, cat no: A406029. The adeno 0062 FIG. 2 illustrates a gastrointestinal tissue hierarchy carcinoma sample represents a pool of spleen, lung, stomach dendrogram generated according to the teachings of the and kidney adenocarcinomas, obtained from patients. Each of present invention. all other tissues (i.e., colon carcinoma samples Duke's A-D; 0063 FIG.3 is a scheme illustrating multiple alignment of and normal thymus, spleen, kidney, placenta, stomach, brain) alternatively spliced expressed sequences with a genomic were obtained from 3-6 patients and pooled. sequence including 3 exons (A, B and C) and two introns. (0071 FIG. 11 is a histogram depicting Real-Time RT-PCR Two alternative splicing events are described: One from the quantification of copy number, of a lung specific transcript, donor site, which involves an AB junction, between donorand (SEQID NO: 15). Amplification products obtained from the proximal acceptor and an AC junction, between donor and following tissues were quantified; normal salivary gland from distal acceptor; A Second alternative splicing event is total RNA (Clontech, cat no:64110-1); lung normal from described from the acceptor site, which involves AC junction, pooled adult total RNA (BioChain, cat no: A409363); lung between distal donor and acceptor and BC junction, between tumor squamous cell carcinoma (Clontech, cat no:64013-1); proximal donor and acceptor. lung tumor squamos cell carcinoma (BioChain, cat 0064 FIG. 4 is a tissue hierarchy dendogram generated no: A409017); pooled lung tumor squamos cell carcinoma according to the teachings of the present invention. The (BioChain, cat no: A411075); moderately differentiated higher annotation levels are marked with a single number, i.e., squamos cell carcinoma (BioChain, cat no: A409091); well 1-16. The lower annotation levels are marked within the rel differentiated squamos cell carcinoma (BioChain, cat no: evant category as one-four numbers after the point (e.g. 4. A408175); pooled adenocarcinoma (BioChain, cat no: genitourinary system; 4.2 genital system; 4.2.1 women geni A411076); moderately differentiated alveolus cell carcinoma tal system; 4.2.1.1 cervix). (BioChain, cat no: A409089); non-small cell lung carcinoma 0065 FIG. 5 is a graph illustrating a correlation between cell line H1299; The following normal and tumor samples LOD scores of textual information analysis and accuracy of were obtained from patients: normal lung (internal number ontological annotation prediction. Results are based on self CG-207N), lung carcinoma (internal number-CG-72), squa validation studies. Only predictions made with LOD scores mos cell carcinoma (internal number-CG-196), squamos cell above 2 were evaluated and used for GO annotation process. carcinoma (internal number-CG-207), lung adenocarcinoma 0066 FIGS. 6a-care histograms showing the distribution (internal number-CG-120), lung adenocarcinoma (internal of proteins (closed squares) and contigs (opened squares) number-CG-160). Copy number was normalized to the levels from Ensembl version 1.0.0 in the major nodes of three GO of expression of the housekeeping genes Proteasome 26S categories—cellular component (FIG. 6a), molecular func subunit (dark columns) and GADPH (bright columns). tion (FIG. 6b), and biological process (FIG. 6c). (0072 FIG. 12 is a histogram depicting Real-Time RT-PCR 0067 FIG. 7 illustrates results from RT-PCR analysis of quantification of copy number, of the lung specific transcript the expression pattern of the AA535072 (SEQ ID NO:39) (SEQID NO:32). Amplification products obtained from the colorectal cancer-specific transcript. The following cell and following tissues and cell-lines were quantified; lung normal tissue samples were tested: B-colon carcinoma cell line from pooled adult total RNA (BioChain, cat no: A409363); SW480 (ATCC-228); C colon carcinoma cell line SW620 lung tumor squamos cell carcinoma (Clontech, cat no:64013 (ATCC-227); D colon carcinoma cell line colo-205 1); lung tumor squamos cell carcinoma (BioChain, cat (ATCC-222). Colon normal tissue indicates a pool of 10 no: A409017); pooled lung tumor squamos cell carcinoma different samples, (Biochain, cat no A406029). The adeno (BioChain, cat no: A411075); moderately differentiated carcinoma sample represents a pool of spleen, lung, stomach squamos cell carcinoma (BioChain, cat no: A409091); well and kidney adenocarcinomas, obtained from patients. Each of differentiated squamos cell carcinoma (BioChain, cat no: the tissues (i.e., colon carcinoma samples Duke's A-D; and A408175); pooled adenocarcinoma (BioChain, cat no: normal muscle, pancreas, breast, liver, testis, lung, heart, A411076); moderately differentiated alveolus cell carcinoma ovary, thymus, spleen kidney, placenta, stomach, brain) were (BioChain, cat no: A409089); non-small cell lung carcinoma obtained from 3-6 patients and pooled. cell line H1299; The following normal and tumor samples 0068 FIG. 8 illustrates results from RT-PCR analysis of were obtained from patients: normal lung (internal number the expression pattern of the AA513157 (SEQ ID NO: 7) CG-207N), lung carcinoma (internal number-CG-72), squa Ewing sarcoma specific transcript. The (+) or (-) symbols, mos cell carcinoma (internal number-CG-196), squamos cell indicate presence or absence of reverse transcriptase in the carcinoma (internal number-CG-207), lung adenocarcinoma reaction mixture. A molecular weight standard is indicated by (internal number-CG-120), lung adenocarcinoma (internal M. Tissue samples (i.e., Ewing sarcoma samples, spleen number-CG-160). Copy number was normalized to the levels adenocarcinoma, brain, prostate and thymus) were obtained of expression of the housekeeping genes Proteasome 26S from patients. The Ln-CAP human prostatic adenocarcinoma subunit (dark columns) and GADPH (bright columns). cell line was obtained from the ATCC (Manassas, Va.). (0073 FIG. 13 is a histogram depicting Real-Time RT-PCR 0069 FIG.9 is an autoradiogram of a northern blot analy quantification of copy number, of the lung specific transcript sis depicting tissue distribution and expression levels of (SEQID NO: 18). Amplification products obtained from the AA513157 (SEQ ID NO: 7) Ewing sarcoma specific tran following tissues and cell-lines were quantified; lung normal script. Arrows indicate the molecular weight of 28S and 18S from pooled adult total RNA (BioChain, cat no: A409363); ribosomal RNA subunits. The indicated tissue samples were lung tumor squamos cell carcinoma (Clontech, cat no:64013 obtained from patients and SK-ES-1—Ewing sarcoma cell 1); lung tumor squamos cell carcinoma (BioChain, cat line was obtained from the ATCC(CRL-1427). no: A409017); pooled lung tumor squamos cell carcinoma 0070 FIG. 10 illustrates results from semi quantitative (BioChain, cat no: A411075); moderately differentiated RT-PCR analysis of the expression pattern of the AA469088 squamos cell carcinoma (BioChain, cat no: A409091); well US 2011/0091454 A1 Apr. 21, 2011

differentiated Squamos cell carcinoma (BioChain, cat no: depicts the identification of mouse ESTs which contain the A408175); pooled adenocarcinoma (BioChain, cat no: exon as well as the two flanking exons. FIG. 15c illustrates a A411076); moderately differentiated alveolus cell carcinoma specific embodiment wherein the exon is absent in the mouse (BioChain, cat no: A409089); non-small cell lung carcinoma ESTs, in this case the human exon sequence is searched cell line H1299; The following normal and tumor samples against the intron spanned by the skipping mouse EST on the were obtained from patients: normal lung (internal number mouse genome. If a significant conservation (i.e., above 80%) CG-207N), lung carcinoma (internal number-CG-72), squa was found and the alignment spanned the full length of the mos cell carcinoma (internal number-CG-196), squamos cell human exon, the exon was considered conserved. carcinoma (internal number-CG-207), lung adenocarcinoma 0076 FIGS. 16a-d illustrate the stepwise methodology (internal number-CG-120), lung adenocarcinoma (internal which is used to uncover true SNPs, as described in Example number-CG-160). Copy number was normalized to the levels 22 of the Examples section. of expression of the housekeeping genes Proteasome 26S 0077 FIG. 17 is a schematic illustration, depicting group subunit (dark columns) and GADPH (bright columns). ing of transcripts of a given contig based on presence or 0074 FIG. 14 is a histogram depicting Real-Time RT-PCR absence of unique sequence regions. Region 1: common to all quantification of copy number, of a lung specific transcript transcripts, thus it is not considered; Region 2: Specific to (SEQID NO: 21). Amplification products obtained from the T 1: T 1 unique regions (2+6) against T 2+3 unique following tissues and cell-lines were quantified; Samples 1-6 regions (3+4); Region 3: Specific to T 2+3: T 2+3 unique are commercial normal lung samples (BioChain, CDP regions (3+4) against T1 unique regions (2+6); Region 4: 061010; A503205, A503384, A503385, A503204, A503206, specific to T 3: T 3 unique regions (4) against T1+2 unique A409363). Sample 7 is lung well differentiated adenocarci regions (2+5+6); Region 5: Specific to T 1+2: T 1+2 unique noma (BioChain, CDP-064004A. A504117). Sample 8 is regions (2+5+6) against T3 unique regions (4); Region 6: lung moderately differentiated adenocarcinoma (BioChain, specific to T 1: same as region 2. CDP-064004A. A504119). Sample 9 is lung moderately to 0078 FIG. 18a is a schematic illustration depicting the poorly differentiated adenocarcinoma (BioChain, CDP GCSF splice variant (SEQ ID NO: 68) as compared to the 064004A. A504116). Sample 10 is lung well differentiated wild-type gene product. adenocarcinoma (BioChain, CDP-064004A. A504118). (0079 FIG. 18b present the nucleic acid sequence of the Samples 11-16 are lung adenocarcinoma samples obtained GCSF splice variant (SEQID NO: 71), which was uncovered from patients. Sample 17 is lung moderately differentiated using the teachings of the present invention. Start and stop squamous cell carcinoma (BioChain, CDP-064004B; codons are highlighted. A503187). Sample 18 is lung squamous cell carcinoma (Bio 0080 FIG. 18c present the amino acid sequence of the Chain, CDP-064004B: A503386). Samples 20-21 are lung GCSF splice variant (SEQID NO: 68), which was uncovered moderately differentiated squamous cell carcinoma (Bio using the teachings of the present invention. Chain, CDP-064004B; A503387, A503383). Sample 22 is I0081 FIG. 18d is a sequence alignment depicting the pro lung Squamous cell carcinoma pooled (BioChain, CDP tein product of a GCSF splice variant (SEQ ID NO: 68) as 064004B; A411075). Samples 23-26 and sample 31 are lung compared to the wild-type protein (Refsec Accession No. squamous cell carcinoma obtained from patients. Sample 27 MN000759). is lung squamous cell carcinoma (Clontech, 64013-1). I0082 FIG. 18e is an illustration depicting a graphical Sample 28 is lung Squamous cell carcinoma (BioChain, viewer scheme presenting the a splice variant of GCSF (SEQ A409017). Sample 29 is lung moderately differentiated squa ID NO: 68) uncovered by the present invention as compared mous cell carcinoma (BioChain, CDP-064004B: A409091). to the wildtype mRNA of GCSF. ESTs supporting the variant Sample 30 is lung well differentiated squamous cell carci are indicated. The transcript indicated as “0” represents noma (BioChain, CDP-064004B; A408175). Samples 32-35 known mRNA. The color code is as follows: red designates are lung small cell carcinoma (BioChain, CDP-064004D: genomic DNA; pink designates Refseq mRNA: light blue A504115, A501390, A501389, A501391). Sample 36-37 are designates known GenBank mRNAs; purple designates ESTs lung large cell carcinoma (BioChain, CDP-064004C: which are aligned in the same directionality as their annota A5041 13, A504.114). Sample 38 is lung moderately differen tion; gray designates ESTs without direction annotation; dark tiated alveolus cell carcinoma (BioChain, A409089). Sample blue designates predicted transcripts; turquoise designates 39 is lung carcinoma obtained from patient. Sample 40 is lung the predicted polypeptide. H1299 non-small cell carcinoma cell line. Sample 41 is nor I0083 FIG. 19a is a schematic illustration depicting the mal salivary gland sample (Clontech, 64110-1). Copy num IL-7 splice variant (SEQ ID NO: 69) as compared to the ber was normalized to the levels of expression of the house wild-type gene product. keeping genes Proteasome 26S subunit (dark columns) and I0084 FIG. 19b present the nucleic acid sequence of the GADPH (bright columns). IL-7 splice variant (SEQID NO: 72), which was uncovered 0075 FIGS. 15a-c are schematic illustrations depicting using the teachings of the present invention. Start and stop the methodology undertaken for finding exon-skipping codons are highlighted. events which are conserved between human and mice I0085 FIG. 19c present the amino acid sequence of the genomes. 3.583 exon skipping events were found in the IL-7 splice variant (SEQID NO: 69), which was uncovered human genome using the methodology described in Sorek using the teachings of the present invention. (2002) Genome Res. 12:1060-1067. FIG. 15a for 980 of I0086 FIG. 19d is a sequence alignment depicting the pro these human exons, a mouse EST spanning the intron which tein product of an IL-7 splice variant (SEQ ID NO: 69) as represents the exon-skipping variant was found. Human compared to the wild-type protein (GenBank Accession No. ESTs are designated in purple. Mouse ESTs are denoted by IL7 HUMAN). light blue. FIGS. 15b-c depict two approaches for identifying I0087 FIG. 19e is an illustration depicting a graphical exon conservation between mice and human. FIG. 15b viewer scheme presenting the a splice variant of IL-7 (SEQID US 2011/0091454 A1 Apr. 21, 2011

NO: 69) uncovered by the present invention as compared to ous, and tumor derived colon samples as determined by real the wild type mRNA of IL-7. ESTs supporting the variant are time PCR using a troponin-569208 unique region derived indicated. The transcript indicated as “0” represents known fragment (SEQ ID NOs: 44-amplicon). Expression was nor mRNA. The color code is as follows: red designates genomic malized to the averaged expression of four housekeeping DNA; pink designates Refseq mRNA: light blue designates genes PBGD, HPRT, RPS27A and G6PD. known GenBank mRNAs; purple designates ESTs which are aligned in the same directionality as their annotation; gray DESCRIPTION OF THE PREFERRED designates ESTs without direction annotation: dark blue des EMBODIMENTS ignates predicted transcripts; turquoise designates the pre 0097. The present invention is of methods and systems, dicted polypeptide. which can be used for annotating biomolecular sequences. 0088 FIG. 20a is a schematic illustration depicting the Specifically, the present invention can be used to identify and VEGF-B splice variant (SEQID NO: 70) as compared to the annotate differentially expressed biomolecular sequences, wild-type gene product. such as differentially expressed alternatively spliced 0089 FIG. 20b present the nucleic acid sequence of the Sequences. VEGF-B splice variant (SEQID NO: 73) which was uncov 0098. The principles and operation of the present inven ered using the teachings of the present invention. Start and tion may be better understood with reference to the drawings stop codons are highlighted. and accompanying descriptions. 0090 FIG. 20c present the amino acid sequence of the 0099. Before explaining at least one embodiment of the VEGF-B splice variant (SEQID NO: 70) which was uncov invention in detail, it is to be understood that the invention is ered using the teachings of the present invention. not limited in its application to the details of construction and 0091 FIG. 20d is a sequence alignment depicting the pro the arrangement of the components set forth in the following tein product of a VEGF-B splice variant (SEQID NO:70) as description or illustrated in the drawings. The invention is compared to the wild-type protein (GenBank accession No. capable of other embodiments or of being practiced or carried VEGB. HUMAN). out in various ways. Also, it is to be understood that the 0092 FIG. 20e is an illustration depicting a graphical phraseology and terminology employed herein is for the pur viewer scheme presenting the a splice variant of VEGF-B pose of description and should not be regarded as limiting. (SEQ ID NO: 70) uncovered by the present invention as 0100 Terminology compared to the wildtype mRNA of VEGF-B. ESTs support 0101. As used herein, the term "oligonucleotide' refers to ing the variant are indicated. The transcript indicated as “0” a single stranded or double stranded oligomer or polymer of represents known mRNA. The color code is as follows: red ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or designates genomic DNA; pink designates Refseq mRNA: mimetics thereof. This term includes oligonucleotides com light blue designates known GenBank mRNAs; purple des posed of naturally-occurring bases, Sugars and covalent inter ignates ESTs which are aligned in the same directionality as nucleoside linkages (e.g., backbone) as well as oligonucle their annotation; black designates ESTs aligned in a direction otides having non-naturally-occurring portions which opposite to the annotation; gray designates ESTs without function similarly. Such modified or substituted oligonucle direction annotation; dark blue designates predicted tran otides are often preferred over native forms because of desir Scripts; turquoise designates the predicted polypeptide. able properties such as, for example, enhanced cellular 0093 FIG. 21 is an illustration depicting schematic align uptake, enhanced affinity for nucleic acid target and increased ment of the nucleic acid sequences of wild type Troponin stability in the presence of nucleases. transcript (GenBank Accession No. NM 003283) and vari 0102) The phrase “complementary DNA (cDNA) refers ants 1, 4, 6, 9, 10, 14 and 16 (SEQID NOS. 75, 77,79, 81, 83, to the double stranded or single stranded DNA molecule, 66 and 67, respectively). Coding regions are marked by green. which is synthesized from a messenger RNA template. Sequence region 4a codes for the unique amino acid sequence 0103) The term “contig” refers to a series of overlapping and is marked by light green and diagonal stripes. Other sequences with Sufficient identity to create a longer contigu regions marked in lightgreen code for additional novel amino ous sequence. A plurality of contigs may form a cluster. acids sequences. Red arrows indicate the location of the prim Clusters are generally formed based upon a specified degree ers and SEQID NOS. thereof, which were used for real-time of homology and overlap (e.g., a stringency), and/or based on PCR validation. prior knowledge of ESTs from different contigs derived from 0094 FIG. 22 is a histogram depicting the expression of the same mRNA also known as clone mates. The different troponin transcripts of the present invention in normal, benign contigs in a cluster do not typically represent the entire and tumor derived ovarian samples as determined by real time sequence of the gene, rather the gene may comprise one or PCR using a troponin-569208 unique region derived frag more unknown intervening sequences between the defined ment (SEQID NOs: 44-amplicon). Expression was normal contigs. ized to the averaged expression of four housekeeping genes 0104. The term “cluster refers to a nucleic acid sequence PBGD, HPRT, GAPDH and SDHA. cluster or a protein sequence cluster. The former refers to a 0095 FIG. 23 is a histogram depicting the expression of group of nucleic acid sequences which share a requisite level troponin transcripts of the present invention in normal and of homology and or other similar traits according to a given tumor derived lung samples as determined by real time PCR clustering criterion; and the latter refers to a group of protein using a troponin-569208 unique region derived fragment sequences which share a requisite level of homology and/or (SEQ ID NO: 44-amplicon). Expression was normalized to other similar traits according to a given clustering criterion. the averaged expression of four housekeeping genes PBGD, 0105. A process and/or method to group nucleic acid or HPRT, Ubiquitin and SDHA. protein sequences as Such is referred to as clustering, which is 0096 FIG. 24 is a histogram depicting the expression of typically performed by a clustering (i.e., alignment) applica troponin transcripts of the present invention in non-cancer tion program implementing a cluster algorithm. US 2011/0091454 A1 Apr. 21, 2011

0106. As used herein the phrase “biomolecular dominant negative inhibitors based on inhibition of mERK5a sequences’ refers to amino acid sequences (i.e., peptides, kinase activity and mERK5a-mediated MEF2C transactiva polypeptides) and nucleic acid sequences, which include but tion. The C-terminal tail, which contains a putative nuclear are not limited to genomic sequences, expressed sequence localization signal, is not required for activation and kinase tags, contigs, complementary DNA (cDNA) sequences, pre activity but is responsible for the activation of nuclear tran messenger RNA (mRNA) sequences, and mRNA sequences. scription factor MEF2C due to nuclear targeting. In addition, Expressed sequences include also products of alternative the N-terminal domain spanning amino acids (aa) 1-77 is splicing or RNA editing events which are well known for important for cytoplasmic targeting; the domain from aa 78 to contributing to gene product diversity Krevintseva Trends 139 is required for association with the upstream kinase Genet. (2003) 19(3):124-8; Keegan (2001) Nat. Rev. Genet. MEK5; and the domain from aa 140-406 is necessary for 2:869-78; Schaub (2002) Biobhimie 84:791-803; Adler oligomerization Yon et al. J. Biol. Chem. (2001) 276(14): (1994) Curr. Opin. Genet. Dev. 4:316-22). 10870-8). 0107 As used herein the phrase “functionally altered bio 0112 The phrase “modification of the original function' molecular sequences' refers to expressed sequences, (e.g., may be exemplified by a changing a receptor function to a alternatively spliced sequences) which protein products ligand function. For example, a soluble secreted receptor may exhibit gain of function or loss of function or modification of exhibit change in functionality as compared to a membrane the original function. anchored wild-type receptor by acting as a ligand, activating 0108. As used herein the phrase “gain of function” refers parallel signaling pathways by trans-signaling e.g., the sig to any gene product (e.g., product of alternative splicing, naling reported for soluble IL-6R. Kallen Biochim Biophys product of RNA editing), which exhibits increased function Acta. (2002) Nov. 11; 1592(3):323-43, stabilizing ligand ality as compared to the wild type gene product. Such again receptor interactions or protecting the ligand or the wild-type of function may have a dominant effect on the wild-type gene receptor from degradation and/or prolonging their half-life. product. An alternatively spliced variant of Max, a binding In this case the Soluble receptor will function as an agonist. partner of the Myc oncogene, provides a typical example for 0113. As used herein the term “modulator” refers to a a “gain of function' alteration. This variant is truncated at the molecule which inhibits (i.e., antagonist, inhibitor, Suppres COOH-terminus and while is still capable of binding to the sor) or activates (i.e., agonist, stimulant, activator) a down CACGTG motif of c-Myc, it lacks the nuclear localization stream molecule to thereby modulate it's activity. signal and the putative regulatory domain of Max. When 0114. As used herein the phrase “functional domain tested in a myc-ras cotransformation assay in rat embryo refers to a region of a biomolecular sequence, which displays fibroblasts, wild-type Max suppressed cellular transforma a particular function. This function may give rise to a biologi tion, whereas the above-described Max splice variant cal, chemical, or physiological consequence which may be enhanced transformation Makela T P Koskinen PJ. Vastrik reversible or irreversible and which may include protein I, Alitalo K., Science. 1992 Apr. 17; 256(5055):373-7. protein interactions (e.g., binding interactions) involving the 0109 As used herein the phrase “loss of function” refers to functional domain, a change in the conformation or a trans any gene product (mRNA or protein), which exhibits total or formation into a different chemical state of the functional partial reduction in function as compared to the wild type domain or of molecules acted upon by the functional domain, gene product. Loss of function can also manifest itself the transduction of an intracellular or intercellular signal, the through a dominant negative effect. regulation of gene or protein expression, the regulation of cell 0110. As used herein the phrase “dominant negative' growth or death, or the activation or inhibition of an immune refers to the dominant negative effect of a gene product (e.g., response. product of alternative splicing, product of RNA editing) on 0115 With the presentation of the human genome work the activity of wild type protein. For example, a protein prod ing draft, data analysis rather than data collection presents the uct of an altered splice variant may bind a wild type target biggest challenge to biologists. Efforts to ascribe biological protein without enzymatically activating it (e.g., receptor meaning to genomic data, include the development of dimers), thus blocking and preventing the active advanced wet laboratorial techniques as well as computerized from binding and activating the target protein. This mode of algorithms. While the former are limited due to inaccuracy, action provides a mechanism to the dominant negative action time consumption, laborintensiveness and costs the latter are of soluble receptors on wild-type membrane anchored recep still unfeasible due to the poor organization of on hand tors. Such soluble receptors may compete with wild-type sequence databases as well as the composite nature of bio receptors on ligand-binding and as such may be used as logical data. antagonists. For example, two splice variants of guanylyl 0116. As is further described hereinbelow, the present cyclase-B receptor were recently described (GC-B1, Tamura inventors have developed a computer-based approach for the N and Garbers DL, J. Biol. Chem. (2003)278(49):48880-9). functional, spatial and temporal analysis of biological data. One form has a 25 amino acid deletion in the kinase homol The present methodology generates comprehensive data ogy domain. This variant binds the ligand but fails to activate bases, which greatly facilitate the use of available genetic the cyclase. A second variant includes only a portion of the information in both research and commercial applications. extracellular domain. This form fails to bind the ligand. Both 0117. By applying the algorithms described hereinbelow variants. When co-expressed with the wild-type receptor both and in the Examples section, which follows, the present act as dominant negative isoforms by virtue of blocking for inventors collected sequence information and corresponding mation of active GC-B1 homodimers. sequence annotations as set forth in the files “Transcripts 0111. A dominant negative effect may also be exerted by nucleotide seqs part 1 of CD ROM1, “Transcripts nucle miss-localization of the altered variant or by multiple modes otide seqs part2”, “Transcripts nucleotide seqs part3. of action. For example, the splice variants of wild-type myto “protein seqs”, “PropG seqs'. “Transcripts nucleotide se gen activated protein kinase 5a, ERK5b and mERK5c act as qs part4” of CD-ROM2, “summary table” of CD-ROM3, US 2011/0091454 A1 Apr. 21, 2011

“Annotations.gz”. “Transcripts.gz, and “Proteins.gz' of analyzing (e.g., text-mining) at least one biomolecular enclosed CD-ROM4. This comprehensive database allows sequence of each cluster thereby annotating biomolecular simple elucidation of yet unknown function of mass gene Sequences. products and illustrates spatial and temporal patterns of gene I0127. Any annotational information identified and/or gen expression in various types of tissues, such as healthy and erated according to the teachings of the present invention can diseased, which may contribute immensely to further under be stored in a database which can be generated by a suitable standing of disease mechanisms and allow use thereof in the computing platform. configuration of therapeutic and diagnostic applications. I0128. Thus, the method according to this aspect of the 0118. As is further described hereinunder, the present present invention provides a novel approach for annotating invention encompasses several novel approaches for annotat biomolecular sequences even on a scale of a genome, a tran ing biomolecular sequences which can be individually scriptom (i.e., the repertoire of all messenger RNA molecules applied or in combination. transcribed from a genome) or a proteom (i.e., the repertoire 0119) “Annotating refers to the act of discovering and/or of all proteins translated from messenger RNA molecules). assigning an annotation (i.e., critical or explanatory notes or This enables transcriptome-wise comparative analyses (e.g., comment) to a biomolecular sequence of the present inven analyzing chromosomal distribution of human genes) and tion. cross-transcriptome comparative studies (e.g., comparing 0120. The term “annotation” refers to a functional or expressed data across species) both of which may involve structural description of a sequence, which may include iden various Subontologies such as molecular function, biological tifying attributes Such as locus name, keywords, Medline process and cellular localization. references, cloning data, single nucleotide polymorphism I0129. Biomolecular sequences which can be used as data, information of coding region, regulatory regions, cata working material for the annotating process according to this lytic regions, name of encoded protein, Subcellular localiza aspect of the present invention can be obtained from a bio tion of the encoded protein, protein hydrophobicity, protein molecular sequence database. Such a database can include function, mechanism of protein function, information on protein sequences and/or nucleic acid sequences derived from metabolic pathways, regulatory pathways, protein-protein libraries of expressed messenger RNA i.e., expressed interactions, tissue expression profile, diseases and disorders sequence tags (EST). cDNA clones, contigs, pre-mRNA, (i.e., indications), therapies, pharmacological activities and which are prepared from specific tissues or cell-lines or from diagnostic applications. whole organisms. I0130. This database can be a pre-existing publicly avail The Ontological Annotation Approach able database i.e., GenBank database maintained by the 0121. An ontology refers to the body of knowledge in a National Center for Biotechnology Information (NCBI), part specific knowledge domain or discipline Such as molecular of the National Library of Medicine, and the TIGR database biology, microbiology, immunology, Virology, plant sci maintained by The Institute for Genomic Research, Blocks ences, pharmaceutical chemistry, medicine, neurology, endo database maintained by the Fred Hutchinson Cancer crinology, genetics, ecology, genomics, proteomics, chemin Research Center, Swiss-Prot site maintained by the Univer formatics, pharmacogenomics, bioinformatics, computer sity of Geneva and GenPept maintained by NCBI and includ Sciences, statistics, mathematics, chemistry, physics and arti ing public protein-sequence database which contains all the ficial intelligence. protein databases from GenBank, or private databases (i.e., 0122 An ontology includes domain-specific concepts— the LifeSeqTM and PathoSeq t M databases available from referred to herein as Sub-ontologies. A sub-ontology may be Incyte Pharmaceuticals, Inc. of Palo Alto, Calif.). Optionally, classified into Smaller and narrower categories. biomolecular sequences of the present invention can be 0123. The ontological annotation approach of the present assembled from a number of pre-existing databases as invention is effected as follows. described in Example 5 of the Examples section. 0.124 First, biomolecular sequences are computationally I0131 Alternatively, the database can be generated from clustered according to a progressive homology range, thereby sequence libraries including, but not limited to, cDNA librar generating a plurality of clusters each being of a predeter ies, EST libraries, mRNA libraries and the like. mined homology of the homology range. I0132 Construction and sequencing of a cDNA library is 0.125 Progressive homology according to this aspect of one approach for generating a database of expressed mRNA the present invention is used to identify meaningful homolo sequences. cDNA library construction is typically effected by gies among biomolecular sequences and thereby assign new tissue or cell sample preparation, RNA isolation, cDNA ontological annotations to sequences, which share requisite sequence construction and sequencing. levels of homologies. Essentially, a biomolecular sequence is I0133. It will be appreciated that such cDNA libraries can assigned to a specific cluster if displays a predetermined be constructed from RNA isolated from whole organisms, homology to at least one member of the cluster (i.e., single tissues, tissue sections, or cell populations. Libraries can also linkage). As used herein “progressive homology range” refers be constructed from a tissue reflecting a particular pathologi to a range of homology thresholds, which progress via pre cal or physiological State. determined increments from a low homology level (e.g.35%) I0134. Once raw sequence data is obtained, biomolecular to a high homology level (e.g. 99%). Further description of a sequences are computationally clustered according to a pro progressive homology range is provided in the Examples gressive homology range using one or more clustering algo section which follows. rithms. To obtain progressive clusters, the biomolecular 0126 Following generation of clusters, one or more sequences are clustered through single linkage. Namely, a ontologies are assigned to each cluster. Ontologies are biomolecular sequence belongs to a cluster if this sequence derived from an annotation preassociated with at least one shares a sequence homology above a certain threshold to one biomolecular sequence of each cluster, and/or generated by member of the cluster. The threshold increments from a high US 2011/0091454 A1 Apr. 21, 2011 homology level to a low homology level with a predetermined 0144 Ontologies, Sub ontologies, and their ontological resolution. Preferably the homology range is selected from relations (i.e., inherent relation—the sub-ontology “IS THE” 99%-35%. ontology or composite relation—the ontology “HAS’ the sub 0135 Computational clustering can be effected using any ontology) can be organized into various computer data struc commercially available alignment software including the tures such as a tree, a map, a graph, a stack oralist. These may local homology algorithm of Smith & Waterman, Adv. Appl. also be presented in various data format such as, text, table, Math. 2:482 (1981), using the homology alignment algorithm html, or extensible markup language (XML) of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), using 0145 Ontologies and/or Subontologies assigned to a spe the search for similarity method of Pearson & Lipman, Proc. cific biomolecular sequence can be derived from an annota Natl. Acad. Sci. USA85:2444 (1988), or using computerized tion, which is preassociated with at least one biomolecular implementations of algorithms GAP BESTFIT. FASTA, and sequence in a cluster generated as described hereinabove. TFASTA in the Wisconsin Genetics Software Package 0146 For example, biomolecular sequences obtained Release 7.0, Genetics Computer Group, 575 Science Dr. from an annotated database are typically preassociated with Madison, Wis. an annotation. An "annotated database' refers to a database of 0.136 Another example of an algorithm which is suitable biomolecular sequences, which are at least partially charac for sequence alignment is the BLAST algorithm, which is terized with respect to functional or structural aspects of the described in Altschulet al., J. Mol. Biol. 215:403-410 (1990). sequence. Examples of annotated databases include but are Software for performing BLAST analyses is publicly avail not limited to: GenBank (www.ncbi.nlm.nih.gov/GenBank/), able through the National Center for Biotechnology Informa Swiss-Prot (www.expasy.ch/sprot/sprot-top.html), GDB tion (http://www.ncbi.nlm.nih.gov/). (www.gdb.org/), PIR (www.mips.biochem.mpg.de/proj/ 0.137 Since the present invention requires processing of prostseqdb/), YDB (www.mips.biochem.mpg.de/proj/ large amounts of data, sequence alignment is preferably yeast/), MIPS (www.mips.biochem.mpg.de/proj/human), effected using assembly software. HGI (www.tigr.org/tdb/hgi/). Celera Assembled Human 0138 A number of commonly used computer software Genome (www.celera.com/products/human ann.cfm and fragment read assemblers capable of forming clusters of LifeSeq Gold (https://lifeseq gold.incyte.com). Additional expressed sequences, and aligning members of the cluster specialized annotated databases include annotative informa (individually or as an assembled contig) with other sequences tion on metabolic (http://www.genome.ad.jp/kegg/metabo (e.g., genomic database) are now available. These packages lism.html) and regulatory pathways (http://www.genome.ad. include but are not limited to. The TIGR Assembler Sutton jp/kegg/regulation.html), and protein-protein interactions G. et al. (1995) Genome Science and Technology 1:9-19. (http://dip.doe-mbi.ucla.edu/), etc. GAP Bonfield J.K. etal. (1995) Nucleic Acids Res. 23:4992 0147 Alternatively, ontologies can be generated from an 4999), CAP2 Huang X. et al. (1996) Genomics 33:21-31), analysis of at least one biomolecular sequence in each of the the Genome Construction Manager Laurence C B. Et al. clusters of the present invention. (1994) Genomics 23:192-2011, Bio Image Sequence Assem 0148 Preferably, analysis of the biomolecular sequence is bly Manager, SeqMan Swindell S R. and Plasterer J. N. effected by literature text mining. Since manual review of (1997) Methods Mol. Biol. 70:75-89, and LEADS and Gen related-literature may be a daunting task, computational Carta (Compugen Ltd. Israel). extraction of text information is preferably effected. 0.139. It will be appreciated that since applying sequence 014.9 Thus, the method of the present invention can also homology analysis on large number of sequences is compu process literature and other textual information and utilize tationally intensive, local alignment (i.e., the alignment of processed textual data for generating additional ontological portions of protein sequences) is preferably effected prior to annotations. For example, text information contained in the global alignment (alignment of protein sequences along their sequence-related publications and definition lines in entire length), as described in Example 6 of the Examples sequence records of sequence databases can be extracted and section. processed. Ontological annotations derived from processed 0140. Once progressive clusters are formed, one or more text data are then assigned to the sequences in the correspond ontological annotations (i.e., assigning an ontology) are ing clusters. assigned to each cluster. 0150 Ontological annotations can also be extracted from 0141 Systematic and standardized ontological nomencla sequence associated Medical subject heading (MeSH) terms ture is preferably used. Such nomenclature (i.e., keywords) which are assigned to published papers. can be obtained from several sources. For example, ontologi 0151. Additional information on text mining is provided cal annotations derived from three main ontologies: molecu in Example 7 of the Examples section and is disclosed in lar function, biological process and cellular component are “Mining Text Using Keyword Distributions. Ronen Feld available from the Gene Ontology Consortium (www.gene man, Ido Dagan, and Haym Hirsh, Proceedings of the 1995 ontology.org). Workshop on Knowledge Discovery in Databases, “Finding 0142. Alternatively a list of homogenized ontological Associations in Collections of Text, Ronen Feldman and nomenclature can be obtained from AcroMed—a computer Haym Hirsh, Machine Learning and Data Mining: Methods generated database of biomedical acronyms and the associ and Applications, edited by R. S. Michalski, I. Bratko, and M. ated long forms extracted from the recent Medline abstracts Kubat, John Wiley & Sons, Ltd., 1997 "Technology Text (http://www.expasy.org/tools/). Mining, Turning Information Into Knowledge: A White Paper 0143 Optionally, various conversion tables which link from IBM,” edited by Daniel Tkach, Feb. 17, 1998, each of Commission number, InterPro protein motifs and which is fully incorporated herein by reference. SwissProt keywords to gene ontology nodes are also avail 0152. It will be appreciated that text mining may be per able from www.geneontology.org and can be used with the formed, in this and other embodiments of the present inven present method. tion, for the text terms extracted from the definitions of gene US 2011/0091454 A1 Apr. 21, 2011

or protein sequence records, retrievable from databases Such which shares identical level of homology compared to an as GenBank and Swiss-Prot and title line, abstract of scien ontologically annotated protein in the cluster is assigned the tific papers, retrievable from Medline database (e.g. http:// same ontological annotation. This procedure progresses from www.ncbi.nlm.nih.gov/PubMed/). the highest level of homology to a lower threshold level with 0153 Computer-dedicated software for biological text a predetermined increment resolution. Newly discovered analysis is available from http://www.expasy.org/tools/. homologies enable assignment of existing ontological anno Examples include, but are not limited to, MedMiner—A soft tations to biomolecular sequences sharing homologous ware system which extracts and organizes relevant sentences sequences and being previously unannotated or partially in the literature based on a gene, gene-gene or gene-drug annotated (see Examples 5-9 of the Examples section). query; Protein Annotator's Assistant—A Software system 0159. Once assignment of an annotation is effected, anno which assists protein annotators in the task of assigning func tated clusters are disassembled resulting in annotation of each tions to newly sequenced proteins; and XplorMed-A Soft biomolecular sequence of the cluster. ware system which explores a set of abstracts derived from a 0160 Such annotated biomolecular sequences are then bibliographic search in MEDLINE. tested for false annotation. This is effected using the follow 0154 Alternatively, assignment of ontological annota ing scoring parameters: tions may be effected by analyzing molecular, cellular and/or 0.161 (i) A degree of homology characterizing the pro functional traits of the biomolecular sequences. Prediction of gressive cluster—accuracy of the annotation directly corre cellular localization may be done using any computer-dedi lates with the homology level used for the annotation process cated software. For example prediction of cellular localiza (see Examples 7-9 and 22 of the Examples section). tion can be done using the ProLoc computational platform 0162 (ii) Relevance of annotation to information obtained Einat Hazkani-Covo, Erez, Levanon, Galit Rotman, Dan from literature text mining—each assigned ontological anno Graur and Amit Novik; (2004) Evolution of multicellularity tation which results from literature text mining or functional in metazoa: comparative analysis of the Subcellular localiza or cellular prediction is assessed using scoring parameters tion of proteins in Saccharomyces, Drosophila and Cae such as LOD score (For further details see Example 7 of the norhabditis. Cell Biology International (in press), which Examples section). predicts protein localization based on various parameters 0163 The present invention also enables the use of the including, protein domains (e.g., prediction of trans-membra homologies identified according to the teachings of the nous regions and localization thereof within the protein), pI, present invention to annotate more sensitively and rapidly a protein length, amino acid composition, homology to pre query sequence. Essentially this involves building a sequence annotated proteins, recognition of sequence patterns which profile for each annotated cluster. A profile enables scoring of direct the protein to a certain organelle (such as, nuclear a biomolecular sequence according to functional domains localization signal, NLS, mitochondria localization signal), along a sequence and generally makes searches more sensi signal peptide and anchor modeling and using unique tive. Essentially, clustered sequences are also tested for rel domains from Pfam that are specific to a single compartment. evance to the cluster based upon shared functional domains 0155. Other examples for cellular localization prediction and other characteristic sequence features. software include PSORT Prediction of protein sorting sig 0164 Ontologically annotated biomolecular sequences nals and localization sites and TargetP Prediction of sub are stored in a database for further use. Additional informa cellular location, both available from http://www.expasy.org/ tion on generation and contents of such databases is provided tools/, See also Example 22. hereinunder. 0156 Prediction of functional annotations may also be effected by motif analysis of the biomolecular sequences of 0.165 Such a database can be used to query functional the present invention. Thus for example, by implementing domains and sequences comprising thereof. Alternatively, the any motif analysis Software, which is based on proteinhomol database can be used to query a sequence, and retrieve the ogy (see for example, http://motif genome.ad.jp/ and http:// compatible annotations. www.accelrys.com/products/grailprofindex.html), it is pos 0166 Although the present methodology can be effected sible to predict functional motifs of DNA sequences using prior art systems modified for Such purposes, due to the including repeats, promoter sequences and CpG islands and large amounts of data processed and the vast amounts of of encoded proteins such as Zinc finger and leucine Zipper. processing needed, the present methodology is preferably Such functional annotations may also be extracted from data effected using a dedicated computational system. bases of protein families, domains and functional sites Such as 0.167 Thus, according to another aspect of the present InterPro (http://www.ebi.ac.uk/interpro/). invention and as illustrated in FIGS. 1a-b, there is provided a 015.7 Functional annotations may also be extracted by system for generating a database of annotated biomolecular adopting annotations from ortholohgous species (i.e., from Sequences. different species) Such as, for example, from viral proteoms. 0168 System 10 includes a processing unit 12, which Viral proteins have evolved to defy the host immune system executes a Software application designed and configured for and as such may provide functional annotations to ortholo annotating biomolecular sequences, as described herein hgous proteins which exhibit sufficient level of homology in above. System 10 further serves for storing biomolecular at least functional domains thereof. As such, such an annota sequence information and annotations in a retrievable/search tion may be, for example, “immune system related’. Detailed able database 18. Database 18 further includes information description of the method which is used to obtain such anno pertaining to database generation. tations is provided in U.S. Pat. Appl. No. 60/480,752. 0169. System 10 may also include a user interface 14 (e.g., 0158 Due to the progressive nature of the clusters of the a keyboard and/or amouse, monitor) for inputting database or present invention, ontology assignment starts at the highest database related information, and for providing database level of homology. Any biomolecular sequence in the cluster, information to a user. US 2011/0091454 A1 Apr. 21, 2011

0170 System 10 of the present invention may be any com ber of biomolecular sequences from different species can puting platform known in the art including but not limited to provide a comprehensive account of sequence, structural and a personal computer, a work station, a mainframe and the like. functional information pertaining to the biomolecular 0171 Preferably, database 18 is stored on a computer sequences of interest. readable media Such as a magnetic optico-magnetic or optical disk. The Hierarchical Annotation Approach 0172 System 10 of the present invention may be used by 0179. “Hierarchical annotation” refers to any ontology a user to query the stored database of annotations and and Subontology, which can be hierarchically ordered. sequence information to retrieve biomolecular sequences Examples include but are not limited to a tissue expression stored therein according to inputted annotations or to retrieve hierarchy, a developmental expression hierarchy, apathologi annotations according to a biomolecular sequence query. cal expression hierarchy, a cellular expression hierarchy, an 0173 It will be appreciated that the connection between intracellular expression hierarchy, a taxonomical hierarchy, a user interface 14 and processing unit 12 is bi-directional. functional hierarchy and so forth. Likewise, processing unit 12 and database 18 also share a 0180 According to another aspect of the present invention two-way communication channel, wherein processing unit there is provided a method of annotating biomolecular 12 may also take input from database 18 in performing anno sequences according to a hierarchy of interest. The method is tations and iterative annotations. Further, user interface 14 is effected as follows. linked directly to database 18, such a user may dispatch 0181 First, a dendrogram representing the hierarchy of queries to database 18 and retrieve information stored therein. interest is computationally constructed. As used herein a As such, user interface 14 allows a user to compile queries, “dendrogram' refers to a branching diagram containing mul send instructions, view querying results and performing spe tiple nodes and representing a hierarchy of categories based cific analyses on the results as needed. on a degree of similarity or number of shared characteristics. 0174. In performing ontological annotations, processing 0182 Each of the multiple nodes of the dendrogram is unit 12 may take input from one or more application modules annotated by at least one keyword describing the node, and 16. Application module 16 performs a specific operation and enabling literature and database text mining, as is further produced a relevant annotative input for processing unit 12. described hereinunder. A list of keywords can be obtained For example, application module 16 may perform cellular from the GO Consortium (www.geneontlogy.org); measures localization analysis on a biomolecular sequence query, are taken to include as many keywords, and to include key thereby determining the cellular localization of the encoded words which might be out of date. For example, for tissue protein. Such a functional annotation is then input to and used annotation (see FIG. 4), a hierarchy was built using all avail by processing unit 12. Examples for application software for able tissue/libraries sources available in the GenBank, while cellular localization prediction are provided hereinabove. considering the following parameters: ignoring GenBank 0175 System 10 of the present invention may also be synonyms, building anatomical hierarchies, enabling flexible connected to one or more external databases 20. External distinction between tissue types (normal versus pathology) database 20 is linked to processing unit 12 in a bi-directional and tissue classification levels (organs, systems, cell types, manner, similar to the connection between database 18 and etc.). processing unit 12. External database 20 may include any 0183. It will be appreciated that the dendrogram of the background information and/or sequence information that present invention can be illustrated as a graph, a list, a map or pertains to the biomolecular sequence query. External data a matrix or any other graphic or textual organization, which base 20 may be a proprietary database or a publicly available can describe a dendrogram. An example of a dendrogram database which is accessible through a public network Such as illustrating the gastrointestinal tissue hierarchy is provided in the Internet. External database 20 may feed relevant informa FIG 2. tion to processing unit 12 as it effects iterative ontological 0184. In a second step, each of the biomolecular sequences annotation. External database 20 may also receive and store is assigned to at least one specific node of the dendrogram. ontological annotations generated by processing unit 12. In 0185. The biomolecular sequences according to this this case external database 20 may interact with other com aspect of the present invention can be annotated biomolecular ponents of system 10 like database 18. sequences, unannotated biomolecular sequences or partially 0176 It will be appreciated that the databases and appli annotated biomolecular sequences. cation modules of system 10 can be directly connected with 0186 Annotated biomolecular sequences can be retrieved processing unit 12 and/or user interface 14 as is illustrated in from pre-existing annotated databases as described herein FIG. 1a, or such a connection can be achieved via a network above. 22, as is illustrated in FIG. 1b. 0187. For example, in GenBank, relevant annotational 0177 Network 22 may be a private network (e.g., a local information is provided in the definition and keyword fields. area network), a secured network, or a public network (Such In this case, classification of the annotated biomolecular as the Internet), or a combination of public and private and/or sequences to the dendrogram nodes is directly effected. A secured networks. search for Suitable annotated biomolecular sequences is per 0.178 Thus, the present invention provides a well-charac formed using a set of keywords which are designed to classify terized approach for the systemic annotation of biomolecular the biomolecular sequences to the hierarchy (i.e., same key sequences. The use of text information analysis, annotation words that populate the dendrogram) scoring system and robust sequence clustering procedure 0188 In cases where the biomolecular sequences are enables, for the first time, the creation of the best possible unannotated or partially annotated, extraction of additional annotations and assignment thereof to a vast number of bio annotational information is effected prior to classification to molecular sequences sharing homologous sequences. The dendrogram nodes. This can be effected by sequence align availability of ontological annotations for a significant num ment, as described hereinabove. Alternatively, annotational US 2011/0091454 A1 Apr. 21, 2011

information can be predicted from structural studies. Where type. For example, differentially expressed sequences of a needed, nucleic acid sequences can be transformed to amino single contig which are annotated as being expressed in sev acid sequences to thereby enable more accurate annotational eral different tissue types of a single specific organ or a prediction. specific system, are also annotated by the present invention to 0189 Finally, each of the assigned biomolecular a higher hierarchy level thus denoting association with the sequences is recursively classified to nodes hierarchically specific organ or system. In Such cases using keywords alone higher than the specific nodes, such that the root node of the would not efficiently identify differentially expressed dendrogram encompasses the full biomolecular sequence set, sequences. Thus for example, a sequence found to be which can be classified according to a certain hierarchy, while expressed in Sarcoma, Ewing sarcoma tumors, pnet, rhab the offspring of any node represent apartitioning of the parent domyosarcoma, liposarcoma and mesenchymal cell tumors, Set. can not be assigned to specific sarcomas, but still can be 0190. For example, a biomolecular sequence found to be annotated as mesenchymal cell tumor specific. Using this specifically expressed in "rhabdomyosarcoma', will be clas hierarchical annotation approach in combination with sified also to a higher hierarchy level, which is "sarcoma, and advanced sequence clustering and assembly algorithms, then to "Mesenchimal cell tumors” and finally to a highest capable of predicting alternative splicing, may facilitate a hierarchy level “Tumor. In another example, a sequence simple and rapid identification of gene expression patterns. found to be differentially expressed in endometrium cells, 0.197 Although the present methodology can be effected will be classified also to a higher hierarchy level, which is using prior art systems modified for Such purposes, due to the “uterus', and then to “women genital system” and to “genital large amounts of data processed and the vast amounts of system” and finally to a highest hierarchy level “genitouri processing needed, the present methodology is preferably nary system'. The retrieval can be performed according to effected using a dedicated computational system. each one of the requested levels. 0198 Such a system is described hereinabove. The system 0191 Since annotation of publicly available databases is includes a processing unit which executes a Software appli at times unreliable, newly annotated biomolecular sequences cation designed and configured for hierarchically annotating are confirmed using computational or laboratory approaches biomolecular sequences as described hereinabove. The sys as is further described hereinbelow. tem further serves for storing biomolecular sequence infor 0.192 It will be appreciated that once temporal or spatial mation and annotations in a retrievable/searchable database. annotations of sequences are established using the teachings of the present invention, it is possible to identify those Annotation of Differentially Expressed Alternatively sequences, which are differentially expressed (i.e., exhibit Spliced Sequences spatial or temporal pattern of expression in diverse cells or 0199 Although numerous methods have been developed tissues). Such sequences are assigned to only a portion of the to identify differentially expressed genes, none of these nodes, which constitute the hierarchical dendrogram. addressed splice variants, which occur in over 50% of human 0193 Changes in gene expression are important determi genes. Given the common sequence features of splice variants nants of normal cellular physiology, including cell cycle it is very difficult to identify splice variants which expression regulation, differentiation and development, and they directly is differential, using prior art methodologies. Therefore contribute to abnormal cellular physiology, including devel assigning unique sequence features to differentially opmental anomalies, aberrant programs of differentiation and expressed splice variants may have an important impact to the cancer. Accordingly, the identification, cloning and charac understanding of disease development and may serve as valu terization of differentially expressed genes can provide rel able markers to various pathologies. evant and important insights into the molecular determinants 0200 Thus, according to another aspect of the present of processes such as growth, development, aging, differentia invention there is provided a method of identifying sequence tion and cancer. Additionally, identification of such genes can features unique to differentially expressed mRNA splice vari be useful in development of new drugs and diagnostic meth ants. The method is effected as follows. ods for treating or preventing the occurrence of such diseases. 0201 First, unique sequence features are computationally 0194 Newly annotated sequences identified according to identified in splice variants of alternatively spliced expressed the present invention are tested under physiological condi Sequences. tions (i.e., temperature, pH, ionic strength, Viscosity, and like 0202 As used herein the phrase “splice variants’ refers to biochemical parameters which are compatible with a viable naturally occurring nucleic acid sequences and proteins organism, and/or which typically exist intracellularly in a encoded therefrom which are products of alternative splicing. viable cultured yeast cell or mammalian cell). This can be Alternative splicing refers to intron inclusion, exon exclu effected using various laboratory approaches such as, for Sion, or any addition or deletion of terminal sequences, which example, FISH analysis, PCR, RT-PCR, real-time PCR, results in sequence dissimilarities between the splice variant Southern blotting, northern blotting, electrophoresis and the sequence and the wild-type sequence. like (see Examples 13-20 and 27 of the Examples section) or 0203 Although most alternatively spliced variants result more elaborate approaches which are detailed in the Back from alternative exon usage, some result from the retention of ground section. introns not spliced-out in the intermediate stage of RNA (0195 It will be appreciated that true involvement of dif transcript processing. ferentially expressed genes in a biological process is better 0204 As used herein the phrase “unique sequence fea confirmed using an appropriate cell or animal model, as fur tures' refers to donor/acceptor concatenations (i.e., exon ther described hereinunder. exonjunctions), intron sequences, alternative exon sequences 0196. The hierarchical annotation approach enables to and alternative polyadenylation sequences. assign an appropriate annotation level even in cases where 0205. Once a unique sequence feature is identified, the expression is not restricted to a specific tissue type or cell expression pattern of the splice variant is determined. If the US 2011/0091454 A1 Apr. 21, 2011

splice variant is differentially expressed then the unique fea 79:344-8. Furthermore, since the genomic regions which ture thereof is annotated accordingly. align with an EST sequence correspond to exon sequences 0206 Alternatively spliced expressed sequences of this and alignment gaps correspond to introns, the putative splice aspect of the present invention, can be retrieved from numer sites at exon/intron boundaries can be confirmed. Because ous publicly available databases. Examples include but are splice donor and acceptor sites primarily reside within the not limited to ASDB an alternative splicing database gen intron sequence, this methodology can provide validation erated using GenBank and Swiss-Prot annotations (http:// which is independent of the EST evidence. Reverse tran cbc.g.nersc.gov/asdb, ASMaml)B-a database of alternative scriptase artifacts or other cDNA synthesis errors may also be splices in human, mouse and rat (http://166.111.30.65/AS filtered out using this approach. Improper inclusion of MAMDB.html). Alternative splicing database—a database genomic sequence in ESTs can also be excluded by requiring of alternative splices from literature (http://cgsigm.cshl.org/ pairs of mutually exclusive splices in different ESTs. new alt exon db2/), Yeast intron database-Database of 0213. Additionally, it will be appreciated that observing a intron in yeast (http://www.cse.ucsc.edu/research/compbiof given splice variant in one EST but not in a second EST may yeast introns.html). The Intronerator—alternative splicing be insufficient, as the latter can be an un-spliced EST rather in C. elegans based on analysis of EST data (http://www.cse. than a biological significant intron inclusion. Therefore mea ucsc.edu/~kent/intronerator), ISIS Intron Sequence Infor Sures are taken to focus on mutually exclusive splice variants, mation System including a section of human alternative two different splice variants observed in different ESTs, splices (http://isis.bit.uq.edu.au/). TAP Transcript Assem which overlap in a genomic sequence. A more stringent fil bly Program result of alternative splicing (http://stl.wustl. tering may be applied by requiring two splice variants to share edu/-zkan/TAP/) and HASDB database of alternative one splice site but differ in another. Another filter which can splices detected in human EST data. be used to identify true splicing events is sequence conserva 0207 Additionally, alternative splicing sequence data uti tion. Essentially, exons and the borders of human introns lized by this aspect of the present invention can be obtained by which are identified in mice genome and/or supported by any of the following bioinformatical approaches. mouse ESTs are considered true splicing events (see Example 0208 (i) Genomically aligned ESTs—the method identi 21 of the Examples section). fies ESTs which come from the same gene and looks for 0214. Once splice variants are identified, identification of differences between them that are consistent with alternative unique sequence features therewithin can be effected compu splicing, such as large insertion or deletion in one EST. Each tationally by identifying insertions, deletions and donor-ac candidate splice variant can be further assessed by aligning ceptor concatenations in ESTs relative to mRNA and prefer the ESTs with respective genomic sequence. This reveals ably genomic sequences. candidate exons (i.e., matches to the genomic sequence) sepa 0215. As mentioned hereinabove, once alternatively rated by candidate splices (i.e., large gaps in the EST-genomic spliced sequences (having unique sequence features) are alignment). Since intronic sequences at splice junctions (i.e., identified, determination of their expression patterns is donor/acceptor concatenations) are highly conserved (essen effected in order to assign an annotation to the unique tially 99.24% of introns have a GT-AG at their 5' and 3' ends, sequence feature thereof. respectively) sequence data can be used to Verify candidate 0216 Expression patternidentification may be effected by splices Burset et al. (2000) Nucleic Acids Res. 28:4364-75 qualifying annotations which are preassociated with the alter LEADS module Shoshan, et al. Proceeding of SPIE (eds. M. natively spliced expressed sequences, as described herein L. Bittner, Y. Chen, A. N. Dorsel, E. D. Dougherty) Vol. 4266, above. This can be accomplished by scoring the annotations. pp. 86-95 (2001). R. Sorek, G. Ast, D. Graur, Genome Res. In For example scoring pathological expression annotations can press; Compugen Ltd. U.S. patent application Ser. No. be effected according to: (i) prevalence of the alternatively 09/133,987. spliced expressed sequences in normal tissues; (ii) prevalence 0209 (ii) Identification based on intron information. The of the alternatively spliced expressed sequences in pathologi method creates a database of individual intron sequences cal tissues; (iii) prevalence of the alternatively spliced annotated in GenBank and utilizes such sequences to search expressed sequence in total tissues; and (iv) number of tissues for EST sequences which include the intronic sequences and/or tissue types expressing the alternatively spliced Croft et al. (2000) Nat. Genet. 24:340-1). expressed sequences. Preferably expression pattern of alter 0210 (iii) EST alignment to expressed sequences—looks natively spliced sequences is determined as described in the for insertions and deletions in ESTs relative to a set of known “Frequency-based annotative approach' section, which fol mRNAs. Such a method enables to uncover alternatively lows. spliced variants with having to align ESTs with genomic 0217. Alternatively, identifying the expression pattern of sequence Brett et al. (2000) FEBS Lett. 474-83-86. the alternatively spliced expressed sequences of the present 0211. It will be appreciated that in order to avoid false invention, is accomplished by detecting the presence of the positive identification of novel splice isoforms, a set of filters unique sequence feature in biological samples. This can be is applied. For example, sequences are filtered to exclude effected by any hybridization-based technique known in the ESTs having sequence deviations. Such as chimerism, ran art, such as northern blot, dot blot, RNase protection assay, dom variation in which a given EST sequence or potential RT-PCR and the like. vector contamination at the ends of an EST. 0218. To this end oligonucleotides probes, which are sub 0212 Filtering can be effected by aligning ESTs with stantially homologous to nucleic acid sequences that flank corresponding genomic sequences. Chimeric ESTs can be and/or extend across the unique sequence features of the easily excluded by requiring that each EST aligns completely alternatively spliced expressed sequences of the present to a single genomic locus. Genomic location found by homol invention are generated. ogy search and alignment can often be checked against radia 0219 Preferably, oligonucleotides which are capable of tion hybrid mapping data Muneer et al (2002) Genomic hybridizing under Stringent, moderate or mild conditions, as US 2011/0091454 A1 Apr. 21, 2011 used in any polynucleotide hybridization assay are utilized. 0226. Thus, identifying splice variants having unique Further description of hybridization conditions is provided sequence features enables annotation and thus identification hereinunder. of functionally altered variants. 0220 Oligonucleotides generated by the teachings of the 0227. Identification of putative functionally altered splice present invention may be used in any modification of nucleic variants, according to this aspect of the present invention, can acid hybridization based techniques, which are further be effected by identifying sequence deviations from func detailed hereinunder. General features of oligonucleotide tional domains of wild-type gene products. synthesis and modifications are also provided hereinunder. 0228 Identification of functional domains can be effected 0221 Aside from being useful in identifying specific by comparing a wild-type gene product with a series of pro splice variants, oligonucleotides generated according to the files prepared by alignment of well characterized proteins teachings of the present invention may also be widely used as from a number of different species. This generates a consen diagnostic, prognostic and therapeutic agents in a variety of sus profile, which can then be matched with the query disorders which are associated with the polynucleotides of sequence. Examples of programs Suitable for Such identifica the present invention (e.g., specific splice variants). tion include, but are not limited to, InterPro Scan Integrated 0222 For example, regulation of splicing is involved in search in PROSITE. Pfam, PRINTS and other family and 15% of genetic diseases Krawzczak et al. (1992) Hum. domain databases; Scanprosite—Scans a sequence against Genet. 90:41-54 and may contribute for example to cancer PROSITE or a pattern against SWISS-PROT and TrEMBL: mis-splicing of exon 18 in BRCA1, which is caused by a MotifScan Scans a sequence against protein profile data polymorphism in an exonic enhancer Liu et al. (2001) Nature bases (including PROSITE); Frame-ProfileScan Scans a Genet. 27:55-58. short DNA sequence against protein profile databases (in 0223 Thus, oligonucleotides generated according to the cluding PROSITE); Pfam HMM search—scans a sequence teachings of the present invention can be included in diagnos against the Pfam protein families database; FingerPRINT tic kits. Such kits, may include oligonucleotides which are Scan Scans a protein sequence against the PRINTS Protein directed to the newly uncovered splice variant alone and also Fingerprint Database: FPAT Regular expression searches to previously uncovered splice variants or wild-type (w.t) in protein databases; PRATT Interactively generates con sequences of the same gene which were previously associated served patterns from a series of unaligned proteins; with a disease of interest. For example, oligonucleotides sets PPSEARCH Scans a sequence against PROSITE (allows a pertaining to a specific disease associated with differential graphical output); at EBI; PROSITE scan Scans a sequence expression of an alternatively spliced transcript can be pack against PROSITE (allows mismatches); at PBIL: PATTIN aged in a one or more containers with appropriate buffers and PROT Scans a protein sequence or a protein database for preservatives along with Suitable instructions for use and used one or several pattern(s); at PBIL: SMART Simple Modu for diagnosis or for directing therapeutic treatment. Addi lar Architecture Research Tool; at EMBL, TEIRESIAS - tional information on Such diagnostic kits is provided here Generate patterns from a collection of unaligned protein or inunder. DNA sequences; at IBM, all available from http://www.ex 0224. It will be appreciated that an ability to identify alter pasy.org/tools/. natively spliced sequences, also facilitates identification of 0229. It will be appreciated that functionally altered splice the various products of alternative splicing. variants may also include a sequence alteration at a post 0225 Recent studies indicate that most alternative splic translation modification consensus site. Such as, for example, ing events result in an altered protein product International a tyrosine Sulfation site, a glycosylation site, etc. Examples of human genome sequencing consortium (2001) Nature 409: post-translational modification prediction programs include 860-921; Modrek et al. (2001) Nucleic Acids Res. 29:2850 but are not limited to: SignalP Prediction of signal peptide 2859. The majority of these changes appear to have a func cleavage sites; ChloroP Prediction of chloroplast transit tional relevance (i.e., up-regulating or down-regulating peptides: MITOPROT Prediction of mitochondrial target activity, “gain of function' or “loss of function, respectively. ing sequences; Predotar—Prediction of mitochondrial and See terminology section). Such as the replacement of the plastid targeting sequences; NetOGlyc Prediction of type amino or carboxyl terminus, or in-frame addition and O-glycosylation sites in mammalian proteins; DictyCGlyc— removal of a functional domain. For example, alternative Prediction of GlcNAc O-glycosylation sites in Dictyostel splicing can lead to the use of a different site for translation ium; YinCYang O-beta-GlcNAc attachment sites in initiation (i.e., alternative initiation), a different translation eukaryotic protein sequences; big-PI Predictor GPI Modi termination site due to a frameshift (i.e., truncation or exten fication Site Prediction; DGPI Prediction of GPI-anchor sion), or the addition or removal of a stop codon in the alter and cleavage sites (Mirror site); NetPhos—Prediction of native coding sequence (i.e., alternative termination). Addi Serine. Threonine and Tyrosine phosphorylation sites in tionally, alternative splicing can change an internal sequence eukaryotic proteins; NetPicoRNA Prediction of protease region due to an in-frame insertion or deletion. One example cleavage sites in picornaviral proteins; NMT Prediction of of the latter is the new FC receptor B-like protein, whose N-terminal N-myristoylation; Sulfinator Prediction of C-terminal transmembrane domain and cytoplasmic tail, tyrosine sulfation sites all available from http://www.expasy. which is important for signal transduction in this class of org/tools/. receptors, is replaced with a new transmembrane domain and 0230. Once putative functionally altered splice variants tail by alternative polyadenylation. Another example is the are identified, they are validated by experimental verification truncated Growth Hormone Receptor, which lacks most of its and functional studies, using methodologies well known in intracellular domain and has been shown to heterodimerize the art. with the full-length receptor, thus causing inhibition of Sig 0231. The Examples section which follows illustrates naling by Growth Hormone Ross, R. J. M. Growth hormone identification and annotation of splice variants. Identified and & IGF Research, 9:42-46, (1999). annotated sequences are contained within the enclosed US 2011/0091454 A1 Apr. 21, 2011

CD-ROMs 1-4. Some of these sequences represent (i.e., are can be used in accordance with the present invention include, transcribed from) entirely new splice variants, while others but are not limited to, chi square, Fisher's exact test, phi, represent new splice variants of known sequences. In any Yule's Q, Lambda and Taub. Preferably, to calculate an exact case, the sequences contained in the enclosed CD-ROMs are p-value for a two by two frequency table with a small number novel in that they include previously undisclosed sequence of expected frequencies, Fisher's exact test is used. regions in the context of a known gene or an entirely new 0243 Genes exhibiting differential pattern of expression sequence in the context of an unknown gene. uncovered using the methodology of the present invention can be efficiently utilized as tissue markers and as putative Frequency-Based Annotative Approach drug targets. 0232. The present invention also contemplates spatial and 0244 As mentioned above, alternatively spliced tran temporal gene annotations through comparing relative abun Scripts may be extremely useful as cancer markers and dance in libraries of different origins. draugs, since it appears likely that there may be striking 0233. Thus, according to still another aspect of the present contrasts in usage of alternatively spliced transcript variants invention there is provided a method of comparing an expres between normal and tumor tissue in alterations in the general sion level of a gene of interest in at least two types of tissues. levels of gene expression Caballero Dis Markers. (2001): 17 0234. As used herein the phrase “at least two types of (2):67-75). tissues” refers to tissues of different developmental origin, 0245. For example, members of the CD44 family of cell different pathological origin or different cellular composi Surface hyaluronate-binding proteins have been implicated in tion. cell migration, cell-matrix interactions and tumor progres 0235. The method is effected by obtaining a contig Sion. Interestingly, normal spinal nerves and primary assembled from a plurality of expressed sequences (e.g., Schwann cell cultures express standard CD44 (CD44s) but ESTs, mRNAs) representing the gene of interest; and com not alternatively spliced variant isoforms. In contrast, paring the number of the plurality of expressed sequences Schwann cell tumors express both “wild-type” CD44 and a corresponding to the contig, which are expressed in each of number of variants. Implicating a role for CD44 splice vari the at least two tissue types, to thereby compare the expres ants in cancer and as such in the development of potent sion level of the gene of interest in the at least two tissue types. diagnostic and therapeutic tools. 0236 Expressed sequences for generating the contig of 0246 Thus, the present invention also envisages compar this aspect of the present invention can be retrieved from ing an expression level of at least two splice variants of a gene pre-existing publicly available databases or generated as of interest in a tissue. The method is effected by: described in the “ontological annotation approach' section 0247 Obtaining a contig including exonal sequence pre hereinabove. sentation of the at least two splice variants of the gene of 0237. A number of sequence assembly software are interest, the contig being assembled from a plurality of known in the art, which can be used to generate the contig of expressed sequences; the gene of interest. Such software are described in the “onto 0248 Identifying at least one contig sequence region logical annotation approach' section hereinabove. unique to a portion (i.e., at least one and not all) of the at least 0238 Alternatively, the contig of this aspect of the present two splice variants of the gene of interest. Identification of invention can be obtained from pre-existing publicly avail Such unique sequence region is effected using computer able databases. Examples include, but are not limited to, the alignment software such as described hereinabove. TIGR database (www.tigr.org), the SANBI database (http:// 0249 Comparing a number of the plurality of expressed www.za.embnet.org/), the SIB database which generates con sequences in the tissue having the at least one contig sequence tig sequence information from Unigene clusters, the MIPS region with a number of the plurality of expressed sequences database (http://mips.gsfide?) and the DoTS database (http:// not-having the at least one contig sequence region, to thereby www.allgenes.org/). compare the expression level of the at least two splice variants 0239. It will be appreciated that the contig according to of the gene of interest in the tissue. this aspect of the present invention can be composed of a 0250 One configuration of the above-described method plurality of expressed sequences, which present partial or ology is described in details in Example 23c of the Examples complete exonal coverage of the gene of interest. section which follows. 0240 Prior to, concomitant with or following contig 0251 Biomolecular sequences (i.e., nucleic acid and assembly, expressed sequences are filtered to exclude polypeptide sequences) uncovered using the above-described sequences of poor quality (i.e., vector contaminants, low methodology are annotated using the teachings of the present complexity sequences, sequences which originate from Small invention. Thus, for example, the hierarchical annotation libraries e.g., Smaller than 1000 sequences), and to score true approach can be used to assigna differentially expressed gene expression in the at least two types of tissues. product to higher hierarchies. For example, gene products 0241 Expressed sequences, which originate from samples identified by the “Frequency-based annotative approach” wherein clone frequency reflects mRNA abundance are engine as being overexpresed in prostate tumor, lung tumor, highly scored. Thus expressed sequences from “non-normal head and neck tumor, stomach tumor, colon tumor, mammary ized expression libraries are highly scored, while expressed tumor, kidney tumor, ovary tumor, uterus/cervix tumor, thy sequences from “normalized’ libraries are poorly scored. roid tumor, adrenal tumor, pancreas tumor, liver tumor and Such scoring rules are described in details in Example 23 of skin tumor might also be specific to other types of epithelial the Examples section which follows. tumors. Gene products identified by the engine as being over 0242 Comparing the number of the plurality of expressed expressed in bone and muscle tumors might also be specific to sequences corresponding to the contig which are expressed in other types of sarcomas. Gene products identified by the each of the at least two tissue types is preferably effected by engine as being overexpressed in bone marrow tumor, blood statistical pairing analysis. Examples of statistical tests which cancer, T cell tumor and lymph nodes tumor may also be US 2011/0091454 A1 Apr. 21, 2011 specific to other types of blood cancers. Sequence data uncov response to the same antigen (has been proved in mice models ered by the above described methodologies and correspond for Myasthenia Gravis and type I Diabetes). ing annotative data are stored in a database for future use (see, 0257. In addition, any novel variant of autoantigens, not for example, files “Transcripts nucleotide seqs part1. necessarily secreted, may be used for 'specific immunoad “Transcripts nucleotide seqs part2”, “Transcripts nucle Sorption' leading to a specific immunodepletion of anti otide seqs part3. “Transcripts nucleotide seqs part4. bodies when used in immunoadsorption columns. “protein seqs”, “PropG seqs”, “Summary table', 'Annota 0258. It will be appreciated that splice variants of autoan tions.gz”. “Transcripts.gz’, and “Proteins.gz' of the enclosed tigens may also have diagnostic value. The diagnosis of many CD-ROMs 1-4). autoimmune disorders is based on looking for specific 0252. As mentioned hereinabove, biomolecular autoantibodies to autoantigens known to be associated with a sequences uncovered using the methodology of the present autoimmune condition. Most of the diagnostic techniques are invention can be efficiently utilized as tissue or pathological based on having a recombinant form of the autoantigen and markers and as putative drugs or drug targets for treating or using it to look for serum autoantibodies. It is possible that preventing the disease. what is considered an autoantigen is not the “true’ autoanti 0253) Some examples are summarized infra: gen but rather a variant thereof. For example, TPO is a known 0254 For example, gene products (nucleic acid and/or autoantigen in thyroid autoimmunity. It has been shown that protein products), which exhibit tumor specific expression its variant TPOzanelli also take part in the autoimmune pro (i.e., tumor associated antigens, TAAS) can be utilized for cess and can bind the same antibodies as TPO Biochemistry. in-vitrogeneration of antibodies and/or for in-vivo immuni 2001 Feb. 27: 40(8):2572-9). Antibodies formed against the Zation/cancer vaccination, essentially eliciting an immune true autoantigen may bind to other variants of the same gene response against Such gene products and cells expressing due to sequence overlap but with reduced affinity. Novel same (see e.g., U.S. Pat. No. 4,235,877, Vaccine preparation splice variant of the genes in Table 15 may be revealed as true is generally described in, for example, M. F. Powell and M.J. autoantigens, therefore their use for detection of autoantibod Newman, eds., "Vaccine Design (the Subunit and adjuvant ies is expected to result in a more sensitive and specific test. approach). Plenum Press (NY, 1995); Other references 0259 Apart of clinical applications, the biomolecular describing adjuvants, delivery vehicles and immunization in sequences of the present invention can find other commercial general include Rolland, Crit. Rev. Therap. Drug Carrier Sys uses such as in the food, agricultural, electro-mechanical, tems 15:143-198, 1998; Fisher-Hochet al., Proc. Natl. Acad. optical and cosmetic industries http://www.physics.unc.edu/ Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. ~rsuper/XYZweb/XYZchipbiomotors.rs1.doc. http://www. Sci. 569:86-103, 1989: Flexner et al., Vaccine 8:17-21, 1990; bio.org/er/industrial.asp. For example, newly uncovered U.S. Pat. Nos. 4,603, 112, 4,769,330, and 5,017,487; WO gene products, which can disintegrate connective tissues, can 89/01973: U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345, be used as potent anti Scarring agents for cosmetic purposes. 242: WO 91/02805; Berkner, Biotechniques 6:616-627, Other applications include, but are not limited to, the making 1988: Rosenfeld et al., Science 252:431-434, 1991: Kolls et of gels, emulsions, foams and various specific products, al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass including photographic films, tissue replacers and adhesives, Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, food and animal feed, detergents, textiles, paper and pulp, and 1993; Guzman et al., Circulation 88:2838-2848, 1993; and chemicals manufacturing (commodity and fine, e.g., bioplas Guzman et al., Cir. Res. 73:1202-1207, 1993; Ulmer et al., tics). Science 259:1745-1749, 1993: Cohen, Science 259:1691 0260 The nucleic acid sequences of the invention can be 1692, 1993; U.S. Pat. Nos. 4,436,727; 4,877,611; 4.866,034 "isolated or “purified.” In the event the nucleic acid is and 4,912,094; U.S. Pat. Nos. 6,008,200 and 5,856,462: genomic DNA, it is considered "isolated when it does not Zitvogel et al., Nature Med. 4:594-600, 19980. include coding sequence(s) of a gene or genes immediately 0255. The tumor-specific gene products of the present adjacent thereto in the naturally occurring genome of an invention, in particular membrane bound, can be utilized as organism; although some or all of the 5' or 3' non-coding targeting molecules for binding therapeutic toxins, antibodies sequence of an adjacent gene can be included. For example, and Small molecules, to thereby specifically target the tumor an isolated nucleic acid (DNA or RNA) can include some or cell. Alternatively, neoplastic properties of the tumor-specific all of the 5' or 3' non-coding sequence that flanks the coding tumor specific gene products (nucleic acid and/or protein sequence (e.g., the DNA sequence that is transcribed into, or products) of the present invention, may be beneficially used in the RNA sequence that gives rise to, the promoter or an the promotion of wound healing and neovascularization in enhancer in the mRNA). For example, an isolated nucleic acid ischemic conditions and diabetes. can contain less than about 5 kb (e.g., less than about 4 kb, 3 0256 Secreted splice variants of known autoantigens kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb) of the 5' and/or 3' sequence associated with a specific autoimmune syndrome, such as for that naturally flanks the nucleic acid molecule in a cell in example, those listed in Table 15, below, can be used to treat which the nucleic acid naturally occurs. In the event the Such syndromes. Typically, autoimmune disorders are char nucleic acid is RNA or mRNA, it is “isolated” or “purified” acterized by a number of different autoimmune manifesta from a natural Source (e.g., a tissue) or a cell culture when it tions (e.g., multiple endocrine syndromes). For these reasons is substantially free of the cellular components with which it secreted variants may be used to treat any combination of naturally associates in the celland, if the cell was cultured, the autoimmune phenomena of a disease as detailed in Table 15, cellular components and medium in which the cell was cul below. The therapeutic effect of these splice variants may be tured (e.g., when the RNA or mRNA is in a form that contains a result of (i) competing with autoantigens for binding with less than about 20%, 10%, 5%, 1%, or less, of other cellular autoantibodies; (ii) antigen-specific immunotherapy, essen components or culture medium). When chemically synthe tially suggesting that systemic administration of a protein sized, a nucleic acid (DNA or RNA) is "isolated' or “purified’ antigen can inhibit the Subsequent generation of the immune when it is substantially free of the chemical precursors or US 2011/0091454 A1 Apr. 21, 2011

other chemicals used in its synthesis (e.g., when the nucleic naturally occurring sequences. For example, the variant DNA acid is in a form that contains less than about 20%, 10%, 5%, sequences of the invention can be incorporated into a vector, 1%, or less, of the chemical precursors or other chemicals). into the genomic DNA of a prokaryote or eukaryote, or made 0261 Variants, fragments, and other mutant nucleic acids part of a hybrid gene. Moreover, variants (or, where appro are also envisaged by the present invention. As noted above, priate, the proteins they encode) can be used in the diagnostic where a given biomolecular sequence represents a new gene assays and therapeutic regimes described below. (rather than a new splice variant of a known gene), the nucleic 0266 The sequence of nucleic acids of the invention can acids of the invention include the corresponding genomic also be varied to maximize expression in a particular expres DNA and RNA. Accordingly, where a given SEQID repre sion system. For example, as few as one and as many as about sents a new gene, variations or mutations can occur not only 20% of the codons in a given sequence can be altered to in that nucleic acid sequence, but in the coding regions, the optimize expression in bacterial cells (e.g., E. coli), yeast, non-coding regions, or both, of the genomic DNA or RNA human, insect, or other cell types (e.g., CHO cells). from which it was made. 0267. The nucleic acids of the invention can also be 0262 The nucleic acids of the invention can be double shorter or longer than those disclosed on CD-ROMs 1, 2 and Stranded or single-stranded and can, therefore, either be a 4. Where the nucleic acids of the invention encode proteins, sense strand, an antisense strand, or a portion (i.e., a frag the protein-encoding sequences can differ from those repre ment) of either the sense or the antisense strand. The nucleic sented by specific sequences of file “Protein. seqs' in CD acids of the invention can be synthesized using standard ROM 2 and “Proteins.gz” in CD-ROM4. For example, the nucleotides or nucleotide analogs orderivatives (e.g., inosine, encoded proteins can be shorter or longer than those encoded phosphorothioate, or acridine substituted nucleotides), which by one of the nucleic acid sequences of the present invention. can alter the nucleic acid's ability to pair with complementary Nucleotides can be deleted from, or added to, either or both sequences or to resist nucleases. Indeed, the stability or solu ends of the nucleic acid sequences of the present invention or bility of a nucleic acid can be altered (e.g., improved) by the novel portions of the sequences that represent new splice modifying the nucleic acid's base moiety, Sugar moiety, or variants. Alternatively, the nucleic acids can encode proteins phosphate backbone. For example, the nucleic acids of the in which one or more amino acid residues have been added to, invention can be modified as taught by Toulmé Nature Bio or deleted from, one or more sequence positions within the tech. 19:17, (2001) or Faria et al. Nature Biotech. 19:40-44, nucleic acid sequences. (2001), and the deoxyribose phosphate backbone of nucleic 0268. The nucleic acid fragments can be short (e.g., 15-30 acids can be modified to generate peptide nucleic acids nucleotides). For example, in cases where peptides are to be PNAs; see Hyrup et al., (1996) Bioorganic & Medicinal expressed therefrom Such polynucleotides need only contain Chemistry 4:5-23. a Sufficient number of nucleotides to encode novel antigenic 0263 PNAS are nucleic acid “mimics': the molecule's epitopes. In cases where nucleic acid fragments serve as DNA natural backbone is replaced by a pseudopeptide backbone or RNA probes or PCR primers, fragments are selected of a and only the four nucleotide bases are retained. This allows length Sufficient for specific binding to one of the sequences specific hybridization to DNA and RNA under conditions of representing a novel gene or a unique portion of a novel splice low ionic strength. PNAS can be synthesized using standard variant. Solid phase peptide synthesis protocols as described, for 0269 Nucleic acids used as probes or primers are often example by Hyrup et al. (supra) and Perry-O'Keefe et al. referred to as oligonucleotides, and they can hybridize with a Proc. Natl. Acad. Sci. USA (1996) 93: 14670-675). PNAs of sense orantisense strand of DNA or RNA. Nucleic acids that the nucleic acids described herein can be used in therapeutic hybridize to a sense Strand (i.e., a nucleic acid sequence that and diagnostic applications. encodes protein, e.g., the coding strand of a double-stranded 0264. Moreover, the nucleic acids of the invention include cDNA molecule) or to an mRNA sequence are referred to as not only protein-encoding nucleic acids perse (e.g., coding antisense oligonucleotides. Oligonucleotides which specifi sequences produced by the polymerase chain reaction (PCR) cally hybridize with the troponin variants of the present or following treatment of DNA with an endonuclease), but invention (SEQID NOs: 74,76,78,80, 82,84 and 66) and not also, for example, recombinant DNA that is: (a) incorporated with wild-type tropoinin are preferably directed at the unique into a vector (e.g., an autonomously replicating plasmid or nucleic acid sequence set forth in SEQID NO: 87. Alterna virus), (b) incorporated into the genomic DNA of a prokary tively, such oligonucleotides can be directed at a nucleic acid ote or eukaryote, or (c) part of a hybrid gene that encodes an sequence which bridges the unique sequence with common additional polypeptide sequence (i.e., a sequence that is het upstream or downstream sequences (see FIG. 21). erologous to the nucleic acid sequences of the present inven 0270 Antisense oligonucleotides can be used to specifi tion or fragments, other mutants, or variants thereof). cally inhibit transcription of any of the nucleic acid sequences 0265. The present invention includes naturally occurring of the present invention. sequences of the nucleic acid sequences described above, 0271 Design of antisense molecules must be effected allelic variants (same locus; functional or non-functional), while considering two aspects important to the antisense homologs (different locus), and orthologs (different organ approach. The first aspect is delivery of the oligonucleotide ism) as well as degenerate variants of those sequences and into the cytoplasm of the appropriate cells, while the second fragments thereof. The degeneracy of the genetic code is well aspect is design of an oligonucleotide which specifically known, and one of ordinary skill in the art will be able to make binds the designated mRNA within cells in a way which nucleotide sequences that differ from the nucleic acid inhibits translation thereof. sequences of the present invention but nevertheless encode 0272. The prior art teaches of a number of delivery strat the same proteins as those encoded by the nucleic acid egies which can be used to efficiently deliveroligonucleotides sequences of the present invention. The variant sequences into a wide variety of cell types see, for example, Luft (1998) (e.g., degenerate variants) can be used in the same manner as JMol Med 76(2): 75-6; Kronenwettet al. (1998) Blood 91(3): US 2011/0091454 A1 Apr. 21, 2011

852-62; Rajur et al. (1997) Bioconjug Chem 8(6): 935-40: represent new splice variants. These ribozymes can include a Lavigne et al. (1997) Biochem Biophys Res Commun237(3): catalytic sequence encoding a protein that cleaves mRNA 566-71 and Aoki et al. (1997) Biochem Biophys Res Com see U.S. Pat. No. 5,093,246 or Haselhoff and Gerlach, mun 231(3): 540-5). Nature 334:585-591, (1988). For example, a derivative of a 0273. In addition, algorithms for identifying those tetrahymena L-19 IVS RNA can be constructed in which the sequences with the highest predicted binding affinity for their nucleotide sequence of the active site is complementary to the target mRNA based on a thermodynamic cycle that accounts nucleotide sequence to be cleaved in an mRNA of the inven for the energetics of structural alterations in both the target tion (e.g., one of the nucleic acid sequences of the present mRNA and the oligonucleotide are also available see, for invention; see, U.S. Pat. Nos. 4,987,071 and 5,116,742). example, Walton et al. (1999) Biotechnol Bioeng 65(1): 1-9. Alternatively, the mRNA sequences of the present invention 0274. Such algorithms have been successfully used to can be used to select a catalytic RNA having a specific ribo implement an antisense approach in cells. For example, the nuclease activity from a pool of RNA molecules see, e.g., algorithm developed by Walton et al. enabled scientists to Bartel and Szostak, Science 261:141 1-1418, (1993); see also Successfully design antisense oligonucleotides for rabbit Krolet al., Bio-Techniques 6:958-976, (1988). beta-globin (RBG) and mouse tumor necrosis factor-alpha 0281 Alternatively, small interfering RNA oligonucle (TNF-C) transcripts. The same research group has more otides can be used to specifically inhibit transcription of any recently reported that the antisense activity of rationally of the nucleic acid sequences of the present invention. RNA selected oligonucleotides against three model target mRNAS interference is a two step process. the first step, which is (human lactate dehydrogenase A and B and rat gp130) in cell termed as the initiation step, input dsRNA is digested into culture as evaluated by a kinetic PCR technique proved effec 21-23 nucleotide (nt) small interfering RNAs (siRNA), prob tive in almost all cases, including tests against three different ably by the action of Dicer, a member of the RNase III family targets in two cell types with phosphodiester and phospho of dsRNA-specific ribonucleases, which processes (cleaves) rothioate oligonucleotide chemistries. dsRNA (introduced directly or via a transgene or a virus) in an 0275. In addition, several approaches for designing and ATP-dependent manner. Successive cleavage events degrade predicting efficiency of specific oligonucleotides using an in the RNA to 19-21 bp duplexes (siRNA), each with 2-nucle vitro system were also published (Matveeva et al. (1998) otide 3' overhangs Hutvagner and Zamore Curr. Opin. Nature Biotechnology 16, 1374-1375). Genetics and Development 12:225-232 (2002); and Bern 0276. Several clinical trials have demonstrated safety, fea stein Nature 409:363-366 (2001). sibility and activity of antisense oligonucleotides. For 0282. In the effector step, the siRNA duplexes bind to a example, antisense oligonucleotides Suitable for the treat nuclease complex to from the RNA-induced silencing com ment of cancer have been successfully used (Holmund et al. plex (RISC). An ATP-dependent unwinding of the siRNA (1999) Curr Opin Mol Ther 1(3):372-85), while treatment of duplex is required for activation of the RISC. The active RISC hematological malignancies via antisense oligonucleotides then targets the homologous transcript by base pairing inter targeting c-myb gene, p53 and Bcl-2 had entered clinical actions and cleaves the mRNA into 12 nucleotide fragments trials and had been shown to be tolerated by patients Gerwitz from the 3' terminus of the siRNA Hutvagner and Zamore (1999) Curr Opin Mol Ther 1(3):297-306). Curr. Opin. Genetics and Development 12:225-232 (2002); 0277 More recently, antisense-mediated suppression of Hammondet al. (2001) Nat. Rev. Gen. 2:110-119 (2001); and human heparanase gene expression has been reported to Sharp Genes. Dev. 15:485-90 (2001). Although the mecha inhibit pleural dissemination of human cancer cells in a nism of cleavage is still to be elucidated, research indicates mouse model Uno etal. (2001) Cancer Res 61 (21):7855-60. that each RISC contains a single siRNA and an RNase 0278 Thus, the current consensus is that recent develop Hutvagner and Zamore Curr. Opin. Genetics and Develop ments in the field of antisense technology which, as described ment 12:225-232 (2002). above, have led to the generation of highly accurate antisense 0283 Because of the remarkable potency of RNAi, an design algorithms and a wide variety of oligonucleotide amplification step within the RNAi pathway has been sug delivery systems, enable an ordinarily skilled artisan to gested. Amplification could occur by copying of the input design and implement antisense approaches Suitable for dsRNAs which would generate more siRNAs, or by replica downregulating expression of known sequences without hav tion of the siRNAs formed. Alternatively or additionally, ing to resort to undue trial and error experimentation. amplification could be effected by multiple turnover events of 0279 Antisense oligonucleotides can also be C.-anomeric the RISC Hammond et al. Nat. Rev. Gen. 2:110-119 (2001), nucleic acids, which form specific double-stranded hybrids Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and Zamore with complementary RNA in which, contrary to the usual Curr. Opin. Genetics and Development 12:225-232 (2002). b-units, the strands run parallel to each other Gaultier et al., For more information on RNAi see the following reviews Nucleic Acids Res. 15:6625-6641, (1987). Alternatively, TuschlChemBiochem. 2:239-245 (2001); Cullen Nat. Immu antisense nucleic acids can comprise a 2'-O-methylribonucle nol. 3:597-599 (2002); and Brantl Biochem. Biophys. Act. otide Inoue et al., Nucleic Acids Res. 15:6131-6148, (1987) 1575:15-25 (2002). or a chimeric RNA-DNA analogue Inoue et al., FEBS Lett. 0284. Synthesis of RNAi molecules suitable for use with 215:327-330, (1987). the present invention can be effected as follows. First, the an 0280. The nucleic acid sequences described above can mRNA sequence of interest is scanned downstream of the also include ribozymes catalytic sequences. Such a ribozyme AUG start codon for AA dinucleotide sequences. Occurrence will have specificity for a protein encoded by the novel of each AA and the 3' adjacent 19 nucleotides is recorded as nucleic acids described herein (by virtue of having one or potential siRNA target sites. Preferably, siRNA target sites more sequences that are complementary to the cDNAs that are selected from the open reading frame, as untranslated represent novel genes or the novel portions (i.e., the portions regions (UTRs) are richer in regulatory protein binding sites. not found in related splice variants) of the sequences that UTR-binding proteins and/or translation initiation complexes US 2011/0091454 A1 Apr. 21, 2011

may interfere with binding of the siRNA endonuclease com are amplified; e.g., RT-PCR), colony hybridization methods, plex Tuschl ChemBiochem. 2:239-245. It will be appreci in situ hybridization, and the like. ated though, that siRNAS directed at untranslated regions 0291. The present invention also encompasses pairs of may also be effective, as demonstrated for GAPDH wherein oligonucleotides (these can be used, for example, to amplify siRNA directed at the 5' UTR mediated about 90% decrease the new genes, orportions thereof, or the novel portions of the in cellular GAPDH mRNA and completely abolished protein splice variant in, for example, potentially diseased tissue) and level (www.ambion.com/techlib/tn/91/912.html). groups of oligonucleotides (e.g., groups that exhibit a certain 0285) Second, potential target sites are compared to an degree of homology (e.g., nucleic acids that are 90% identical appropriate genomic database (e.g., human, mouse, rat etc.) to one another) or that share one or more functional using any sequence alignment software, such as the BLAST attributes). software available from the NCBI server (www.ncbi.nlm.nih. 0292. When used, for example, as probes, the nucleic gov/BLAST/). Putative target sites which exhibit significant acids of the invention can be labeled with a radioactive iso homology to other coding sequences are filtered out. tope (e.g., using polynucleotide kinase to add P-labeled 0286 Qualifying target sequences are selected as template ATP to the oligonucleotide used as the probe) or an enzyme. for siRNA synthesis. Preferred sequences are those including Other labels, such as chemiluminescent, fluorescent, or colo low G/C content as these have proven to be more effective in rimetric, labels can be used. mediating gene silencing as compared to those with G/C 0293 As noted above, the invention features nucleic acids content higher than 55%. Several target sites are preferably that are complementary to those represented by the nucleic selected along the length of the target gene for evaluation. For acid sequences of the present invention or novel portions better evaluation of the selected siRNAs, a negative control is thereof (i.e., novel fragments) and as such are capable of preferably used in conjunction. Negative control siRNA pref hybridizing therewith. In many cases, nucleic acids that are erably include the same nucleotide composition as the siR used as probes or primers are absolutely or completely NAS but lack significant homology to the genome. Thus, a complementary to all, or a portion of the target sequence. scrambled nucleotide sequence of the siRNA is preferably However, this is not always necessary. The sequence of a used, provided it does not display any significant homology to useful probe or primer can differ from that of a target any other gene. sequence so long as it hybridizes with the target under the 0287 DNAZyme molecules can also be used to specifi stringency conditions described herein (or the conditions rou cally inhibit transcription of any of the nucleic acid sequences tinely used to amplify sequences by PCR) to form a stable of the present invention. duplex. 0288 DNAZyme molecules are capable of specifically 0294) Hybridization of a nucleic acid probe to sequences cleaving an mRNA transcript or DNA sequence of interest. in a library or other sample of nucleic acids is typically DNAZymes are single-stranded polynucleotides which are performed under moderate to high Stringency conditions. capable of cleaving both single and double stranded target Nucleic acid duplex or hybrid stability is expressed as the sequences (Breaker, R. R. and Joyce, G. Chemistry and Biol melting temperature (Tm), which is the temperature at which ogy 1995: 2:655; Santoro, S. W. & Joyce, G. F. Proc. Natl. a probe dissociates from a target DNA and, therefore, helps Acad. Sci. USA 1997: 943:4262) A general model (the “10 define the required stringency conditions. To identify 23” model) for the DNAZyme has been proposed. “10-23” sequences that are related or substantially identical to that of DNAZymes have a catalytic domain of 15 deoxyribonucle a probe, it is useful to first establish the lowest temperature at otides, flanked by two Substrate-recognition domains of which only homologous hybridization occurs with a particu seven to nine deoxyribonucleotides each. This type of lar concentration of salt (e.g., SSC or SSPE). (The terms DNAZyme can effectively cleave its substrate RNA at purine: “identity” or “identical” as used herein are equated with the pyrimidine junctions (Santoro, S. W. & Joyce, G. F. Proc. terms "homology' or "homologous'). Then, assuming a 1% Natl. Acad. Sci. USA 199; for rev of DNAZymes see Khachi mismatch requires a 1° C. decrease in the Tm, the temperature gian, L. M. Curr Opin Mol Ther 4:1 19-21 (2002). of the wash (e.g., the final wash) following the hybridization 0289 Examples of construction and amplification of syn reaction is reduced accordingly. For example, if sequences thetic, engineered DNAZymes recognizing single and having at least 95% identity with the probe are sought, the double-stranded target cleavage sites have been disclosed in final wash temperature is decreased by 5° C. In practice, the U.S. Pat. No. 6,326,174 to Joyce et al. DNAZymes of similar change in Tm can be between 0.5° C. and 1.5° C. per 1% design directed against the human Urokinase receptor were mismatch recently observed to inhibit Urokinase receptor expression, 0295 The hybridization conditions described here can be and Successfully inhibit colon cancer cell metastasis in vivo employed when the nucleic acids of the invention are used in, (Itoh et al., 20002, Abstract 409, Ann Meeting Am Soc Gen for example, diagnostic assays, or when it is desirable to Ther www.asgt.org). In another application, DNAZymes identify, for example, the homologous genes that fall within complementary to bcr-abl oncogenes were successful in the scope of the invention (as stated elsewhere, the invention inhibiting the oncogenes expression in leukemia cells, and encompasses allelic variants, homologues and orthologues of lessening relapse rates in autologous bone marrow transplant the sequences that represent new genes). Homologous genes in cases of CML and ALL. will hybridize with the sequences that represent new genes 0290 Oligonucleotides having as few as 9-10 nucleotides under a stringency condition described herein. (e.g., 12-14, 15-17, 18-20, 21-23, or 24-27 nucleotides) can 0296. The following is an example of “high stringency’ be useful as probes or expression templates and are within the hybridization conditions: 68°C. in (a) 5xSSC/5xDenhardt's Scope of the present invention. Indeed, fragments that contain solution/1.0% SDS, (b) 0.5 M NaHPO (pH 7.2)/1 mM about 15-20 nucleotides can be used in Southern blotting, EDTA/7% SDS, or (c) 50% formamide/0.25 MNaHPO (pH Northern blotting, dot or slot blotting, PCR amplification 7.2)/0.25 M NaC1/1 mM EDTA/7% SDS, and washing is methods (where naturally occurring or mutant nucleic acids carried out with (a) 0.2xSSC/0.1% SDS at room temperature US 2011/0091454 A1 Apr. 21, 2011 20 or at 42°C., (b) 0.1XSSC/0.1% SDS at 68° C., or (c) 40 mM (or any other test for homology or Substantial identity NaHPO (pH 7.2)/1 mM EDTA and either 1% or 5% SDS at described herein), the sequences should be aligned for maxi 500 C. mum homology, as described elsewhere here. 0297 “Moderately stringent hybridization conditions 0301 The present invention also encompasses mutants constitute, for example, the hybridization conditions e.g., naturally-occurring or synthetic nucleic acids that described above and one or more washes in 3xSSC at 42°C. exhibit an identity level of at least 50%, at least 55%, at least Of course, Salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and 60%, at least 65%, at least 70%, at least 75%, at least 80%, at the target nucleic acid. This is well known in the art, and least 85%, at least 90%, say 95-100% to any of the nucleic additional guidance is available in, for example, Sambrook et acid sequences set forth in the files “Transcripts nucleotide al., 1989, Molecular Cloning, A Laboratory Manual, Cold seqs part1'. “Transcripts nucleotide seqs part2”, “Tran Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., Scripts nucleotide seqs part3. “Transcripts nucleotide and Ausubel et al. (eds.), 1995, Current Protocols in Molecu seqs part4”, “PropG seqs, and “Transcripts.gz' of the lar Biology, John Wiley & Sons, New York, N.Y. enclosed CD-ROM1, CD-ROM2 and CD-ROM4, as deter 0298 As mentioned hereinabove, the nucleic acid mined using the BlastN software of the National Center of sequences of the present invention can be modified to encode Biotechnology Information (NCBI) using default param substitution mutants of the wild type forms. Substitution eters, which encode proteins that retain Substantially at least mutants can include amino acid residues that represent either one, or preferably substantially all of the biological activities a conservative or non-conservative change (or, where more of the referenced protein (i.e., encoding a polypeptide having than one residue is varied, possibly both). A “conservative' an amino acid sequence which exhibits a homology level of at Substitution is one in which one amino acid residue is least 50%, at least 55%, at least 60%, at least 65%, at least replaced with another having a similar side chain. Families of 70%, at least 75%, at least 80%, at least 85%, at least 90%, say amino acid residues having similar side chains have been 95-100% to any of the amino acid sequences set forth in the defined in the art. These families include amino acids with files “protein seqs' and “Proteins.gz' of the enclosed CD basic side chains (e.g., lysine, arginine, histidine), acidic side ROM2 and CD-ROM4, as determined using the BlastP soft chains (e.g., aspartic acid, glutamic acid), uncharged polar ware of the National Center of Biotechnology Information side chains (e.g., glycine, asparagine, glutamine, serine, (NCBI) using default parameters). What constitutes “sub threonine, tyrosine, cysteine), nonpolar side chains (e.g., ala stantially all may vary considerably. For example, in some nine, Valine, leucine, isoleucine, proline, phenylalanine, instances, a variant or mutant protein may be about 5% as methionine, tryptophan), beta-branched side chains (e.g., effective as the protein from which it was derived. But if that threonine, Valine, isoleucine) and aromatic side chains (e.g., level of activity is sufficient to achieve a biologically signifi tyrosine, phenylalanine, tryptophan, histidine). The invention cant result (e.g., transport of a sufficient number of ions across includes polypeptides that include one, two, three, five, or a cell membrane), the variant or mutant protein is one that more conservative amino acid Substitutions, where the result retains substantially all of at least one of the biological activi ing mutant polypeptide has at least one biological activity that ties of the protein from which it was derived. A “biologically is the same, or Substantially the same, as a biological activity active' variant or mutant (e.g., fragment) of a protein can of the wild type polypeptide. participate in an intra- or inter-molecular interaction that can 0299 Fragments or other mutant nucleic acids can be be characterized by specific binding between molecules two made by mutagenesis techniques well known in the art, or more identical molecules (in which case, homodimeriza including those applied to polynucleotides, cells, or organ tion could occur) or two or more different molecules (in isms (e.g., mutations can be introduced randomly along all or which case, heterodimerization could occur). Often, a bio part of the nucleic acid sequences of the present invention by logically active fragment will be recognizable by virtue of a saturation mutagenesis). The resultant mutant proteins can be recognizable domain or motif, and one can confirm biological screened for biological activity to identify those that retain activity experimentally. More specifically, for example, one activity or exhibit altered activity. can make (by synthesis or recombinant techniques) a nucleic 0300. In certain embodiments, nucleic acids of the inven acid fragment that encodes a potentially biologically active tion differ from the nucleic acid sequences provided in files portion of a protein of the present invention by inserting the “Transcripts nucleotide seqs part1'. “Transcripts nucle active fragment into an expression vector, and expressing the otide seqs part2. “Transcripts nucleotide seqs part3. protein (expression constructs and expression systems are new'. “Transcripts nucleotide seqs part4”, “PropG seqs'. described further below), and finally assessing the ability of and “Transcripts.gz” (provided in CD-ROM1, CD-ROM2 the protein to function. and CD-ROM4) by at least one, but less than 10, 20, 30, 40, 0302) The present invention also encompasses chimeric 50, 100, or 200 nucleotides or, alternatively, at less than 1%. nucleic acid sequences that encode fusion proteins. For 5%, 10% or 20% of the nucleotides in the subject nucleic acid example, a nucleic acid sequence of the invention can include (excluding, of course, splice variants known in the art). Simi a sequence that encodes a hexa-histidine tag (to facilitate larly, in certain embodiments, proteins of the invention can purification of bacterially-expressed proteins) or a hemagglu differ from those encoded by those included in Files “Protein. tinin tag (to facilitate purification of proteins expressed in seqs' and “Proteins.gz' (provided in CD-ROM2 and CD eukaryotic cells). ROM4) by at least one, but less than 10, 20, 30, 40, 50, 100, 0303. The fused heterologous sequence can also encode a or 200 amino acid residues or, alternatively, at less than 1%. portion of an immunoglobulin (e.g., the constant region (Fc) 5%, 10% or 20% of the amino acid residues in a subject ofan IgG molecule), a detectable marker, or a signal sequence protein (excluding, of course, proteins encoded by splice (e.g., a sequence that is recognized and cleaved by a signal variants known in the art (proteins of the invention are peptidase in the host cell in which the fusion protein is described in more detail below)). If necessary for this analysis expressed). Fusion proteins containing an Fc region can be US 2011/0091454 A1 Apr. 21, 2011 purified using a protein A column, and they have increased hormone, or some other polypeptide (e.g., that used in the stability (e.g., a greater circulating half-life) in vivo. tetracycline-inducible system, "Tet-On” and "Tet-Off: see, 0304 Detectable markers are well known in the art and e.g., Clontech Inc. (Palo Alto, Calif.), Gossen and Bujard can be used in the context of the present invention. For Proc. Natl. Acad. Sci. USA 89:5547, 1992, and Paillard, example, the expression vector puR278 (Ruther et al., Human Gene Therapy 9:983, 1989). EMBO.J., 2:1791, 1983) can be used to fuse a nucleic acid of 0309 The expression vector will be selected or designed the invention to the lacZ gene (which encodes B-galactosi depending on, for example, the type of host cell to be trans dase). formed and the level of protein expression desired. For 0305. A nucleic acid sequence of the invention can also be example, when the host cells are mammalian cells, the fused to a sequence that, when expressed, improves the quan expression vector can include viral regulatory elements. Such tity or quality (e.g., solubility) of the fusion protein. For as promoters derived from polyoma, Adenovirus 2, cytome example, pGEX vectors can be used to express the proteins of galovirus and Simian Virus 40. The nucleic acid inserted (i.e., the invention fused to glutathione S- (GST). In the sequence to be expressed) can also be modified to encode general. Such fusion proteins are soluble and can be easily residues that are preferentially utilized in E. coli (Wada et al., purified from lysed cells by adsorption to glutathione-agarose Nucleic Acids Res. 20:2111-21 18, 1992). These modifica beads followed by elution in the presence of free glutathione. tions can be achieved by standard DNA synthesis techniques. The pGEX vectors (Pharmacia Biotech Inc.; Smith and 0310 Expression vectors can be used to produce the pro Johnson, Gene 67:31-40, 1988) are designed to include teins encoded by the nucleic acid sequences of the invention thrombin or factor Xa protease cleavage sites so that the ex vivo (e.g., the expressed proteins can be purified from cloned target gene product can be released from the GST expression systems such as those described herein) or in vivo moiety. Other useful vectors include pMAL (New England (i.e., for example, whole organisms). Proteins can be Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, expressed in Vivo in a way that restores expression to within N.J.), which fuse maltose E binding protein and protein A, normal limits and/or restores the temporal or spatial patterns respectively, to a protein of the invention. of expression normally observed. Alternatively, proteins can 0306 A signal sequence, when present, can facilitate be aberrantly expressed in vivo (i.e., at a time or place, or to an secretion of the fusion protein from a cell, and can be cleaved extent, that does not normally occur in vivo). For example, off by the host cell. The nucleic acid sequences of the present proteins can be over expressed or under expressed with invention can also be fused to “inactivating sequences, respect to expression in a wild-type state; expressed at a which render the fusion protein encoded, as a whole, inactive. different developmental stage; expressed at a different time Such proteins can be referred to as “preproteins.” and they can during the cell cycle; or expressed in a tissue or cell type be converted into an active form of the protein by removal of where expression does not normally occur. the inactivating sequence. 0311. The present invention also encompasses various 0307 The present invention also encompasses expression engineered cells, including cells that have been engineered to constructs (e.g., plasmids, cosmids, and other vectors that express or over-express a nucleic acid sequence described transport nucleic acids) that include a nucleic acid of the herein. Accordingly, the cells can be transformed with a invention in a sense or antisense orientation. The nucleic expression construct, Such as those described above. A “trans acids can be operably linked to a regulatory sequence (e.g., a formed cell is a cell into which (or into an ancestor of which) promoter, enhancer, or other expression control sequence, one has introduced a nucleic acid that encodes a protein of the Such as a polyadenylation signal) that facilitates expression of invention. The nucleic acid can be introduced by any of the the nucleic acid. The vector can replicate autonomously or art-recognized techniques for introducing nucleic acids into a integrate into a host genome, and can be a viral vector, such as host cell (e.g., calcium phosphate or calcium chloride co a replication defective retrovirus, an adenovirus, oran adeno precipitation, DEAE-dextran-mediated transfection, lipofec associated virus. tion, or electroporation). 0308 When present, the regulatory sequence can direct 0312. The phrases “transformed cell' or “host cell” refer constitutive or tissue-specific expression of the nucleic acid. not only to the particular subject cell, but also to the progeny Tissue-specific promoters include, for example, the liver or potential progeny of such cells. Mutations or environmen specific albumin promoter (Pinkert et al., Genes Dev. 1:268 tal influences may modify the cells in Succeeding generations 277, 1987), lymphoid-specific promoters (Calame and Eaton, and, even though Such progeny may not be identical to the Adv. Immunol. 43:235-275, 1988), such as those of T cell parent cell, they are nevertheless within the scope of the receptors (Winoto and Baltimore, EMBO J. 8:729-733, 1989) invention. The cells of the invention can be "isolated” cells or and immunoglobulins (Banerji et al., Cell 33:729-740, 1982; “purified preparations' of cells (e.g., an invitro preparation of Queen and Baltimore, Cell 33:741-748, 1983), the neuron cells), either of which can be obtained from multicellular specific neurofilament promoter (Byrne and Ruddle, Proc. organisms such as plants and animals (in which case the Natl. Acad. Sci. USA 86:5473-5477, 1989), pancreas-spe purified preparation would constitute a subset of the cells cific promoters (Edlund et al., Science 230:912-916, 1985), from the organism). In the case of unicellular microorganisms and mammary gland-specific promoters (e.g., milk whey pro (e.g., microbial cells), the preparation is purified when at least moter; see U.S. Pat. No. 4,873.316 and European Application 10% (e.g., 25%, 50%, 75%, 80%, 90%. 95% or more) of the Publication No. 264,166). Developmentally-regulated pro cells within it are the cells of interest (e.g., the cells that moters can also be used. Examples of such promoters include express a protein of the invention). the murine hox promoters (Kessel and Gruss, Science 249: 0313 The expression vectors of the invention can be 374-379, 1990) and the fetoprotein promoter (Campes and designed to express proteins in prokaryotic or eukaryotic Tilghman, Genes Dev. 3:537-546, 1989). Moreover, the pro cells. For example, polypeptides of the invention can be moter can be an inducible promoter. For example, the pro expressed in bacterial cells (e.g., E. coli), fungi, yeast, or moter can be regulated by a steroid hormone, a polypeptide insect cells (e.g., using baculovirus expression vectors). For US 2011/0091454 A1 Apr. 21, 2011 22 example, a baculovirus Such as Autographa Californica host cell chromosome is selected for by including 0.01-300 nuclear polyhedrosis virus (AcNPV), which grows in mM methotrexate in the cell culture medium (as described in Spodoptera frugiperda cells, can be used as a vector to Ausubel et al., Supra). This dominant selection can be accom express foreign genes. A nucleic acid of the invention can be plished in most cell types. cloned into a non-essential region (for example the polyhe 0318 Moreover, recombinant protein expression can be drin gene) of the viral genome and placed under control of a increased by DHFR-mediated amplification of the trans promoter (e.g., the polyhedrin promoter). Successful inser fected gene. Methods for selecting cell lines bearing gene tion of the nucleic acid results in inactivation of the polyhe amplifications are described in Ausubel et al. (Supra) and drin gene and production of non-occluded recombinant virus generally involve extended culture in medium containing (i.e., virus lacking the proteinaceous coat encoded by the gradually increasing levels of methotrexate. DHFR-contain polyhedrin gene). These recombinant viruses are then typi ing expression vectors commonly used for this purpose cally used to infect insect cells (e.g., Spodoptera frugiperda include pCVSEII-DHFR and pAdD26SV(A) (which are also cells) in which the inserted gene is expressed (see, e.g., Smith described in Ausubel et al., Supra). et al., J. Virol. 46:584, 1983 and U.S. Pat. No. 4,215,051). If 0319. A number of other selection systems can be used. desired, mammalian cells can be used in lieu of insect cells, These include those based on herpes simplex virus thymidine provided the virus is engineered so that the nucleic acid is kinase, hypoxanthine-guanine phosphoribosyl-transferase, placed under the control of a promoter that is active in mam and adenine phosphoribosyltransferase genes, which can be malian cells. employed in tk, hgprt, or aprt cells, respectively. In addition, 0314. Useful mammalian cells include rodent cells, such gpt, which confers resistance to mycophenolic acid (Mulli as Chinese hamster ovary cells (CHO) or COS cells, primate gan et al., Proc. Natl. Acad. Sci. USA, 78:2072, 1981); neo, cells. Such as African green monkey kidney cells, rabbit cells, which confers resistance to the aminoglycoside G-418 (Col or pig cells). The mammalian cells can also be human cells berre-Garapin et al., J. Mol. Biol. 150:1, 1981); and hygro, (e.g., a hematopoietic cell, a fibroblast, or a tumor cell). For which confers resistance to hygromycin (Santerre et al., Gene example, HeLa cells, 293 cells, 3T3 cells, and WI38 cells are 30:147, 1981), can be used. useful. Other suitable host cells are known to those skilled in 0320 In view of the foregoing, it is clear that one can the art and are discussed further in Goeddel Gene Expression synthesize proteins encoded by the nucleic acid sequences of Technology: Methods in Enzymology 185, Academic Press, the present invention (i.e., recombinant proteins). Methods of San Diego, Calif., (1990). generating and recombinant proteins are well known in the 0315 Proteins can also be produced in plant cells, if art. Recombinant protein purification can be effected by affin desired. For plant cells, viral expression vectors (e.g., cauli ity. Where a protein of the invention has been fused to a flower mosaic virus and tobacco mosaic virus) and plasmid heterologous protein (e.g., a maltose binding protein, a B-ga expression vectors (e.g., Tiplasmid) are suitable. These cells lactosidase protein, or a trpE protein), antibodies or other and other types are available from a wide range of Sources agents that specifically bind to the latter can facilitate purifi e.g., the American Type Culture Collection, Manassas, Va.; cation. The recombinant protein can, if desired, be further see also, e.g., Ausubel et al., Current Protocols in Molecular purified (e.g., by high performance liquid chromatography or Biology, John Wiley & Sons, New York, (1994). The optimal other standard techniques see, Fisher, Laboratory Tech methods of transformation (by, for example, transfection) niques. In Biochemistry And Molecular Biology, Eds. Work and, as noted above, the choice of expression vehicle will and Burdon, Elsevier, (1980). depend on the host system selected. Transformation and 0321. Other purification schemes are known as well. For transfection methods are described in, for example, Ausubel example, non-denatured fusion proteins can be purified from et al., Supra; expression vehicles can be chosen from those human cell lines as described by Janknecht et al. (Proc. Natl. provided in, for example, Pouwels et al., Cloning Vectors: A Acad. Sci. USA, 88:8972, 1981). In this system, a nucleic Laboratory Manual, (1985), Supp. (1987). The host cells acid is Subcloned into a vaccinia recombination plasmid Such harboring the expression vehicle can be cultured in conven that it is translated, in frame, with a sequence encoding an tional nutrient media, adapted as needed for activation of a N-terminal tag consisting of six histidine residues. Extracts of chosen nucleic acid, repression of a chosen nucleic acid, cells infected with the recombinant vaccinia virus are loaded selection of transformants, or amplification of a chosen onto Ni" nitriloacetic acid-agarose columns, and histidine nucleic acid. tagged proteins are selectively eluted with imidazole-con 0316 Expression systems can be selected based on their taining buffers. ability to produce proteins that are modified (e.g., by phos 0322. Alternatively, Chemical synthesis can also be uti phorylation, glycosylation, or cleavage) in Substantially the lized to generate the proteins of the present invention e.g., same way they would be in a cell in which they are naturally proteins can be synthesized by the methods described in Solid expressed. Alternatively, the system can be one in which Phase Peptide Synthesis, 2nd Ed., The Pierce Chemical Co., naturally occurring modifications do not occur, or occur in a Rockford, Ill., (1984). different position, or to a different extent, than they otherwise 0323. The invention also features expression vectors that would. can be transcribed and translated in vitro using, for example, 0317. If desired, the host cells can be those of a stably a T7 promoter and T7 polymerase. Thus, the invention transfected cell line. Vectors suitable for stable transfection of encompasses methods of making the proteins described mammalian cells are available to the public (see, e.g., Pou herein in vitro. wels et al. (Supra) as are methods for constructing them (see, 0324 Sufficiently purified proteins can be used as e.g., Ausubel et al. (Supra). In one example, a nucleic acid of described herein. For example, one can administer the protein the invention is cloned into an expression vector that includes to a patient, use it in diagnostic or screening assays, or use it the dihydrofolate reductase (DHFR) gene. Integration of the to generate antibodies (these methods are described further plasmid and, therefore, the nucleic acid it contains, into the below). US 2011/0091454 A1 Apr. 21, 2011

0325 The cells perse can also be administered to patients rial) with which they may have, at one time, been associated in the context of replacement therapies. For example, a (naturally or otherwise). Purity can be measured by any nucleic acid of the present invention can be operably linked to appropriate standard method (e.g., column chromatography, an inducible promoter (e.g., a steroid hormone receptor-regu polyacrylamide gel electrophoresis, or HPLC analysis lated promoter) and introduced into a human or nonhuman 0330. The proteins of the present invention also include (e.g., porcine) cell and then into a patient. Optionally, the cell those encoded by novel fragments or other mutants (i.e., can be cultivated for a time or encapsulated in a biocompat naturally-occurring or synthetic) or variants of the protein ible material. Such as poly-lysine alginate. See, e.g., Lanza, encoding sequences of the present invention. Thus, the Nature Biotechnol. 14:1107, (1996); Joki et al. Nature Bio present invention envisages polypeptide sequences having technol. 19:35, 2001; and U.S. Pat. No. 5,876,742. When a amino acid sequences which exhibit a homology level of at steroid hormone receptor-regulated promoter is used, protein least 50%, at least 55%, at least 60%, at least 65%, at least production can be regulated in the Subject by administering a 70%, at least 75%, at least 80%, at least 85%, at least 90%, say steroid hormone to the subject. Implanted recombinant cells 95-100% to any of the polypeptide sequences set forth in the can also express and secrete an antibody that specifically files “protein seqs', and “Proteins.gz of the enclosed CD binds to one of the proteins encoded by the nucleic acid ROM2 and CD-ROM4, as determined using the BlastP soft sequences of the present invention. The antibody can be any ware of the National Center of Biotechnology Information antibody or any antibody derivative described herein. An (NCBI) using default parameters. These proteins can retain antibody “specifically binds” to a particular antigen when it substantially all (e.g., 70%, 80%, 90%, 95%, or 99%) of the binds to that antigen but not, to a detectable level, to other biological activity of the full-length protein from which they molecules in a sample (e.g., a tissue or cell culture) that were derived and can, therefore, be used as agonists or mimet naturally includes the antigen. ics of the proteins from which they were derived. The manner 0326. While the host cells described above express recom in which biological activity can be determined is described binant proteins, the invention also encompasses cells in which generally herein, and specific assays (e.g., assays of enzy gene expression is disrupted (e.g., cells in which a gene has matic activity or ligand-binding ability) are known to those of been knocked out). These cells can serve as models of disor ordinary skill in the art. In some instances, retention of bio ders that are related to mutated or mis-expressed alleles and logical activity is not necessary or desirable. For example, are also useful in drug screening. fragments that retain little, if any, of the biological activity of 0327 Protein expression can also be regulated in cells a full-length protein can be used as immunogens, which, in without using the expression constructs described above. turn, can be used as therapeutic agents (e.g., to generate an Instead, one can modify the expression of an endogenous immune response in a patient), diagnostic agents (e.g., to gene within a cell (e.g., a cell line or microorganism) by detect the presence of antibodies or other proteins in a tissue inserting a heterologous DNA regulatory element into the sample obtained from a patient), or to generate or test anti genome of the cell such that the element is operably linked to bodies that specifically bind the proteins of the invention. the endogenous gene. For example, an endogenous gene that 0331. In other instances, the proteins encoded by nucleic is “transcriptionally silent.” (i.e., not expressed at detectable acids of the invention can be modified (e.g., fragmented or levels) can be activated by inserting a regulatory element that otherwise mutated) so their activities oppose those of the promotes the expression of a normally expressed gene prod naturally occurring protein (i.e., the invention encompasses uct in that cell. Techniques such as targeted homologous variants of the proteins encoded by nucleic acids of the inven recombination can be used to insert the heterologous DNA tion that are antagonistic to a biological process). One of (see, e.g., U.S. Pat. No. 5.272,071 and WO 91/06667). ordinary skill in the art will recognize that the more extensive 0328. The polypeptides of the present invention include the mutation, the more likely it is to affect the biological the protein sequences contained in the Files “Protein.seqs' of activity of the protein (this is not to say that minor modifica CD-ROM2 and “Proteins.gz” of the enclosed CD-ROM4 and tions cannot do so as well). Thus, it is likely that mutant those encoded by the nucleic acids described herein (So long proteins that are agonists of those encoded by wild type as those nucleic acids contain coding sequence and are not proteins will differ from those wild type proteins only at wholly limited to an untranslated region of a nucleic acid non-essential residues or will contain only conservative Sub sequence), regardless of whether they are recombinantly pro stitutions. Conversely, antagonists are likely to differ at an duced (e.g., produced in and isolated from cultured cells), essential residue or to contain non-conservative Substitutions. otherwise manufactured (by, for example, chemical synthe Moreover, those of ordinary skill in the art can engineer sis), or isolated from a natural biological source (e.g., a cellor proteins so that they retain desirable traits (i.e., those that tissue) using standard protein purification techniques. make them efficacious in a particular therapeutic, diagnostic, 0329. The terms “peptide,” “polypeptide,” and “protein' or screening regime) and lose undesirable traits (i.e., those are used herein interchangeably to refer to a chain of amino that produce side effects, or produce false-positive results acid residues, regardless of length or post-translational modi through non-specific binding). fication (e.g., glycosylation orphosphorylation). Proteins (in 0332. In the event a protein of the invention is encoded by cluding antibodies that specifically bind to the products of a new gene, the invention encompasses proteins that arise those nucleic acid sequences that encode protein or fragments following alternative transcription, RNA splicing, transla thereof) and other compounds can be "isolated' or “purified.” tional- or post-translational events (e.g., the invention encom The proteins and compounds of the present invention are passes splice variants of the new genes). In the event a protein "isolated or “purified when they exist as a composition that of the invention is encoded by a novel splice variant, the is at least 60% (e.g., 70%, 75%, 80%, 85%, 90%, 95%, or invention encompasses proteins that arise following alterna 99% or more) by weight the protein or compound of interest. tive translational- or post-translational events (i.e., the inven Thus, the proteins of the invention are substantially free from tion does not encompass proteins encoded by known splice the cellular material (or other biological or cell culture mate variants, but does encompass other variants of the novel splice US 2011/0091454 A1 Apr. 21, 2011 24 variant). Post-translational modifications are discussed above tarity determining regions' (CDR), which are interspersed in the context of expression systems. with more highly conserved “framework regions’ (FR). 0333. The fragmented or otherwise mutant proteins of the These regions have been precisely defined see, Kabat et al., invention can differ from those encoded by the nucleic acids Sequences of Proteins of Immunological Interest, Fifth Edi of the invention to a limited extent (e.g., by at least one but less tion, U.S. Department of Health and Human Services, NIH than 5, 10 or 15 amino acid residues). As with other, more Publication No. 91-3242, (1991) and Chothia. et al., J. Mol. extensive mutations, the differences can be introduced by Biol. 196:901-917, (1987), and antibodies or antibody frag adding, deleting, and/or Substituting one or more amino acid ments containing one or more of them are within the scope of residues. Alternatively, the mutant proteins can differ from the invention. the wild type proteins from which they were derived by at 0337 The antibodies of the invention can also include a least one residue but less than 5%, 10%, 15% or 20% of the heavy and/or light chain constant region constant regions residues when analyzed as described herein. If the mutant and typically mediate binding between the antibody and host wild type proteins are different lengths, they can be aligned tissues or factors, including effector cells of the immune and analyzed using the algorithms described above. system and the first component (Cld) of the classical comple 0334. Useful variants, fragments, and other mutants of the ment system, and can therefore form heavy and light immu proteins encoded by the nucleic acids of the invention can be noglobulin chains, respectively. For example, the antibody identified by screening combinatorial libraries of these vari can be a tetramer (two heavy and two light immunoglobulin ants, fragments, and other mutants for agonist or antagonist chains, which can be connected by, for example, disulfide activity. For example, libraries of fragments (e.g., N-terminal, bonds). The heavy chain constant region contains three C-terminal, or internal fragments) of one or more of the domains (CH1, CH2 and CH3), whereas the light chain con proteins of the invention can be used to generate populations stant region has one (CL). of fragments that can be screened and, once identified, iso 0338 An antigen-binding fragment of the invention can lated. The proteins can include those in which one or more be: (i) a Fab fragment (i.e., a monovalent fragment consisting cysteine residues are added or deleted, or in which a glyco of the VL, VH, CL and CH1 domains); (ii) a F(ab'), fragment sylated residue is added or deleted. Methods for screening (i.e., a bivalent fragment containing two Fab fragments linked libraries (e.g., combinatorial libraries of proteins made from by a disulfide bond at the hinge region); (iii) a Fd fragment point mutants or cDNA libraries) for proteins or genes having consisting of the VH and CH1 domains; (iv) a Fv fragment a particular property are known in the art. These methods can consisting of the VL and VH domains of a single arm of an be adapted for rapid screening. Recursive ensemble mutagen antibody, (v) a dAb fragment Ward et al., Nature 341:544 esis (REM), a new technique that enhances the frequency of 546, (1989), which consists of a VH domain; and (vi) an functional mutants in libraries, can be used in combination isolated complementarity determining region (CDR). with screening assays to identify useful variants of the pro 0339 F(ab')2 fragments can be produced by pepsin diges teins of the present invention Arkin and Yourvan, Proc. Natl. tion of the antibody molecule, and Fab fragments can be Acad. Sci. USA 89:7811-7815, (1992); Delgrave et al., Pro generated by reducing the disulfide bridges of F(ab')2 frag tein Engineering 6:327-331, (1993). ments. Alternatively, Fab expression libraries can be con 0335 Cell-based assays can be exploited to analyze var structed Huse et al., Science 246:1275, (1989) to allow iegated libraries constructed from one or more of the proteins rapid and easy identification of monoclonal Fab fragments of the invention. For example, cells in a cell line (e.g., a cell with the desired specificity. Methods of making other anti line that ordinarily responds to the protein(s) of interest in a bodies and antibody fragments are known in the art. For substrate-dependent manner) can be transfected with a library example, although the two domains of the Fv fragment, VL of expression vectors. The transfected cells are then contacted and VH, are coded for by separate genes, they can be joined, with the protein and the effect of the expression of the mutant using recombinant methods or a synthetic linker that enables on signaling by the protein (Substrate) can be detected (e.g., them to be made as a single protein chain in which the VL and by measuring redox activity or protein folding). Plasmid VH regions pair to form monovalent molecules known as DNA can then be recovered from the cells that score for single chain Fv (scFv); see e.g., Bird et al., Science 242:423 inhibition, or alternatively, potentiation of signaling by the 426, (1988); Huston et al., Proc. Natl. Acad. Sci. USA protein (substrate). Individual clones are then further charac 85:5879-5883, (1988); Colcher et al., Ann. NY Acad. Sci. terized. 880:263-80, (1999); and Reiter, Clin. Cancer Res. 2:245-52, 0336. The invention also contemplates antibodies (i.e., (1996). Techniques for producing single chain antibodies are immunoglobulin molecules) that specifically bind (see the also described in U.S. Pat. Nos. 4,946,778 and 4,704,692. definition above) to the proteins described herein and anti Such single chain antibodies are encompassed within the body fragments (e.g., antigen-binding fragments or other term “antigen-binding fragment of an antibody. These anti immunologically active portions of the antibody). For body fragments are obtained using conventional techniques example, an antibody which specifically binds the troponin known to those of ordinary skill in the art, and the fragments variants of the present invention is preferably directed to the are screened for utility in the same manner that intact anti unique amino acid sequence region which is not shared by bodies are screened. Moreover, a single chain antibody can wild-type troponin (see FIG. 21, SEQ ID NO: 87). Alterna form dimers or multimers and, thereby, become a multivalent tively, such an antibody can be directed to an amino acid antibody having specificities for different epitopes of the sequence which bridges the unique sequence region and com same target protein. mon sequence regions. Antibodies are proteins, and those of 0340. The antibody can be a polyclonal (i.e., part of a the invention can have at least one or two heavy chain variable heterogeneous population of antibody molecules derived regions (VH), and at least one or two light chain variable from the Sera of the immunized animals) or a monoclonal regions (VL). The VH and VL regions can be further subdi antibody (i.e., part of a homogeneous population of antibod vided into regions of hypervariability, termed “complemen ies to a particular antigen), either of which can be recombi US 2011/0091454 A1 Apr. 21, 2011 nantly produced (e.g., produced by phage display or by com genes or the myriad immunoglobulin variable region genes). binatorial methods, as described in, e.g., U.S. Pat. No. 5,223, Alternatively, the antibody can be a non-human antibody 409, WO 92/18619, WO 91/17271; WO 92/20791; WO (e.g., a rodent (e.g., a mouse or rat), goat, or non-human 92/15679; WO93/01288; WO92/01047; WO92/09690; WO primate (e.g., monkey) antibody). 90/028.09: Fuchs et al., Bio/Technology 9:1370-1372, 0345 Methods of producing antibodies are well known in (1991); Hay et al. Human Antibody Hybridomas 3:81–85, the art. For example, as noted above, human monoclonal (1992); Huse et al. Science 246:1275-1281, (1989); Griffths antibodies can be generated in transgenic mice carrying the et al. EMBO.J. 12:725-734, (1993); Hawkins et al., J. Mol. human immunoglobulingenes rather than those of the mouse. Biol 226:889-896, (1992); Clackson et al. Nature 352:624 Splenocytes obtained from these mice (after immunization 628, (1991); Gramet al., Proc. Natl. Acad. Sci. USA89:3576 with an antigen of interest) can be used to produce hybrido 3580, (1992); Garrad et al., Bio/Technology 9:1373-1377, mas that secrete human mAbs with specific affinities for (1991); Hoogenboom et al. Nucl. Acids Res. 19:4133-4137, epitopes from a human protein (see, e.g., WO 91/00906, WO (1991); and Barbas et al., Proc. Natl. Acad. Sci. USA 91/10741; WO92/03918: WO 92/03917; Lonberg et al., 88:7978-7982, (1991). In one embodiment, an antibody is Nature 368:856-859, 1994; Green et al., Nature Genet. 7:13 made by immunizing an animal with a protein encoded by a 21, 1994; Morrisonetal. Proc. Natl. Acad. Sci. USA81:6851 nucleic acid of the invention (one, of course, that contains 6855, 1994; Bruggeman et al., Immunol. 7:33-40, 1993: coding sequence) or a mutant or fragment (e.g., an antigenic Tuaillon et al., Proc. Natl. Acad. Sci. USA 90:3720-3724, peptide fragment) thereof. Alternatively, an animal can be 1993; and Bruggeman et al., Eur. J. Immunol. 21:1323-1326, immunized with a tissue sample (e.g., a crude tissue prepa 1991). ration, a whole cell (living, lysed, or fractionated) or a mem 0346. The antibody can also be one in which the variable brane fraction). Thus, antibodies of the invention can specifi region, or a portion thereof (e.g., a CDR), is generated in a cally bind to a purified antigen or a tissue (e.g., a tissue non-human organism (e.g., a rat or mouse). Thus, the inven section, a whole cell (living, lysed, or fractionated) or a mem tion encompasses chimeric, CDR-grafted, and humanized brane fraction). antibodies and antibodies that are generated in a non-human 0341 In the event an antigenic peptide is used, it can organism and then modified (in, e.g., the variable framework include at least eight (e.g., 10, 15, 20, or 30) consecutive or constant region) to decrease antigenicity in a human. Chi amino acid residues found in a protein of the invention. The meric antibodies (i.e., antibodies in which different portions antibodies generated can specifically bind to one of the pro are derived from different animal species (e.g., the variable teins in their native form (thus, antibodies with linear or region of a murine mAb and the constant region of a human conformational epitopes are within the invention), in a dena immunoglobulin) can be produced by recombinant tech tured or otherwise non-native form, or both. Conformational niques known in the art. For example, a gene encoding the Fc epitopes can sometimes be identified by identifying antibod constant region of a murine (or other species) monoclonal ies that bind to a protein in its native form, but not in a antibody molecule can be digested with restriction enzymes denatured form. to remove the region encoding the murine Fc, and the equiva 0342. The host animal (e.g., a rabbit, mouse, guinea pig, or lent portion of a gene encoding a human Fc constant region rat) can be immunized with the antigen, optionally linked to a can be substituted therefore see European Patent Application carrier (i.e., a Substance that stabilizes or otherwise improves Nos. 125,023; 184, 187; 171496; and 173,494; see also WO the immunogenicity of an associated molecule), and option 86/01533: U.S. Pat. No. 4,816,567: Better et al., Science ally administered with an adjuvant (see, e.g., Ausubel et al., 240:1041-1043, (1988); Liu et al., Proc. Natl. Acad. Sci. USA Supra). An exemplary carrier is keyhole limpet hemocyanin 84:3439-3443, (1987); Liu et al., J. Immunol. 139:3521 (KLH) and exemplary adjuvants, which will be selected in 3526, (1987); Sun et al., Proc. Natl. Acad. Sci. USA 84:214 view of the host animal's species, include Freund's adjuvant 218, (1987); Nishimura et al., Cancer Res. 47: 999-1005, (complete or incomplete), adjuvant mineral gels (e.g., alumi (1987); Wood et al., Nature 314:446-449, (1985); Shaw et al., num hydroxide), Surface active Substances Such as lysoleci J. Natl. Cancer Inst. 80:1553-1559, (1988); Morrison et al., thin, pluronic polyols, polyanions, peptides, oil emulsions, Proc. Natl. Acad. Sci. USA81:6851, (1984); Neuberger et al., dinitrophenol, BCG (bacille Calmette-Guerin), and Coryne Nature 312:604, (1984); and Takeda et al., Nature 314:452, bacterium parvum. KLH is also sometimes referred to as an (1984). adjuvant. The antibodies generated in the host can be purified 0347 In a humanized or CDR-grafted antibody, at least by, for example, affinity chromatography methods in which one or two, but generally all three of the recipient CDRs (of the polypeptide antigen is immobilized on a resin. heavy and or light immuoglobulin chains) will be replaced 0343 Epitopes encompassed by an antigenic peptide may with a donor CDR. One need only replace the number of be located on the Surface of the protein (e.g., in hydrophilic CDRs required for binding of the humanized antibody to a regions), or in regions that are highly antigenic (such regions protein described herein or a fragment thereof. The donor can can be selected, initially, by virtue of containing many be a rodent antibody, and the recipient can be a human frame charged residues). An Emini Surface probability analysis of work or a human consensus framework. Typically, the immu human protein sequences can be used to indicate the regions noglobulin providing the CDRs is called the “donor” (and is that have a particularly high probability of being localized to often that of a rodent) and the immunoglobulin providing the the surface of the protein. framework is called the “acceptor.” The acceptor framework 0344) The antibody can be a fully human antibody (e.g., an can be a naturally occurring (e.g., a human) framework, a antibody made in a mouse that has been genetically engi consensus framework or sequence, or a sequence that is at neered to produce an antibody from a human immunoglobu least 85% (e.g., 90%. 95%, 99%) identical thereto. A “con lin sequence, such as that of a human immunoglobulin gene sensus sequence' is one formed from the most frequently (the kappa, lambda, alpha (IgA1 and IgA2), gamma (IgG1. occurring amino acids (or nucleotides) in a family of related IgG2, IgG3, IgG4), delta, epsilon and mu constant region sequences (see, e.g., Winnaker, From Genes to Clones, Ver US 2011/0091454 A1 Apr. 21, 2011 26 lagsgesellschaft, Weinheim, Germany, 1987). Each position (by, for example, affinity chromatography or immunoprecipi in the consensus sequence is occupied by the amino acid tation) or to detect them in, for example, a cell lysate or residue that occurs most frequently at that position in the supernatant (by Western blotting, ELISAs, radioimmune family (where two occur equally frequently, either can be assays, and the like) or a histological section. One can there included). A "consensus framework” refers to the framework fore determine the abundance and pattern of expression of a region in the consensus immunoglobulin sequence. particular protein. This information can be useful in making a 0348 An antibody can be humanized by methods known diagnosis or in evaluating the efficacy of a clinical test. in the art. For example, humanized antibodies can be gener ated by replacing sequences of the FV variable region that are 0353. The invention also includes the nucleic acids that not directly involved in antigen binding with equivalent encode the antibodies described above and vectors and cells sequences from human FV variable regions. General methods (e.g., mammalian cells Such as CHO cells or lymphatic cells) for generating humanized antibodies are provided by Morri that contain them. Similarly, the invention includes cell lines son Science 229:1202-1207, (1985), Oi et al. BioTech (e.g., hybridomas) that make the antibodies of the invention niques 4:214, (1986), and Queen et al. (U.S. Pat. Nos. 5,585, and methods of making those cell lines. 089; 5,693,761 and 5,693,762). Those nucleic acid sequences 0354) Non-human transgenic animals are also within the required by these methods can be obtained from a hybridoma Scope of the invention. These animals can be used to study the producing an antibody the polypeptides of the present inven function or activity of proteins of the invention and to identify tion, or fragments thereof. The recombinant DNA encoding or evaluate agents that modulate their activity. A “transgenic the humanized antibody, or fragment thereof, can then be animal can be a mammal (e.g., a mouse, rat, dog, pig, cow, cloned into an appropriate expression vector. sheep, goat, or non-human primate), an avian (e.g., a 0349 Humanized or CDR-grafted antibodies can be pro chicken), or an amphibian (e.g. a frog) having one or more duced such that one, two, or all CDRs of an immunoglobulin cells that include a transgene (e.g., an exogenous DNA mol chain can be replaced see, e.g., U.S. Pat. No. 5.225,539; ecule or a rearrangement (e.g., deletion of) endogenous chro Jones et al., Nature 321:552-525, (1986); Verhoeyan et al., mosomal DNA). The transgene can be integrated into or can Science 239:1534, (1988); and Beidler et al., J. Immunol. occur within the genome of the cells of the animal, and it can 141:4053-4060, (1988). Thus, the invention features human direct the expression of an encoded gene product in one or ized antibodies in which specific amino acid residues have been substituted, deleted or added (in, e.g., in the framework more types of cells or tissues. Alternatively, a transgene can region to improve antigen binding). For example, a human “knockout' or reduce gene expression. This can occur when ized antibody will have framework residues identical to those an endogenous gene has been altered by homologous recom of the donor or to amino acid residues other than those of the bination, which occurs between it and an exogenous DNA recipient framework residue. To generate Such antibodies, a molecule that was introduced into a cell of the animal (e.g., an selected, small number of acceptor framework residues of the embryonic cell) at a very early stage in the animal's develop humanized immunoglobulin chain are replaced by the corre ment. sponding donor amino acids. The Substitutions can occur 0355 Intronic sequences and polyadenylation signals can adjacent to the CDR or in regions that interact with a CDR be included in the transgene and, when present, can increase (U.S. Pat. No. 5,585,089, see especially columns 12-16). expression. One or more tissue-specific regulatory sequences Other techniques for humanizing antibodies are described in can also be operably linked to a transgene of the invention to EP 519596 A1. direct expression of protein to particular cells (exemplary 0350. In certain embodiments, the antibody has an effector regulatory sequences are described above, and many others function and can fix complement, while in others it can nei are known to those of ordinary skill in the art). ther recruit effector cells nor fix complement. The antibody can also have little or no ability to bind an Fc receptor. For 0356. A “founder animal is one that carries a transgene of example, it can be an isotype or Subtype, or a fragment or the invention in its genome or expresses mRNA from the other mutant that cannot bind to an Fc receptor (e.g., the transgene in its cells or tissues. Founders can be bred to antibody can have a mutant (e.g., a deleted) Fc receptor bind produce a line of transgenic animals carrying the founder's ing region). The antibody may or may not alter (e.g., increase transgene or bred with founders carrying other transgenes (in or decrease) the activity of a protein to which it binds. which case the progeny would bear the transgenes borne by 0351. In other embodiments, the antibody can be coupled both founders). Accordingly, the invention features founder to a heterologous Substance, such as a toxin (e.g., ricin, diph animals, their progeny, cells or populations of cells obtained theria toxin, or active fragments thereof), another type of therefrom, and proteins obtained therefrom. For example, a therapeutic agent (e.g., an antibiotic), or a detectable label. A nucleic acid of the invention can be placed under the control detectable label can include an enzyme (e.g., horseradish of a promoter that directs expression of the encoded protein in peroxidase, alkaline phosphatase, B-galactosidase, or acetyl the milk or eggs of the transgenic animal. The protein can then cholinesterase), a prosthetic group (e.g., Streptavidin/biotin be purified or recovered from the animal's milk or eggs. and avidin/biotin), or a fluorescent, luminescent, biolumines Animals suitable for Such purpose include pigs, cows, goats, cent, or radioactive material. (e.g., umbelliferone, fluores sheep, and chickens. cein, fluorescein isothiocyanate, rhodamine, dichlorotriazi 0357 Biomolecular sequences of the present invention nylamine fluorescein, dansyl chloride or phycoerythrin can be classified to functional groups based on known activity (which are fluorescent), luminol (which is luminescent), of homologous sequences. This functional group classifica luciferase, luciferin, and aequorin (which are biolumines tion, allows the identification of diseases and conditions, cent), and 'I, I, S or H (which are radioactive)). which may be diagnosed and treated based on the novel 0352. The antibodies of the invention (e.g., a monoclonal sequence information and annotations as described in the antibody) can be used to isolate the proteins of the invention present invention. US 2011/0091454 A1 Apr. 21, 2011 27

0358. This functional group classification includes the fol 0372 Hydrolases Acting on Amino Acids: lowing groups: 0373 The phrase “hydrolases acting on amino acids' 0359 Proteins Involved in Drug-Drug Interactions: refers to hydrolases acting on a pair of amino acids. 0360. The phrase “proteins involved in drug-drug interac 0374 Pharmaceutical compositions including such pro tions' refers to proteins involved in a biological process teins or protein encoding sequences, antibodies directed which mediates the interaction between at least two con against Such proteins or polynucleotides capable of altering Sumed drugs. expression of Such proteins, may be used to treat diseases in 0361 Pharmaceutical compositions including such pro which the transfer of a glycosyl chemical group from one teins or protein encoding sequences, antibodies directed molecule to another is abnormal thus, a beneficial effect may against Such proteins or polynucleotides capable of altering beachieved by modulation of such reaction. Antibodies and expression of Such proteins, may be used to modulate drug polynucleotides such as PCR primers and molecular probes drug interactions. Antibodies and polynucleotides such as designed to identify such proteins or protein encoding PCR primers and molecular probes designed to identify such sequences may be used for diagnosis of Such diseases. proteins or protein encoding sequences may be used for diag 0375 Examples of such diseases include, but are not lim nosis of Such drug-drug interactions. ited to reperfusion of clotted blood vessels by TPA (Tissue 0362. Examples of these conditions include, but are not Plasminogen Activator) which converts the abundant, but limited to the cytochrom P450 protein family, which is inactive, Zymogen plasminogen to plasmin by hydrolyzing a involved in the metabolism of many drugs. single ARG-VAL bond in plasminogen. 0363 Examples of proteins involved in drug-drug interac 0376 Transaminases: tions are listed in Table 16, below. 0377 The term “transaminases” refers to enzymes trans 0364 Proteins Involved in the Metabolism of a Pro-Drug ferring an amine group from one compound to another. to a Drug: 0378 Pharmaceutical compositions including such pro teins or protein encoding sequences, antibodies directed 0365. The phrase “proteins involved in the metabolism of against Such proteins or polynucleotides capable of altering a pro-drug to a drug” refers to proteins that activate an inactive pro-drug by chemically chaining it into a biologically active expression of Such proteins, may be used to treat diseases in compound. Preferably, the metabolizing enzyme is expressed which the transfer of an amine group from one molecule to another is abnormal thus, a beneficial effect may be achieved in the target tissue thus reducing systemic side effects. by modulation of Such reaction. Antibodies and polynucle 0366 Pharmaceutical compositions including such pro otides such as PCR primers and molecular probes designed to teins or protein encoding sequences, antibodies directed identify Such proteins or protein encoding sequences may be against Such proteins or polynucleotides capable of altering used for diagnosis of Such diseases. expression of Such proteins, may be used to modulate the 0379 Examples of such transaminases include, but are not metabolism of a pro-drug into drug. Antibodies and poly limited to two liver enzymes, frequently used as markers for nucleotides such as PCR primers and molecular probes liver function SGOT (Serum Glutamic-Oxalocetic Tran designed to identify Such proteins or protein encoding saminase AST) and SGPT (Serum Glutamic-Pyruvic Tran sequences may be used for diagnosis of Such conditions. saminase ALT). 0367 Examples of these proteins include, but are not lim 0380. Immunoglobulins: ited to esterases hydrolyzing the cholesterol lowering drug 0381. The term “immunoglobulins' refers to proteins that simvastatin into its hydroxy acid active form. are involved in the immune and complement systems such as 0368. MDR Proteins: antigens and autoantigens, immunoglobulins, MHC and 0369. The phrase “MDR proteins” refers to Multi Drug HLA proteins and their associated proteins. Resistance proteins that are responsible for the resistance of a 0382 Pharmaceutical compositions including such pro cell to a range of drugs, usually by exporting these drugs teins or protein encoding sequences, antibodies directed outside the cell. Preferably, the MDR proteins are ABC bind against Such proteins or polynucleotides capable of altering ing cassette proteins. Preferably, drug resistance is associated expression of Such proteins, may be used to treat diseases with resistance to chemotherapy. involving the immune system Such as inflammation, autoim 0370 Pharmaceutical compositions including such pro mune diseases, infectious diseases, and cancerous processes. teins or protein encoding sequences, antibodies directed Antibodies and polynucleotides such as PCR primers and against Such proteins or polynucleotides capable of altering molecular probes designed to identify Such proteins or pro expression of Such proteins, may be used to treat diseases in tein encoding sequences may be used for diagnosis of Such which the transport of molecules and macromolecules Such as diseases. neurotransmitters, hormones, Sugar etc. is abnormal leading 0383 Examples of such diseases and molecules that may to various pathologies. Antibodies and polynucleotides Such be target for diagnostics include, but are not limited to mem as PCR primers and molecular probes designed to identify bers of the complement family such as C3 and C4 that their Such proteins or protein encoding sequences may be used for blood level is used forevaluation of autoimmune diseases and diagnosis of such diseases. allergy state and C1 inhibitor that its absence is associated 0371 Examples of MDR proteins include, but are not with angioedema. Thus, new variants of these genes are limited to the multi-drug resistant transporter MDR1/P-gly expected to be markers for similar events. Mutation in vari coprotein, which is the gene product of MDR1, belonging to ants of the complement family may be associated with other the ATP-binding cassette (ABC) superfamily of membrane immunological syndromes, such as increased bacterial infec transporters. This protein was shown to increase the resis tion that is associated with mutation in C3. C1 inhibitor was tance of malignant cells to therapy by exporting the therapeu shown to provide safe and effective inhibition of complement tic agent out of the cell. activation after reperfused acute myocardial infarction and US 2011/0091454 A1 Apr. 21, 2011 28 may reduce myocardial injury Eur. Heart J. 2002, 23(21): kinase, Calpain, calcium-dependent protein serine/threonine 1670-7, thus, its variant may have the same or improved phosphatase, calcium sensing proteins, calcium storage pro effect. teins. 0384 Transcription Factor Binding: 0394 Pharmaceutical compositions including such pro 0385. The phrase “transcription factor binding refers to teins or protein encoding sequences, antibodies directed proteins involved in transcription process by binding to against Such proteins or polynucleotides capable of altering nucleic acids, such as transcription factors, RNA and DNA expression of Such proteins, may be used to treat calcium binding proteins, Zinc fingers, helicase, isomerase, histones, involved diseases. Antibodies and polynucleotides such as and nucleases. PCR primers and molecular probes designed to identify such 0386 Pharmaceutical compositions including such pro proteins or protein encoding sequences may be used for diag teins or protein encoding sequences, antibodies directed nosis of Such diseases. against Such proteins or polynucleotides capable of altering expression of Such proteins may be used to treat diseases 0395. Examples of such diseases include, but are not lim involving transcription factors binding proteins. Such treat ited to diseases related to hypercalcemia, hypertension, car ment may be based on transcription factor that can be used to diovascular disease, muscle diseases, gastro-intestinal dis for modulation of gene expression associated with the dis eases, uterus relaxing, and uterus. An example for therapy use ease. Antibodies and polynucleotides such as PCR primers of calcium binding proteins variant may be treatment of emer and molecular probes designed to identify such proteins or gency cases of hypercalcemia, with secreted variants of cal protein encoding sequences may be used for diagnosis of cium storage proteins. Such diseases. 0396 Oxidoreductase: 0387 Examples of such diseases include, but are not lim 0397) The term “oxidoreductase' refers to enzymes that ited to breast cancer associated with ErbB-2 expression that catalyze the removal of hydrogen atoms and electrons from was shown to be successfully modulated by a transcription the compounds on which they act. Preferably, oxidoreduc factor Proc. Natl. Acad. Sci. USA. 2000, 97(4): 1495-500. tases acting on the following groups of donors: CH-OH, Examples of novel transcription factors used for therapeutic CH-CH, CH NH, CH NH; oxidoreductases acting on protein production include, but are not limited to those NADH or NADPH, nitrogenous compounds, sulfur group of described for Erythropoietin production J. Biol. Chem. donors, heme group, hydrogen group, diphenols and related 2000, 275(43):33.850-60; J. Biol. Chem. 2000, Substances as donors: oxidoreductases acting on peroxide as 275(43):33850-60 and zinc fingers proteintranscription fac acceptor, superoxide radicals as acceptor, oxidizing metal tors (ZFP-TF) variants J. Biol. Chem. 2000,275(43):33.850 ions, CH groups: oxidoreductases acting on reduced ferre 60. doxin as donor, oxidoreductases acting on reduced fla 0388 Small GTPase Regulatory/Interacting Proteins: Vodoxin as donor, and oxidoreductases acting on the alde 0389. The phrase “Small GTPase regulatory/interacting hyde or oxo group of donors. proteins’ refers to proteins capable of regulating or interact 0398 Pharmaceutical compositions including such pro ing with GTPase such as RAB escort protein, guanyl-nucle teins or protein encoding sequences, antibodies directed otide exchange factor, guanyl-nucleotide exchange factor against Such proteins or polynucleotides capable of altering adaptor, GDP-dissociation inhibitor, GTPase inhibitor, expression of Such proteins, may be used to treat diseases GTPase activator, guanyl-nucleotide releasing factor, GDP caused by abnormal activity of oxidoreductases. Antibodies dissociation stimulator, regulator of G-protein signaling, and polynucleotides such as PCR primers and molecular RAS interactor, RHO interactor, RAB interactor, and RAL probes designed to identify such proteins or protein encoding interactor. sequences may be used for diagnosis of Such diseases. 0390 Pharmaceutical compositions including such pro 0399. Examples of such diseases include, but are not lim teins or protein encoding sequences, antibodies directed ited to malignant and autoimmune diseases in which the against Such proteins or polynucleotides capable of altering enzyme DHFR (DiHydroFolateReductase) that participates expression of Such proteins, may be used to treat diseases in in folate metabolism and essential for de novo glycine and which G-proteases mediated signal-transduction is abnormal, purine synthesis is the target for the widely used drug Meth either as a cause, or as a result of the disease. Antibodies and otrexate (MTX). polynucleotides such as PCR primers and molecular probes (0400 Receptors: designed to identify Such proteins or protein encoding 04.01 The term “receptors' refers to protein-binding sites sequences may be used for diagnosis of Such diseases. on a cell's Surface or interior, that recognize and binds to 0391 Examples of such diseases include, but are not lim specific messenger molecule leading to a biological response, ited to diseases related to prenylation. Modulation of preny Such as signal transducers, complement receptors, ligand lation was shown to affect therapy of diseases such as dependent nuclear receptors, transmembrane receptors, GPI osteoporosis, ischemic heart disease, and inflammatory pro anchored membrane-bound receptors, various coreceptors, cesses. Small GTPases regulatory/interacting proteins are internalization receptors, receptors to neurotransmitters, hor major component in the prenylation post translation modifi mones and various other effectors and ligands. cation, and are required to the normal activity of prenylated 0402 Pharmaceutical compositions including such pro proteins. Thus, their variants may be used for therapy of teins or protein encoding sequences, antibodies directed prenylation associated diseases. against Such proteins or polynucleotides capable of altering 0392 Calcium Binding Proteins: expression of Such proteins, may be used to treat diseases 0393. The phrase “calcium binding proteins’ refers to pro caused by abnormal activity of receptors, preferably, recep teins involve in calcium binding, preferably, calcium binding tors to neurotransmitters, hormones and various other effec proteins, ligand binding or carriers, such as diacylglycerol tors and ligands. Antibodies and polynucleotides such as PCR US 2011/0091454 A1 Apr. 21, 2011 29 primers and molecular probes designed to identify Such pro Therefore, serine/threonine kinases specific for the 5-HT(2A) teins or protein encoding sequences may be used for diagno serotonin receptor may serve as drug targets for a disease Such sis of Such diseases. as Schizophrenia. Other diseases that may be treated through 0403. Examples of such diseases include, but are not lim serine/thereonine kinases modulation are Peutz-Jeghers Syn ited to, chronic myelomonocytic leukemia caused by growth drome (PJS, a rare autosomal-dominant disorder character factor B receptor deficiency Rao D. S., et al., (2001) Mol. ized by hamartomatous polyposis of the gastrointestinal tract Cell. Biol., 21 (22):7796-806, thrombosis associated with and melanin pigmentation of the skin and mucous mem protease-activated receptor deficiency Sambrano G. R., et branes Hum. Mutat. 2000, 16(1):23–30), breast cancer On al., (2001) Nature, 413 (6851):26-7, hypercholesterolemia cogene. 1999, 18(35):4968-73). Type 2 diabetes insulin resis associated with low density lipoprotein receptor deficiency tance Am. J. Cardiol. 2002, 90(5A): 11G-18G, and fanconi Koivisto U. M., et al., (2001) Cell, 105(5):575-85), familial anemia Blood. 2001, 98(13):3650-7. Hibernian fever associated with tumor necrosis factor recep 04.09 Channel/Pore Class Transporters: tor deficiency Simon A., et al., (2001) Ned Tijdschr 0410 The phrase “Channel/pore class transporters' refers Geneeskd, 145(2):77-8, colitis associated with immunoglo to proteins that mediate the transport of molecules and mac bulin E receptor expression Dombrowicz, D., et al., (2001).J. romolecules across membranes, such as C-type channels, Exp. Med., 193(1):25-34), and alagile syndrome associated porins, and pore-forming toxins. with Jagged 1 Stankiewicz. P. et al., (2001) Am. J. Med. 0411 Pharmaceutical compositions including Such pro Genet., 103(2):166-71, breast cancer associated with teins or protein encoding sequences, antibodies directed mutated BRCA2 and androgen, hypertension associated with against Such proteins or polynucleotides capable of altering B and C. adrenergic receptors, diabetes associated with the expression of Such proteins, may be used to treat diseases in insulin receptor. Therapeutic applications of nuclear recep which the transport of molecules and macromolecules are tors variants may be based on Secreted version of receptors abnormal, therefore leading to various pathologies. Antibod Such as the thyroid nuclear receptor that by binding plasma ies and polynucleotides such as PCR primers and molecular free thyroid hormone to reduce its levels may have a thera probes designed to identify such proteins or protein encoding peutic effect in cases of thyrotoxicosis. A secreted version of sequences may be used for diagnosis of Such diseases. glucocorticoid nuclear receptor, by binding plasma free cor 0412. Examples of such diseases include, but are not lim tisol, thus, reducing, may have a therapeutic effect in cases of ited to diseases of the nerves system such as Parkinson, dis Cushing's disease (a disease associated with high cortisole eases of the hormonal system, diabetes and infectious dis levels in the plasma). eases such as bacterial and fungal infections. One specific 04.04] Secreted soluble TNF receptor is an example for a example is the C-hemolysin, which is produced by S. aureus molecule, which can be used to treat conditions in which creating ion conductive pores in the cell membrane, thereby downregulation of TNF levels or activity is beneficial, includ deminishing its integrity. ing, but not limited to, Rheumatoid Arthritis, Juvenile Rheu 0413 Hydrolases, Acting on Acid Anhydrides: matoid Arthritis, Psoriatic Arthritis and Ankylosing 0414. The phrase “hydrolases, acting on acid anhydrides’ Spondylitis. refers to hydrolytic enzymes that are acting on acid anhy 04.05 Protein Serine/Threonine Kinases: drides, such as hydrolases acting on acid anhydrides in phos 0406. The phrase “protein serine/threonine kinases” refers phorus-containing anhydrides or in Sulfonyl-containing to proteins which phosphorylate serine/threonine residues, anhydrides, hydrolases catalyzing transmembrane movement mainly involved in signal transduction, Such as transmem of substances, and involved in cellular and subcellular move brane receptor protein serine/threonine kinase, 3-phosphoi ment. nositide-dependent protein kinase, DNA-dependent protein 0415 Pharmaceutical compositions including such pro kinase, G-protein-coupled receptor phosphorylating protein teins or protein encoding sequences, antibodies directed kinase, SNF1A/AMP-activated protein kinase, casein kinase, against Such proteins or polynucleotides capable of altering calmodulin regulated protein kinase, cyclic-nucleotide expression of Such proteins may be used to treat diseases in dependent protein kinase, cyclin-dependent protein kinase, which the hydrolase-related activities are abnormal. Antibod eukaryotic translation initiation factor 2C. kinase, galactosyl ies and polynucleotides such as PCR primers and molecular transferase-associated kinase, glycogen synthase kinase 3. probes designed to identify such proteins or protein encoding protein kinase C, receptor signaling protein serine/threonine sequences may be used for diagnosis of Such diseases. kinase, ribosomal protein S6 kinase, and IkB kinase. 0416 Examples of such diseases include, but are not lim 04.07 Pharmaceutical compositions including such pro ited to glaucoma treated with carbonic anhydrase inhibitors teins or protein encoding sequences, antibodies directed (e.g. Dorzolamide), peptic ulcer disease treated with H()K against Such proteins or polynucleotides capable of altering ()ATPase inhibitors that were shown to affect disease by expression of Such proteins, may be used to treat diseases blocking gastric carbonic anhydrase (e.g. Omeprazole). ameliorated by a modulating kinase activity. Antibodies and 0417 , Transferring Phosphorus-Containing polynucleotides such as PCR primers and molecular probes Groups: designed to identify Such proteins or protein encoding 0418. The phrase “transferases, transferring phosphorus sequences may be used for diagnosis of Such diseases. containing groups' refers to enzymes that catalyze the trans 04.08 Examples of such diseases include, but are not lim fer of phosphate from one molecule to another, Such as phos ited to schizophrenia. 5-HT(2A) serotonin receptor is the photransferases using the following groups as acceptors: principal molecular target for LSD-like hallucinogens and alcohol group, carboxyl group, nitrogenous group, phos atypical antipsychotic drugs. It has been shown that a major phate; with regeneration of donors cata mechanism for the attenuation of this receptor signaling fol lyzing intramolecular transfers; ; lowing agonist activation typically involves the phosphoryla nucleotidyltransferase; and phosphotransferases for other tion of serine and/or threonine residues by various kinases. Substituted phosphate groups. US 2011/0091454 A1 Apr. 21, 2011 30

0419 Pharmaceutical compositions including such pro 0428 Examples of such diseases include, but are not lim teins or protein encoding sequences, antibodies directed ited to C-1 antitrypsin (a natural serine proteases, which pro against Such proteins or polynucleotides capable of altering tects the lung and liver from proteolysis) deficiency associ expression of Such proteins may be used to treat diseases in ated with emphysema, COPD and liver chirosis. C-1 which the transfer of a phosphorous containing functional antitrypsin is also used for diagnostics in cases of unexplained group to a modulated moiety is abnormal. Antibodies and liver and lung disease. A variant of this enzyme may act as polynucleotides such as PCR primers and molecular probes protease inhibitor or a diagnostic target for related diseases. designed to identify Such proteins or protein encoding Electron Transporters: sequences may be used for diagnosis of Such diseases. 0429 0430. The term “Electron transporters' refers to ligand 0420 Examples of such diseases include, but are not lim binding or carrier proteins involved in electron transport Such ited to acute MI Ann. Emerg. Med. 2003, 42(3):343-50, as flavin-containing electron transporter, cytochromes, elec Cancer Oral. Dis. 2003, 9(3): 119-28; J. Surg. Res. 2003, tron donors, electron acceptors, electron carriers, and cyto 113(1):102-8) and Alzheimer's disease Am. J. Pathol. 2003, chrome-c oxidases. 163(3):845-58. Examples for possible utilities of such trans ferases for drug improvement include, but are not limited to 043.1 Pharmaceutical compositions including such pro aminoglycosides treatment (antibiotics) to which resistance teins or protein encoding sequences, antibodies directed is mediated by aminoglycoside phosphotransferases Front. against Such proteins or polynucleotides capable of altering Biosci. 1999, 1; 4:D9-21. Using aminoglycoside phospho expression of Such proteins, may be used to treat diseases in transferases variants or inhibiting these enzymes may reduce which beneficial effect may be achieved by modulating the aminoglycosides resistance. Since aminoglycosides can be activity of electron transporters. Antibodies and polynucle toxic to Some patients, proving the expression of aminogly otides such as PCR primers and molecular probes designed to coside phosphotransferases in a patient can deter from treat identify Such proteins or protein encoding sequences may be ing him with aminoglycosides and risking the patient in vain. used for diagnosis of Such diseases. 0432 Examples of such diseases include, but are not lim 0421 Phosphoric Monoester Hydrolases: ited to cyanide toxicity, resulting from cyanide binding to 0422 The phrase “phosphoric monoester hydrolases” ubiquitous metalloenzymes rendering them inactive, and refers to hydrolytic enzymes that are acting on ester bonds, Such as nuclease, Sulfuric ester hydrolase, carboxylic ester interfering with the electron transport. Novel electron trans hydrolase, thiolester hydrolase, phosphoric monoester porters to which cyanide can bind may serve as drug targets hydrolase, phosphoric diester hydrolase, triphosphoric for new cyanide antidotes. monoester hydrolase, diphosphoric monoester hydrolase, 0433 Transferases, Transferring Glycosyl Groups: and phosphoric triester hydrolase. 0434. The phrase “transferases, transferring glycosyl 0423 Pharmaceutical compositions including Such pro groups' refers to enzymes that catalyze the transfer of a teins or protein encoding sequences, antibodies directed glycosyl chemical group from one molecule to another Such against Such proteins or polynucleotides capable of altering as murein lytic endotransglycosylase E, and sialyltransferase. expression of Such proteins, may be used to treat diseases in 0435 Pharmaceutical compositions including such pro which the hydrolytic cleavage of a covalent bond with accom teins or protein encoding sequences, antibodies directed panying addition of water (—H being added to one product of against Such proteins or polynucleotides capable of altering the cleavage and—OH to the other), is abnormal. Antibodies expression of Such proteins, may be used to treat diseases in and polynucleotides such as PCR primers and molecular which the transfer of a glycosyl chemical group is abnormal. probes designed to identify Such proteins or protein encoding Antibodies and polynucleotides such as PCR primers and sequences may be used for diagnosis of Such diseases. molecular probes designed to identify Such proteins or pro 0424 Examples of such diseases include, but are not lim tein encoding sequences may be used for diagnosis of Such ited to diabetes and CNS diseases such as Parkinson and diseases. CaCC. 0436 Ligases, Forming Carbon-Oxygen Bonds: 0425 Enzyme Inhibitors: 0437. The phrase “ligases, forming carbon-oxygen 0426. The term “enzyme inhibitors' refers to inhibitors bonds' refers to enzymes that catalyze the linkage between and Suppressors of other proteins and enzymes, such as carbon and oxygen Such as ligase forming aminoacyl-tRNA inhibitors of kinases, phosphatases, chaperones, guanylate and related compounds. cyclase, DNA gyrase, ribonuclease, proteasome inhibitors, 0438 Pharmaceutical compositions including such pro diazepam-binding inhibitor, ornithine decarboxylase inhibi teins or protein encoding sequences, antibodies directed tor, GTPase inhibitors, dUTP pyrophosphatase inhibitor, against Such proteins or polynucleotides capable of altering phospholipase inhibitor, proteinase inhibitor, protein biosyn expression of Such proteins, may be used to treat diseases in thesis inhibitors, and C.-amylase inhibitors. which the linkage between carbon and oxygen in an energy 0427 Pharmaceutical compositions including such pro dependent process is abnormal. Antibodies and polynucle teins or protein encoding sequences, antibodies directed otides such as PCR primers and molecular probes designed to against Such proteins or polynucleotides capable of altering identify Such proteins or protein encoding sequences may be expression of Such proteins, may be used to treat diseases in used for diagnosis of Such diseases. which beneficial effect may be achieved by modulating the 0439 Ligases: activity of inhibitors and Suppressors of proteins and 0440 The term “ligases’ refers to enzymes that catalyze enzymes. Antibodies and polynucleotides such as PCR prim the linkage of two molecules, generally utilizing ATP as the ers and molecular probes designed to identify such proteins or energy donor, also called synthetase. Examples for ligases are protein encoding sequences may be used for diagnosis of enzymes such as B-alanyl-dopamine hydrolase, carbon-oxy Such diseases. gen bonds forming ligase, carbon-sulfur bonds forming US 2011/0091454 A1 Apr. 21, 2011 ligase, carbon-nitrogen bonds forming ligase, carbon-carbon phosphate 3-kinase, phosphofructokinase, pyridoxal kinase, bonds forming ligase, and phosphoric ester bonds forming sphinganine kinase, riboflavin kinase, 2-dehydro-3-deoxyga ligase. lactonokinase, 2-dehydro-3-deoxygluconokinase, 4-diphos 0441 Pharmaceutical compositions including Such pro phocytidyl-2C-methyl-D-erythritol kinase, GTP pyrophos teins or protein encoding sequences, antibodies directed phokinase, L-fuculokinase, L-ribulokinase, L-xylulokinase, against Such proteins or polynucleotides capable of altering isocitrate dehydrogenase (NADP) kinase, acetate kinase, expression of Such proteins, may be used to treat diseases in allose kinase, carbamate kinase, cobinamide kinase, diphos which the joining together of two molecules in an energy phate-purine nucleoside kinase, fructokinase, glycerate dependent process is abnormal. Antibodies and polynucle kinase, hydroxymethylpyrimidine kinase, hygromycin-B otides such as PCR primers and molecular probes designed to kinase, inosine kinase, kanamycin kinase, phosphomethylpy identify Such proteins or protein encoding sequences may be rimidine kinase, phosphoribulokinase, polyphosphate kinase, used for diagnosis of Such diseases. propionate kinase, pyruvate,water dikinase, rhamnulokinase, 0442. Examples of such diseases include, but are not lim tagatose-6-phosphate kinase, tetraacyldisaccharide 4'-kinase, ited to neurological disorders such as Parkinson's disease thiamine-phosphate kinase, undecaprenol kinase, uridylate Science. 2003, 302(5646):819-22; J. Neurol. 2003, 250 kinase, N-acylmannosamine kinase, D-erythro-sphingosine Suppl. 3:11125-11129 or epilepsy Nat. Genet. 2003, 35(2): kinase. 125-7, cancerous diseases Cancer Res. 2003, 63(17):5428 0449 Pharmaceutical compositions including such pro 37: Lab. Invest. 2003, 83(9): 1255-65, renal diseases Am. J. teins or protein encoding sequences, antibodies directed Pathol. 2003, 163(4): 1645-52), infectious diseases Arch. against Such proteins or polynucleotides capable of altering Virol. 2003, 148(9):1851-62 and fanconi anemia Nat. expression of Such proteins, may be used to treat diseases Genet. 2003, 35(2):165-70. which may be ameliorated by a modulating kinase activity. 0443 Hydrolases, Acting on Glycosyl Bonds: Antibodies and polynucleotides such as PCR primers and 0444 The phrase “hydrolases, acting on glycosyl bonds' molecular probes designed to identify Such proteins or pro refers to hydrolytic enzymes that are acting on glycosyl bonds tein encoding sequences may be used for diagnosis of Such Such as hydrolases hydrolyzing N-glycosyl compounds, diseases. S-glycosyl compounds, and O-glycosyl compounds. 0450 Examples of such diseases include, but are not lim 0445 Pharmaceutical compositions including such pro ited to, acute lymphoblastic leukemia associated with spleen teins or protein encoding sequences, antibodies directed tyrosine kinase deficiency Goodman P. A., et al., (2001) against Such proteins or polynucleotides capable of altering Oncogene, 20(30):3969-78, ataxiatelangiectasia associated expression of Such proteins, may be used to treat diseases in with ATM kinase deficiency Boultwood J., (2001) J. Clin. which the hydrolase-related activities are abnormal. Antibod Pathol. 54(7):51 2-6, congenital haemolytic anaemia asso ies and polynucleotides such as PCR primers and molecular ciated with erythrocyte pyruvate kinase deficiency Zanella probes designed to identify Such proteins or protein encoding A., et al., (2001) Br. J. Haematol., 113(1):43-8), mevalonic sequences may be used for diagnosis of Such diseases. aciduria caused by mevalonate kinase deficiency Houten S. 0446. Examples of such diseases include cancerous dis M., et al., (2001) Eur. J. Hum. Genet., 9(4): 253-9, and acute eases J. Natl. Cancer Inst. 2003, 95(17): 1263-5; Carcinogen myelogenous leukemia associated with over-expressed esis. 2003, 24(7): 1281-2; author reply 1283 vascular dis death-associated protein kinase Guzman M. L., et al., (2001) eases J. Thorac. Cardiovasc. Surg. 2003, 126(2):344-57. Blood, 97(7):2177-9). gastrointestinal diseases such as colitis J. Immunol. 2003, 0451) Nucleotide Binding: 171(3):1556-63 or liver fibrosis World J. Gastroenterol. 0452. The term “nucleotide binding refers to ligand bind 2002, 8(5):901-7). ing or carrier proteins, involved in physical interaction with a 0447 Kinases: nucleotide, preferably, any compound consisting of a nucleo 0448. The term “kinases’ refers to enzymes which phos side that is esterified with orthophosphate or an oligophos phorylate serine/threonine or tyrosine residues, mainly phate at any hydroxyl group on the glycose moiety. Such as involved in signal transduction. Examples for kinases include purine nucleotide binding proteins. enzymes such as 2-amino-4-hydroxy-6-hydroxymethyldihy 0453 Pharmaceutical compositions including such pro dropteridine pyrophosphokinase, NAD() kinase, acetyl teins or protein encoding sequences, antibodies directed glutamate kinase, adenosine kinase, adenylate kinase, ade against Such proteins or polynucleotides capable of altering nylsulfate kinase, arginine kinase, aspartate kinase, choline expression of such proteins, may be used to treat diseases that kinase, creatine kinase, cytidylate kinase, deoxyadenosine are associated with abnormal nucleotide binding. Antibodies kinase, deoxycytidine kinase, deoxyguanosine kinase, and polynucleotides such as PCR primers and molecular dephospho-CoA kinase, diacylglycerol kinase, dolichol probes designed to identify such proteins or protein encoding kinase, ethanolamine kinase, galactokinase, glucokinase, sequences may be used for diagnosis of Such diseases. glutamate 5-kinase, glycerol kinase, glycerone kinase, gua 0454. Examples of such diseases include, but are not lim nylate kinase, hexokinase, homoserine kinase, hydroxyeth ited to Gout (a syndrome characterized by high urate level in ylthiazole kinase, inositol/phosphatidylinositol kinase, keto the blood). Since urate is a breakdown metabolite of purines, hexokinase, mevalonate kinase, nucleoside-diphosphate reducing purines serum levels could have a therapeutic effect kinase, pantothenate kinase, phosphoenolpyruvate carbox in Gout disease. ykinase, phosphoglycerate kinase, phosphomevalonate 0455 Tubulin Binding: kinase, protein kinase, pyruvate dehydrogenase (lipoamide) 0456. The term “tubulin binding refers to binding pro kinase, pyruvate kinase, ribokinase, ribose-phosphate pyro teins that bind tubulin such as microtubule binding proteins. phosphokinase, selenide, water dikinase, shikimate kinase, 0457 Pharmaceutical compositions including such pro thiamine pyrophosphokinase, thymidine kinase, thymidylate teins or protein encoding sequences, antibodies directed kinase, uridine kinase, Xylulokinase, 1D-myo-inositol-tris against Such proteins or polynucleotides capable of altering US 2011/0091454 A1 Apr. 21, 2011 32 expression of Such proteins, may be used to treat diseases 0463 Molecular Function Unknown: which are associated with abnormal tubulin activity or struc 0464. The phrase “molecular function unknown refers to ture. Binding the products of the genes of this family, or various proteins with unknown molecular function, such as antibodies reactive therewith, can modulate a plurality of cell Surface antigens. tubulin activities as well as change microtubulin structure. 0465 Pharmaceutical compositions including such pro Antibodies and polynucleotides such as PCR primers and teins or protein encoding sequences, antibodies directed molecular probes designed to identify Such proteins or pro against Such proteins or polynucleotides capable of altering tein encoding sequences may be used for diagnosis of Such expression of Such proteins, may be used to treat diseases in diseases. which regulation of the recognition, or participation or bind of cell Surface antigens to other moieties may have therapeu 0458 Examples of such diseases include, but are not lim tic effect. Antibodies and polynucleotides such as PCR prim ited to, Alzheimer's disease associated with t-complex ers and molecular probes designed to identify such proteins or polypeptide 1 deficiency Schuller E., et al., (2001) Life Sci., protein encoding sequences may be used for diagnosis of 69(3):263-70, neurodegeneration associated with apoE defi Such diseases. ciency Masliah E., et al., (1995) Exp. Neurol., 136(2):107 0466. Examples of such diseases include, but are not lim 22, progressive axonopathy associated with disfuctional ited to, autoimmune diseases, various infectious diseases, neurofilaments Griffiths I. R., et al., (1989) Neuropathol. cancer diseases which involve non cell Surface antigens rec Appl. Neurobiol., 15(1):63-74, familial frontotemporal ognition and activity. dementia associated with tau deficiency astor P. et al., 0467 Enzyme Activators: (2001) Ann. Neurol. 49(2):263-7, and colon cancer Sup 0468. The term “enzyme activators' refers to enzyme pressed by APC White R. L., (1997) Pathol. Biol. (Paris), regulators such as activators of kinases, phosphatases, sph 45(3):240-4. En example for a drug whose target is tubulin is ingolipids, chaperones, guanylate cyclase, tryptophan the anticancer drug Taxol. Drugs having similar mecha hydroxylase, proteases, phospholipases, caspases, proprotein nism of action (interfering with tubulin polymerization) may convertase 2 activator, cyclin-dependent protein kinase 5 acti be developed based on tubulin binding proteins. vator, Superoxide-generating NADPH oxidase activator, sph 0459 Receptor Signaling Proteins: ingomyelin phosphodiesterase activator, monophenol 0460. The phrase “receptor signaling proteins’ refers to monooxygenase activator, proteasome activator, and GTPase receptor proteins involved in signal transduction Such as activator. receptor signaling protein serine/threonine kinase, receptor 0469 Pharmaceutical compositions including such pro signaling protein tyrosine kinase, receptor signaling protein teins or protein encoding sequences, antibodies directed tyrosine phosphatase, aryl hydrocarbon receptor nuclear against Such proteins or polynucleotides capable of altering translocator, hematopoeitin/interferon-class (D200-domain) expression of Such proteins, may be used to treat diseases in cytokine receptor signal transducer, transmembrane receptor which beneficial effect may be achieved by modulating the protein tyrosine kinase signaling protein, transmembrane activity of activators of proteins and enzymes. Antibodies and receptor protein serine/threonine kinase signaling protein, polynucleotides such as PCR primers and molecular probes receptor signaling protein serine/threonine kinase signaling designed to identify such proteins or protein encoding protein, receptor signaling protein serine/threonine phos sequences may be used for diagnosis of Such diseases. phatase signaling protein, Small GTPase regulatory/interact 0470 Examples of such diseases include, but are not lim ing protein, receptor signaling protein tyrosine kinase signal ited to all complement related diseases, as most complement ing protein, and receptor signaling protein serine/threonine proteins activate by cleavage other complement proteins. phosphatase. 0471 Transferases, Transferring One-Carbon Groups: 0461 Pharmaceutical compositions including Such pro 0472. The phrase “transferases, transferring one-carbon teins or protein encoding sequences, antibodies directed groups' refers enzymes that catalyze the transfer of a one against Such proteins or polynucleotides capable of altering carbon chemical group from one molecule to another Such as expression of Such proteins, may be used to treat diseases in methyltransferase, amidinotransferase, hydroxymethyl-, which the signal-transduction is abnormal, either as a cause, formyl- and related transferase, carboxyl- and carbamoyl or as a result of the disease. Antibodies and polynucleotides transferase. such as PCR primers and molecular probes designed to iden 0473 Pharmaceutical compositions including such pro tify such proteins or protein encoding sequences may be used teins or protein encoding sequences, antibodies directed for diagnosis of such diseases. against Such proteins or polynucleotides capable of altering 0462 Examples of such diseases include, but are not lim expression of Such proteins, may be used to treat diseases in ited to, complete hypogonadotropic hypogonadism associ which the transfer of a one-carbon chemical group from one ated with GnRH receptor deficiency Kottler M. L., et al., molecule to another is abnormal so that a beneficial effect (2000) J. Clin. Endocrinol. Metab., 85(9):3002-8), severe may be achieved by modulation of such reaction. Antibodies combined immunodeficiency disease associated with IL-7 and polynucleotides such as PCR primers and molecular receptor deficiency Puel A. and Leonard W. J., (2000) Curr. probes designed to identify such proteins or protein encoding Opin. Immunol. 12(4):468-7, schizophrenia associated sequences may be used for diagnosis of Such diseases. N-methyl-D-aspartate receptor deficiency Mohn A. R., et al., 0474 Transferases: (1999) Cell, 98(4):427-36, Yesinia-associated arthritis asso 0475. The term “transferases” refers to enzymes that cata ciated with tumor necrosis factor receptor p55 deficiency lyze the transfer of a chemical group, preferably, a phosphate Zhao Y. X., et al., (1999) Arthritis Rheum., 42(8): 1662-72), or amine from one molecule to another. It includes enzymes and Dwarfism of Sindh caused by growth hormone-releasing Such as transferases, transferring one-carbon groups, alde hormone receptor deficiency aheshwari H. G., et al., (1998) hyde or ketonic groups, acyl groups, glycosyl groups, alkyl or J. Clin. Endocrinol. Metab., 83 (11):4065-74). aryl (other than methyl) groups, nitrogenous, phosphorus US 2011/0091454 A1 Apr. 21, 2011 containing groups, Sulfur-containing groups, lipoyltrans kinase activity, cell adhesion receptor, neuroligin, calcium ferase, deoxycytidyl transferases. dependent cell adhesion molecule, selectin, calcium-inde 0476 Pharmaceutical compositions including such pro pendent cell adhesion molecule, and extracellular matrix pro teins or protein encoding sequences, antibodies directed tein. against Such proteins or polynucleotides capable of altering 0484 Pharmaceutical compositions including Such pro expression of Such proteins, may be used to treat diseases in teins or protein encoding sequences, antibodies directed which the transfer of a chemical group from one molecule to against Such proteins or polynucleotides capable of altering another is abnormal. Antibodies and polynucleotides such as expression of Such proteins, may be used to treat diseases in PCR primers and molecular probes designed to identify such which adhesion between adjoining cells is involved, typically proteins or protein encoding sequences may be used for diag conditions in which the adhesion is abnormal. Antibodies and nosis of Such diseases. polynucleotides such as PCR primers and molecular probes 0477 Examples of such diseases include, but are not lim designed to identify such proteins or protein encoding ited to cancerous diseases such as prostate cancer Urology. sequences may be used for diagnosis of Such diseases. 2003, 62(5 Suppl 1):55-62 or lung cancer Invest. New 0485 Examples of such diseases include, but are not lim Drugs. 2003, 21 (4):435-43; JAMA. 2003, 22:290(16):2149 ited to cancer in which abnormal adhesion may cause and 58), psychiatric disorders Am. J. Med. Genet. 2003, 15; enhance the process of metastasis and abnormal growth and 123B(1):64-9), colorectal disease such as Crohn's disease development of various tissues in which modulation adhesion Dis. Colon Rectum. 2003, 46(11): 1498-507 or celiac dis among adjoining cells can improve the condition. Leucocyte eases N Engl. J. Med. 2003, 349(17): 1673-4; author reply endothlial interactions characterized by adhesion molecules 1673-4, neurological diseases Such as Parkinson's disease J. involved in interactions between cells lead to a tissue injury Chem. Neuroanat. 2003, 26(2): 143-51), Alzheimer disease and ischemia reperfusion disorders in which activated signals Hum. Mol. Genet. 2003 21 or Charcot-Marie-Tooth Disease generated during ischemia may trigger an exuberant inflam Mol. Biol. Evol. 2003 31. matory response during reperfusion, provoking greater tissue 0478 Chaperones: damage than initial ischemic insult Crit. Care Med. 2002, 0479. The term “chaperones' refers to functional classes 30(5 Suppl): S214-9. The blockade of leucocyte-endothelial of unrelated families of proteins that assist the correct non adhesive interactions has the potential to reduce vascular and covalent assembly of other polypeptide-containing structures tissue injury. This blockade may be achieved using a soluble in Vivo, but are not components of these assembled structures variant of the adhesion molecule. when they a performing their normal biological function. The 0486 States of septic shock and ARDS involve large group of chaperones include proteins such as ribosomal chap recruitment of neutrophil cells to the damaged tissues. Neu erone, peptidylprolyl isomerase, lectin-binding chaperone, trophil cells bind to the endothelial cells in the target tissues nucleosome assembly chaperone, chaperonin ATPase, through adhesion molecules. Neutrophils possess multiple cochaperone, heat shock protein, HSP70/HSP90 organizing effector mechanisms that can produce endothelial and lung protein, fimbrial chaperone, metallochaperone, tubulin fold tissue injury, and interfere with pulmonary gas transfer by ing, and HSC70-interacting protein. disruption of surfactant activity Eur. J. Surg. 2002, 168(4): 0480 Pharmaceutical compositions including such pro 204-14. In such cases, the use of soluble variant of the teins or protein encoding sequences, antibodies directed adhesion molecule may decrease the adhesion of neutrophils against Such proteins or polynucleotides capable of altering to the damaged tissues. expression of Such proteins, may be used to treat diseases 0487. Examples of such diseases include, but are not lim which are associated with abnormal protein activity, struc ited to, Wiskott-Aldrich syndrome associated with WAS defi ture, degradation or accumulation of proteins. Antibodies and ciency Westerberg L., et al., (2001) Blood, 98(4): 1086-94, polynucleotides such as PCR primers and molecular probes asthma associated with intercellular adhesion molecule-1 designed to identify Such proteins or protein encoding deficiency Tang M. L. and Fiscus L. C., (2001) Pulm. Phar sequences may be used for diagnosis of Such diseases. macol. Ther., 14(3):203-10, intra-atrial thrombogenesis 0481 Examples of such diseases include, but are not lim associated with increased von Willebrand factor activity ited to neurological syndromes J. Neuropathol. Exp. Neurol. Fukuchi M., et al., (2001).J. Am. Coll. Cardiol., 37(5):1436 2003, 62(7):751–64; Antioxid Redox Signal. 2003, 5(3):337 42, junctional epidermolysis bullosa associated with laminin 48; J. Neurochem. 2003, 86(2):394-404, neurological dis 5-f-3 deficiency Robbins P. B., et al., (2001) Proc. Natl. eases such as Parkinson's disease Hum. Genet. 2003, 6: Acad. Sci., 98(9):5193-8, and hydrocephalus caused by neu Neurol Sci. 2003, 24(3): 159-60; J. Neurol. 2003, 250 Suppl ral adhesion molecule L1 deficiency Rolf B., et al., (2001) 3:III25-III29 ataxia J. Hum. Genet. 2003: 48(8):415-9 or Brain Res., 891 (1-2):247-52). Alzheimer diseases J. Mol. Neurosci. 2003, 20(3):283-6: J. 0488 Motor Proteins: Alzheimers Dis. 2003, 5(3): 171-7, cancerous diseases 0489. The term “motor proteins’ refers to proteins that Semin. Oncol. 2003,30(5):709-16), prostate cancer Semin. generate force or energy by the hydrolysis of ATP and that Oncol. 2003, 30(5):709-16 metabolic diseases J. Neuro function in the production of intracellular movement or trans chem. 2003, 87(1):248-56, infectious diseases, such as prion portation. Examples of Such proteins include microfilament infection EMBO J. 2003, 22(20):5435-5445). Chaperones motor, axonemal motor, microtubule motor, and kinetochore may be also used for manipulating therapeutic proteins bind motor (dynein, kinesin, or myosin). ing to their receptors therefore, improving their therapeutic 0490 Pharmaceutical compositions including such pro effect. teins or protein encoding sequences, antibodies directed 0482 Cell Adhesion Molecule: against Such proteins or polynucleotides capable of altering 0483 The phrase “cell adhesion molecule” refers to pro expression of Such proteins, may be used to treat diseases in teins that serve as adhesion molecules between adjoining which force or energy generation is impaired. Antibodies and cells such as membrane-associated protein with guanylate polynucleotides such as PCR primers and molecular probes US 2011/0091454 A1 Apr. 21, 2011 34 designed to identify Such proteins or protein encoding 0499 Transporters: sequences may be used for diagnosis of Such diseases. 0500. The term “transporters' refers to proteins that medi ate the transport of molecules and macromolecules, such as 0491 Examples of such diseases include, but are not lim channels, exchangers, and pumps. Transporters include pro ited to, malignant diseases where microtubules are drug tar teins such as: amine/polyamine transporter, lipid transporter, gets for a family of anticancer drugs such as myodystrophies neurotransmitter transporter, organic acid transporter, oxy and myopathies Trends Cell Biol. 2002, 12(12):585-91, gentransporter, water transporter, carriers, intracellular trans neurological disorders Neuron. 2003, 25; 40(1):25-40: ports, protein transporters, ion transporters, carbohydrate Trends Biochem. Sci. 2003, 28(10):558-65; Med. Genet. transporter, polyol transporter, amino acid transporters, Vita 2003, 40(9):671-5, and hearing impairment Trends Bio min/cofactor transporters, siderophore transporter, drug chem. Sci. 2003, 28(10):558-65). transporter, channel/pore class transporter, group transloca Defense/Immunity Proteins: tor, auxiliary transport proteins, permeases, murein trans 0492 porter, organic alcohol transporter, nucleobase, nucleoside, 0493. The term “defense/immunity proteins’ refers to pro and nucleotide and nucleic acid transporters. teins that are involved in the immune and complement sys 0501 Pharmaceutical compositions including such pro tems such as acute-phase response proteins, antimicrobial teins or protein encoding sequences, antibodies directed peptides, antiviral response proteins, blood coagulation fac against Such proteins or polynucleotides capable of altering tors, complement components, immunoglobulins, major his expression of Such proteins, may be used to treat diseases in tocompatibility complex antigens and opsonins. which the transport of molecules and macromolecules Such as 0494 Pharmaceutical compositions including such pro neurotransmitters, hormones, Sugar etc. is impaired leading teins or protein encoding sequences, antibodies directed to various pathologies. Antibodies and polynucleotides Such against Such proteins or polynucleotides capable of altering as PCR primers and molecular probes designed to identify expression of Such proteins, may be used to treat diseases Such proteins or protein encoding sequences may be used for involving the immunological system including inflammation, diagnosis of Such diseases. autoimmune diseases, infectious diseases, as well as cancer 0502. Examples of such diseases include, but are not lim ous processes or diseases which are manifested by abnormal ited to, glycogen storage disease caused by glucose-6-phos coagulation processes, which may include abnormal bleeding phate transporter deficiency Hiraiwa H., and Chou J. Y. or excessive coagulation. Antibodies and polynucleotides (2001) DNA Cell Biol., 20(8):447-53), tangier disease asso such as PCR primers and molecular probes designed to iden ciated with ATP-binding cassette transporter-1 deficiency tify such proteins or protein encoding sequences may be used McNeish.J., et al., (2000) Proc. Natl. Acad. Sci.,97(8):4245 for diagnosis of such diseases. 50, Systemic primary carnitine deficiency associated with 0495 Examples of such diseases include, but are not lim organic cation transporter deficiency Tang N. L., et al., ited to, late (C5-9) complement component deficiency asso (1999) Hum. Mol. Genet., 8(4):655-60, Wilson disease asso ciated with opsonin receptor allotypes Fijen C. A., et al., ciated with copper-transporting ATPases deficiency Payne (2000) Clin. Exp. Immunol. 120(2):338-45), combined A. S., et al., (1998) Proc. Natl. Acad. Sci. 95(18): 10854-9), immunodeficiency associated with defective expression of and atelosteogenesis associated with diastrophic dysplasia MHC class II genes Griscelli C., et al., (1989) Immunodefic. sulphate transporter deficiency Newbury-Ecob R., (1998) J. Rev. 1(2):135-53, loss of antiviral activity of CD4 T cells Med. Genet., 35(1):49-53, Central Nervous system diseases caused by neutralization of endogenous TNFC. Pavic I., et al., treated by inhibiting neurotransmitter transporter (e.g. (1993) J. Gen. Virol. 74 (Pt 10):2215-23), autoimmune dis Depression, treated with serotonin transporters inhibitors— eases associated with natural resistance-associated macroph Prozac), and Cystic fibrosis mediated by the chloride channel age protein deficiency Evans C. A., et al., (2001) Neuroge CFTR. Other transporter related diseases are cancer Onco netics, 3(2):69-78, Epstein-Barr virus-associated gene. 2003, 22(38):6005-12 and especially cancer resistant lymphoproliferative disease inhibited by combined GM-CSF to treatment Oncologist. 2003, 8(5):411-24; J. Med. Invest. and IL-2 therapy Baiocchi R. A., et al., (2001).J. Clin. Invest. 2003, 50(3–4): 126-35, infectious diseases, especially fungal 108(6):887–94), multiple sclerosis in which recombinant pro infections Annu. Rev. Phytopathol. 2003, 41:641-67, neu teins from the interferons family are the treatment of choice rological diseases, such as Parkinson FASEB.J. 2003, Sep. 4 and sepsis in which activated protein C is atherapeutic protein Epub ahead of print, diabetes where ATP-sensitive potas sium channel in beta cells is the target for insulin secreta itself. gogues, hypertension where calcium channels are the target 0496 Intracellular Transporters: for calcium blockers, and cardiovascular diseases, including 0497. The term “intracellular transporters’ refers to pro hypercholesterolemia Am. J. Cardiol. 2003, 92(4B):10K teins that mediate the transport of molecules and macromol 16K. ecules inside the cell. Such as intracellular nucleoside trans 0503. There are about 30 membrane transporter genes porter, vacuolar assembly proteins, Vesicle transporters, linked to a known genetic clinical syndrome. Secreted ver vesicle fusion proteins, type II protein secretors. sions of splice variants of transporters may be therapeutic as 0498 Pharmaceutical compositions including such pro the case with soluble receptors. These transporters may have teins or protein encoding sequences, antibodies directed the capability to bind the compound in the serum they would against Such proteins or polynucleotides capable of altering normally bind on the membrane. For example, a secreted expression of Such proteins, may be used to treat diseases in form ATP7B, a transporter involved in Wilson's disease, is which the transport of molecules and macromolecules is expected to bind plasma Copper, therefore have a desired abnormal leading to various pathologies. Antibodies and therapeutic effect in Wilson's disease. polynucleotides such as PCR primers and molecular probes (0504 Lyases: designed to identify Such proteins or protein encoding (0505. The term “lyases” refers to enzymes that catalyze sequences may be used for diagnosis of Such diseases. the formation of double bonds by removing chemical groups US 2011/0091454 A1 Apr. 21, 2011 from a substrate without hydrolysis or catalyze the addition of Scription factor binding, metarhodopsin binding, tubulin chemical groups to double bonds. It includes enzymes Such as binding, JUN kinase binding, RAN protein binding, protein carbon-carbon lyase, carbon-oxygen lyase, carbon-nitrogen signal sequence binding, importin C. export receptor, poly lyase, carbon-sulfur lyase, carbon-halide lyase, and phospho glutamine tract binding, protein carrier, B-catenin binding, rus-oxygen lyase. protein C-terminus binding, lipoprotein binding, cytoskeletal 0506 Pharmaceutical compositions including such pro protein binding protein, nuclear localization sequence bind teins or protein encoding sequences, antibodies directed ing, protein phosphatase 1 binding, adenylate cyclase bind against Such proteins or polynucleotides capable of altering ing, eukaryotic initiation factor 4E binding, calmodulin bind expression of Such proteins, may be used to treat diseases in ing, collagen binding, insulin-like growth factor binding, which the double bonds formation catalyzed by these lamin binding, profilin binding, tropomyosin binding, actin enzymes is impaired. Antibodies and polynucleotides such as binding, peroxisome targeting sequence binding, SNARE PCR primers and molecular probes designed to identify such binding, and cyclin binding. proteins or protein encoding sequences may be used for diag 0514 Pharmaceutical compositions including such pro nosis of Such diseases. teins or protein encoding sequences, antibodies directed 0507 Examples of such diseases include, but are not lim against Such proteins or polynucleotides capable of altering ited to, autoimmune diseases JAMA. 2003, 290(13): 1721-8; expression of Such proteins, may be used to treat diseases JAMA. 2003, 290(13): 1713-20), diabetes Diabetes. 2003, which are associated with impaired protein binding. Antibod 52(9): 2274-8), neurological disorders such as epilepsy J. ies and polynucleotides such as PCR primers and molecular Neurosci. 2003, 23(24):8471-9), Parkinson J. Neurosci. probes designed to identify such proteins or protein encoding 2003, 23(23):8302-9: Lancet. 2003, 362(9385):712 or sequences may be used for diagnosis of Such diseases. Creutzfeldt-Jakob disease Clin. Neurophysiol. 2003, 114(9): 0515. Examples of such diseases include, but are not lim 1724-8, and cancerous diseases J. Pathol. 2003, 201(1):37 ited to, neurological and psychiatric diseases J. Neurosci. 45: J. Pathol. 2003, 201(1):37-45; Cancer Res. 2003, 63(16): 2003, 23(25):8788-99; Neurobiol. Dis. 2003, 14(1): 146-56; 4952-9; Eur. J. Cancer. 2003, 39(13): 1899-903). J. Neurosci. 2003, 23(17):6956-64; Am. J. Pathol. 2003, 163 0508 Actin Binding Proteins: (2):609-19, and cancerous diseases Cancer Res. 2003, 0509. The phrase “actin binding proteins’ refers to pro 63(15):4299-304; Semin. Thromb. Hemost. 2003, 29(3):247 teins binding actin as actin cross-linking, actin bundling, 58; Proc. Natl. Acad. Sci. USA. 2003, 100(16):9506-11). F-actin capping, actin monomer binding, actin lateral bind 0516 Ligand Binding or Carrier Proteins: ing, actin depolymerizing, actin monomersequestering, actin 0517. The phrase “ligand binding or carrier proteins” filament severing, actin modulating, membrane associated refers to proteins involved in diverse biological functions actin binding, actin thin filament length regulation, and actin Such as: pyridoxal phosphate binding, carbohydrate binding, polymerizing proteins. magnesium binding, amino acid binding, cyclosporin Abind 0510) Pharmaceutical compositions including such pro ing, nickelbinding, chlorophyll binding, biotin binding, peni teins or protein encoding sequences, antibodies directed cillin binding, selenium binding, tocopherol binding, lipid against Such proteins or polynucleotides capable of altering binding, drug binding, oxygen transporter, electron trans expression of Such proteins, may be used to treat diseases in porter, steroid binding, juvenile hormone binding, retinoid which actin binding is impaired. Antibodies and polynucle binding, heavy metal binding, calcium binding, protein bind otides such as PCR primers and molecular probes designed to ing, glycosaminoglycan binding, folate binding, odorant identify Such proteins or protein encoding sequences may be binding, lipopolysaccharide binding and nucleotide binding. used for diagnosis of Such diseases. 0518) Pharmaceutical compositions including such pro 0511 Examples of such diseases include, but are not lim teins or protein encoding sequences, antibodies directed ited to, neuromuscular diseases such as muscular dystrophy against Such proteins or polynucleotides capable of altering Neurology. 2003, 61 (3):404-6. Cancerous diseases Urol expression of Such proteins, may be used to treat diseases ogy. 2003, 61(4):845-50; J. Cutan. Pathol. 2002, 29(7):430; which are associated with impaired function of these proteins. Cancer. 2002, 94(6):1777-86; Clin. Cancer Res. 2001, 7(8): Antibodies and polynucleotides such as PCR primers and 2415-24; Breast Cancer Res. Treat. 2001, 65(1):11-21), renal molecular probes designed to identify Such proteins or pro diseases such as glomerulonephritis J. Am. Soc. Nephrol. tein encoding sequences may be used for diagnosis of Such 2002, 13(2):322-31; Eur. J. Immunol. 2001, 31(4): 1221-7), diseases. and gastrointestinal diseases such as Crohn's disease J. Cell 0519 Examples of such diseases include, but are not lim Physiol. 2000, 182(2):303-9. ited to, neurological disorders J. Med. Genet. 2003, 40(10): 0512 Protein Binding Proteins: 733-40; J. Neuropathol. Exp. Neurol. 2003, 62(9):968-75; J. 0513. The phrase “protein binding proteins’ refers to pro Neurochem. 2003, 87(2):427-36, autoimmune diseases (N. teins involved in diverse biological functions through binding Engl.J.Med. 2003,349(16): 1526-33; JAMA. 2003, 290(13): other proteins. Examples of Such biological function include 1721-8; gastroesophageal reflux disease Dig. Dis. Sci. intermediate filament binding, LIM-domain binding, LLR 2003, 48(9): 1832-8), cardiovascular diseases J. Vasc. Surg. domain binding, clathrin binding, ARF binding, Vinculin 2003, 38(4):827-32, cancerous diseases Oncogene. 2003, binding, KU70 binding, troponin C binding PDZ-domain 22(43):6699-703: Br. J. Haematol. 2003, 123(2):288-96), binding, SH3-domain binding, fibroblast growth factor bind respiratory diseases Circulation. 2003, 108(15): 1839-44. ing, membrane-associated protein with guanylate kinase and ophtalmic diseases Ophthalmology. 2003, activity interacting, Wnt-protein binding, DEAD/H-box 110(10):2040-4: Am. J. Ophthalmol. 2003, 136(4):729-32). RNA helicase binding, B-amyloid binding, myosin binding, 0520 ATPases: TATA-binding protein binding DNA topoisomerase I bind 0521. The term “ATPases” refers to enzymes that catalyze ing, polypeptide hormone binding, RHO binding, FH1-do the hydrolysis of ATP to ADP releasing energy that is used in main binding, syntaxin-1 binding, HSC70-interacting, tran the cell. This group include enzymes such as plasma mem US 2011/0091454 A1 Apr. 21, 2011 36 brane cation-transporting ATPase, ATP-binding cassette against Such proteins or polynucleotides capable of altering (ABC) transporter, magnesium-ATPase, hydrogen-f sodium expression of Such proteins, may be used to treat diseases in translocating ATPase or ATPase translocating any other ele which the hydrolytic cleavage of a covalent bond with accom ments, arsenite-transporting ATPase, protein-transporting panying addition of water (-H being added to one product of ATPase, DNA translocase, P-type ATPase, and hydrolase, the cleavage and—OH to the other), is abnormal. Antibodies acting on acid anhydrides involved in cellular and Subcellular and polynucleotides such as PCR primers and molecular moVement. probes designed to identify such proteins or protein encoding 0522 Pharmaceutical compositions including such pro sequences may be used for diagnosis of Such diseases. teins or protein encoding sequences, antibodies directed 0531 Hydrolases: against Such proteins or polynucleotides capable of altering 0532. The term “hydrolases’ refers to hydrolytic enzymes expression of Such proteins, may be used to treat diseases Such as GPI-anchor transamidase, peptidases, hydrolases, which are associated with impaired conversion of the acting on ester bonds, glycosyl bonds, ether bonds, carbon hydrolysis of ATP to ADP or resulting energy use. Antibodies nitrogen (but not peptide) bonds, acid anhydrides, acid car and polynucleotides such as PCR primers and molecular bon-carbon bonds, acid halide bonds, acid phosphorus-nitro probes designed to identify Such proteins or protein encoding gen bonds, acid Sulfur-nitrogen bonds, acid carbon sequences may be used for diagnosis of Such diseases. phosphorus bonds, acid Sulfur-Sulfur bonds. 0523 Examples of such diseases include, but are not lim 0533 Pharmaceutical compositions including such pro ited to, infectious diseases Such as helicobacter pylori ulcers teins or protein encoding sequences, antibodies directed BMC Gastroenterology 2003, 3:31 (published 6 Nov. against Such proteins or polynucleotides capable of altering 2003), Neurological, muscular and psychiatric diseases Int. expression of Such proteins, may be used to treat diseases in J. Neurosci. 2003, 13(12): 1705-1717: Int. J. Neurosci. 2003, which the hydrolytic cleavage of a covalent bond with accom 113(11): 1579-1591; Ann. Neurol. 2003, 54(4):494-500), panying addition of water (-H being added to one product of Amyotrophic Lateral Sclerosis. Other Motor Neuron Disord. the cleavage and —OH to the other) is abnormal. Antibodies 2003 4(2):96-9, cardiovascular diseases J. Nippon. Med. and polynucleotides such as PCR primers and molecular Sch. 2003, 70(5):384-92; Endocrinology. 2003, probes designed to identify such proteins or protein encoding 144(10):4478-83), metabolic diseases Mol. Pathol. 2003, sequences may be used for diagnosis of Such diseases. 56(5):302-4; Neurosci. Lett. 2003,350(2):105-8), and peptic 0534 Examples of such diseases include, but are not lim ulcer disease treated with inhibitors of the gastric H"-K" ited to, cancerous diseases Cancer. 2003, 98(9): 1842-8; Can ATPase (e.g. Omeprazole) responsible for acid secretion in cer, 2003, 98(9):1822-9, neurological diseases such as Par the gastric mucosa. kinson diseases J. Neurol. 2003,250 Suppl3:B115-11124;J. 0524 Carboxylic Ester Hydrolases: Neurol. 2003, 250 Suppl 3:III2-III10, endocrinological dis 0525. The phrase carboxylic ester hydrolases” refers to eases such as pancreatitis Pancreas. 2003, 27(4):291-6 or hydrolytic enzymes acting on carboxylic ester bonds such as childhood genetic diseases Eur. J. Pediatr. 1997, 156(12): N-acetylglucosaminylphosphatidylinositol deacetylase, 935-8), coagulation diseases BMJ. 2003,327(7421):974-7, 2-acetyl-1-alkylglycerophosphocholine esterase, aminoacyl cardiovascular diseases Ann. Intern. Med. 2003, October tRNA hydrolase, arylesterase, carboxylesterase, cholinest 139(8):670-82, autoimmunity diseases J. Med. Genet. erase, gluconolactonase, sterol esterase, acetylesterase, car 2003, 40(10):761-6, and metabolic diseases Am. J. Hum. boxymethylenebutenolidase, protein-glutamate Genet. 2001, 69(5):1002-12). methylesterase, lipase, and 6-phosphogluconolactonase. 0535 Enzymes: 0526 Pharmaceutical compositions including such pro 0536 The term “enzymes' refers to naturally occurring or teins or protein encoding sequences, antibodies directed synthetic macromolecular substance composed mostly of against Such proteins or polynucleotides capable of altering protein, that catalyzes, to various degree of specificity, at least expression of Such proteins, may be used to treat diseases in one (bio)chemical reactions at relatively low temperatures. which the hydrolytic cleavage of a covalent bond with accom The action of RNA that has catalytic activity (ribozyme) is panying addition of water (—H being added to one product of often also regarded as enzymatic. Nevertheless, enzymes are the cleavage and —OH to the other) is abnormal so that a mainly proteinaceous and are often easily inactivated by heat beneficial effect may be achieved by modulation of such ing or by protein-denaturing agents. The Substances upon reaction. Antibodies and polynucleotides such as PCR prim which they act are known as substrates, for which the enzyme ers and molecular probes designed to identify such proteins or possesses a specific binding or active site. protein encoding sequences may be used for diagnosis of 0537. The group of enzymes include various proteins pos Such diseases. sessing enzymatic activities such as mannosylphosphate 0527 Examples of such diseases include, but are not lim transferase, para-hydroxybenzoate:polyprenyltransferase, ited to, autoimmune neuromuscular disease Myasthenia rieske iron-sulfur protein, imidazoleglycerol-phosphate Syn Gravis, treated with cholinesterase inhibitors. thase, sphingosine hydroxylase, tRNA 2'-phosphotrans 0528 Hydrolase, Acting on Ester Bonds: ferase, sterol C-24(28) reductase, C-8 sterol isomerase, C-22 0529. The phrase “hydrolase, acting on ester bonds' refers sterol desaturase, C-14 sterol reductase, C-3 sterol dehydro to hydrolytic enzymes acting on ester bonds such as genase (C-4 sterol decarboxylase), 3-keto sterol reductase, nucleases, Sulfuric ester hydrolase, carboxylic ester hydro C-4 methyl sterol oxidase, dihydronicotinamide riboside lases, thiolester hydrolase, phosphoric monoester hydrolase, quinone reductase, glutamate phosphate reductase, DNA phosphoric diester hydrolase, triphosphoric monoester repairenzyme, telomerase, C.-ketoacid dehydrogenase, 3-ala hydrolase, diphosphoric monoester hydrolase, and phospho nyl-dopamine synthase, RNA editase, aldo-keto reductase, ric triester hydrolase. alkylbase DNA glycosidase, glycogen debranching enzyme, 0530 Pharmaceutical compositions including such pro dihydropterin deaminase, dihydropterin oxidase, dimethylni teins or protein encoding sequences, antibodies directed trosamine demethylase, ecdysteroid UDP-glucosyl/UDP US 2011/0091454 A1 Apr. 21, 2011 37 glucuronosyltransferase, glycine cleavage system, helicase, 0542 Pharmaceutical compositions including such pro histone deacetylase, mevaldate reductase, monooxygenase, teins or protein encoding sequences, antibodies directed poly(ADP-ribose) glycohydrolase, pyruvate dehydrogenase, against Such proteins or polynucleotides capable of altering serine esterase, sterol carrier protein X-related thiolase, trans expression of Such proteins, may be used to treat diseases posase, tyramine-13 hydroxylase, para-aminobenzoic acid which are caused or due to abnormalities in cytoskeleton, (PABA) synthase, glu-tRNA (gln) amidotransferase, molyb including cancerous cells, and diseased cells such as cells that dopterin cofactor Sulfurase, lanosterol 14-O-demethylase, do not propagate, grow or function normally. Antibodies and aromatase, 4-hydroxybenzoate octaprenyltransferase, 7,8-di polynucleotides such as PCR primers and molecular probes hydro-8-oxoguanine-triphosphatase, CDP-alcohol phospho designed to identify such proteins or protein encoding transferase, 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimi sequences may be used for diagnosis of Such diseases. donone 5'-phosphate deaminase, diphosphoinositol 0543. Examples of such diseases include, but are not lim polyphosphate phosphohydrolase, Y-glutamyl carboxylase, ited to, liver diseases such as cholestatic diseases Lancet. Small protein conjugating enzyme, Small protein activating 2003, 362(9390): 1112-9, vascular diseases J. Cell Biol. enzyme, 1-deoxyxylulose-5-phosphate synthase. 2'-phos 2003, 162(6): 1111-22, endocrinological diseases Cancer photransferase, 2-octoprenyl-3-methyl-6-methoxy-1,4-ben Res. 2003, 63(16):4836-41, neuromuscular disorders such Zoquinone hydroxylase, 2C-methyl-D-erythritol 2.4-cyclo as muscular dystrophy Neuromuscul. Disord. 2003, 13 (7-8): diphosphate synthase, 3.4 dihydroxy-2-butanone-4- 579-88), or myopathy Neuromuscul. Disord. 2003, 13(6): phosphate synthase, 4-amino-4-deoxychorismate lyase, 456-67 neurological disorders such as Alzheimer's disease 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, ADP J. Alzheimers Dis. 2003, 5(3):209-28), cardiac disorders J. L-glycero-D-manno-heptose synthase, D-erythro-7,8-dihy Am. Coll. Cardiol. 2003, 42(2):319-27, skin disorders J. droneopterin triphosphate 2'-epimerase, N-ethylmaleimide Am. Coll. Cardiol. 2003, 42(2):319-27), and cancer Pro reductase, O-antigen ligase, O-antigen polymerase, UDP-2, teomics. 2003, 3(6):979-90. 3-diacylglucosamine hydrolase, arsenate reductase, carnitine 0544 Structural Proteins: racemase, cobalamin 5'-phosphate synthase, cobinamide 0545. The term "structural proteins’ refers to proteins phosphate guanylyltransferase, enterobactin synthetase, involved in the structure formation of the cell, such as struc enterochelin esterase, enterochelin synthetase, glycolate oxi tural proteins of ribosome, cell wall structural proteins, struc dase, integrase, lauroyl transferase, peptidoglycan Syn tural proteins of cytoskeleton, extracellular matrix structural thetase, phosphopantetheinyltransferase, phosphoglucos proteins, extracellular matrix glycoproteins, amyloid pro amine mutase, phosphoheptose isomerase, quinolinate Syn teins, plasma proteins, structural proteins of eye lens, struc thase, siroheme synthase, N-acylmannosamine-6-phosphate tural protein of chorion (sensu Insecta), structural protein of 2-epimerase, N-acetyl-anhydromuramoyl-L-alanine ami cuticle (sensu Insecta), puparial glue protein (sensu Diptera), dase, carbon-phosphorous lyase, heme-copper terminal oxi structural proteins of bone, yolk proteins, structural proteins dase, disulfide oxidoreductase, phthalate dioxygenase reduc of muscle, structural protein of Vitelline membrane (sensu tase, sphingosine-1-phosphate lyase, molybdopterin Insecta), structural proteins of peritrophic membrane (sensu oxidoreductase, dehydrogenase, NADPH oxidase, naringe Insecta), and structural proteins of nuclear pores. nin-chalcone synthase, N-ethylammeline chlorohydrolase, 0546 Pharmaceutical compositions including such pro polyketide synthase, aldolase, kinase, phosphatase, CoA-li teins or protein encoding sequences, antibodies directed gase, oxidoreductase, transferase, hydrolase, lyase, against Such proteins or polynucleotides capable of altering isomerase, ligase, ATPase, Sulfhydryl oxidase, lipoate-pro expression of Such proteins, may be used to treat diseases tein ligase, 6-1-pyrroline-5-carboxyate synthetase, lipoic which are caused by abnormalities in cytoskeleton, including acid synthase, and tRNA dihydrouridine synthase. cancerous cells, and diseased cells Such as cells that do not 0538 Pharmaceutical compositions including such pro propagate, grow or function normally. Antibodies and poly teins or protein encoding sequences, antibodies directed nucleotides such as PCR primers and molecular probes against Such proteins or polynucleotides capable of altering designed to identify such proteins or protein encoding expression of Such proteins, may be used to treat diseases sequences may be used for diagnosis of Such diseases. which can be ameliorated by modulating the activity of vari 0547 Examples of such diseases include, but are not lim ous enzymes which are involved both in enzymatic processes ited to, blood vessels diseases such as aneurysms Cardio inside cells as well as in cell signaling. Antibodies and poly vasc. Res. 2003, 60(1):205-13, joint diseases Rheum. Dis. nucleotides such as PCR primers and molecular probes Clin. North Am. 2003, 29(3):631-45), muscular diseases such designed to identify Such proteins or protein encoding as muscular dystrophies Cum Opin. Clin. Nutr. Metab. Care. sequences may be used for diagnosis of Such diseases. 2003, 6(4):435-9), neuronal diseases such as encephalitis 0539 Examples of such diseases include, but are not lim Neurovirol. 2003, 9(2):274-83, retinitis pigmentosa Dev. ited to diabetes where alpha-glucosidase is the target for Ophthalmol. 2003, 37: 109-25, and infectious diseases J. drugs which delay glucose absorption, Osteoporosis where Virol. Methods. 2003, 109(1):75-83: FEMS Immunol. Med. farnsesyl diphosphate synthase is the target for bisphospho Microbiol. 2003, 35(2):125-30; J. Exp. Med. 2003, 197(5): nates, thyroid autoimmune disease associated with thyroid 633-42. peroxidase, MUCOPOLYSACCHARIDOSES associated 0548 Ligands: with defects in lysosomal enzymes, Tay-Sachs Disease asso (0549. The term “ligands' refers to proteins that bind to ciated with defects in b-hexosaminidase and hypertension another chemical entity to form a larger complex, involved in where Angiotensin Converting Enzyme is the target for the various biological processes, such as signal transduction, common hypertension drugs—ACE inhibitors. metabolism, growth and differentiation, etc. This group of (0540 Cytoskeletal Proteins: proteins includes opioid peptides, baboon receptor ligand, 0541. The term “cytoskeletal proteins’ refers to proteins branchless receptor ligand, breathless receptor ligand, ephrin, involved in the structure formation of the cytoskeleton. frizzled receptor ligand, frizzled-2 receptor ligand, heartless US 2011/0091454 A1 Apr. 21, 2011 receptor ligand, Notch receptor ligand, patched receptor polymerase II transcription factors, enhancer binding, ligand ligand, punt receptor ligand, Ror receptor ligand, saxophone regulated transcription factor, and general RNA polymerase receptor ligand, SE20 receptor ligand, sevenless receptor II transcription factors. ligand, Smooth receptor ligand, thickveins receptor ligand, 0558 Pharmaceutical compositions including such pro Toll receptor ligand, Torso receptor ligand, death receptor teins or protein encoding sequences, antibodies directed ligand, Scavenger receptor ligand, neuroligin, integrin ligand. against Such proteins or polynucleotides capable of altering hormones, pheromones, growth factors, and Sulfonylurea expression of Such proteins, may be used to treat diseases receptor ligand. involving impaired function of RNA polymerase II transcrip 0550 Pharmaceutical compositions including such pro tion factors. Antibodies and polynucleotides such as PCR teins or protein encoding sequences, antibodies directed primers and molecular probes designed to identify such pro against Such proteins or polynucleotides capable of altering teins or protein encoding sequences may be used for diagno expression of Such proteins, may be used to treat diseases sis of Such diseases. involved in impaired hormone function or diseases which 0559 Examples of such diseases include, but are not lim involve abnormal secretion of proteins which may be due to ited to, cardiac diseases Cell Cycle. 2003, 202):99-104. abnormal presence, absence or impaired normal response to xeroderma pigmentosum Bioessays. 2001, 23(8):671-3: normal levels of secreted proteins. Those secreted proteins Biochim. Biophys. Acta. 1997, 1354(3):241-51, muscular include hormones, neurotransmitters, and various other pro atrophy J. Cell Biol. 2001, 152(1):75-85), neurological dis teins secreted by cells to the extracellular environment. Anti eases such as Alzheimer's disease Front Biosci. 2000, bodies and polynucleotides such as PCR primers and molecu 5:D244-57, cancerous diseases such as breast cancer Biol. lar probes designed to identify such proteins or protein Chem. 1999, 380(2): 117-28), and autoimmune disorders encoding sequences may be used for diagnosis of Such dis Clin. Exp. Immunol. 1997, 109(3):488-94). CaSCS. 0560 RNA Binding Proteins: 0551 Examples of such diseases include, but are not lim 0561. The phrase “RNA binding proteins” refers to RNA ited to, analgesia inhibited by orphanin FQ/nociceptin Shane binding proteins involved in splicing and translation regula R., et al., (2001) Brain Res., 907(1-2):109-16), stroke pro tion such as tRNA binding proteins, RNA helicases, double tected by estrogen Alkayed N.J., et al., (2001).J. Neurosci. stranded RNA and single-stranded RNA binding proteins, 21 (19):7543-50, atherosclerosis associated with growth hor mRNA binding proteins, snRNA cap binding proteins, 5S mone deficiency Elhadd T. A., et al., (2001) J. Clin. Endo RNA and 7S RNA binding proteins, poly-pyrimidine tract crinol. Metab., 86(9):4223-32), diabetes inhibited by C-ga binding proteins, SnRNA binding proteins, and AU-specific lactosylceramide Hong S., et al., (2001) Nat. Med., 7(9): RNA binding proteins. 1052-6, and Huntington's disease Rao D. S., et al., (2001) 0562 Pharmaceutical compositions including such pro Mol. Cell. Biol., 21(22):7796-806. teins or protein encoding sequences, antibodies directed 0552. Signal Transducer: against Such proteins or polynucleotides capable of altering 0553. The term “signal transducers' refers to proteins expression of Such proteins, may be used to treat diseases Such as activin inhibitors, receptor-associated proteins, C-2 involving transcription and translation factors such as heli macroglobulin receptors, morphogens, quorum sensing sig cases, isomerases, histones and nucleases, diseases where nal generators, quorum sensing response regulators, receptor there is impaired transcription, splicing, post-transcriptional signaling proteins, ligands, receptors, two-component sensor processing, translation or stability of the RNA. Antibodies molecules, and two-component response regulators. and polynucleotides such as PCR primers and molecular 0554 Pharmaceutical compositions including such pro probes designed to identify such proteins or protein encoding teins or protein encoding sequences, antibodies directed sequences may be used for diagnosis of Such diseases. against Such proteins or polynucleotides capable of altering 0563 Examples of such diseases include, but are not lim expression of Such proteins, may be used to treat diseases in ited to, cancerous diseases Such as lymphomas Tumori. which the signal-transduction is impaired, either as a cause, 2003, 89(3):278-84, prostate cancer Prostate. 2003, 57(1): or as a result of the disease. Antibodies and polynucleotides 80-92 or lung cancer J. Pathol. 2003, 200(5):640-6), blood such as PCR primers and molecular probes designed to iden diseases, such as fanconi anemia Curr. Hematol. Rep. 2003, tify such proteins or protein encoding sequences may be used 2(4):335-40, cardiovascular diseases such as atherosclerosis for diagnosis of such diseases. J. Thromb. Haemost. 2003, 1(7): 1381-90 muscle diseases 0555 Examples of such diseases include, but are not lim Trends Cardiovasc. Med. 2003, 13(5):188-95 and brain and ited to, altered sexual dimorphism associated with signal neuronal diseases Trends Cardiovasc. Med. 2003, 13(5): transducer and activator of transcription 5b Udy G. B., et al., 188-95; Neurosci. Lett. 2003, 342(1-2):41-4). (1997) Proc. Natl. Acad. Sci. USA, 94(14):7239-44), mul 0564) Nucleic Acid Binding Proteins: tiple sclerosis associated with sgp130 deficiency Padberg F. 0565. The phrase “nucleic acid binding proteins’ refers to et al., (1999) J. Neuroimmunol. 99(2):218-23, intestinal proteins involved in RNA and DNA synthesis and expression inflammation associated with elevated signal transducer and regulation such as transcription factors, RNA and DNA bind activator of transcription 3 activity Suzuki A. et al., (2001).J ing proteins, Zinc fingers, helicase, isomerase, histones, Exp Med, 193(4):471-81, carcinoid tumor inhibited by nucleases, ribonucleoproteins, and transcription and transla increased signal transducer and activators of transcription 1 tion factors. and 2 Zhou Y., et al., (2001) Oncology, 60(4):330-8), and 0566 Pharmaceutical compositions including such pro esophageal cancer associated with loss of EGF-STAT1 path teins or protein encoding sequences, antibodies directed way Watanabe G. et al., (2001) Cancer J., 7(2):132-9. against Such proteins or polynucleotides capable of altering 0556 RNA Polymerase II Transcription Factors: expression of Such proteins, may be used to treat diseases 0557. The phrase “RNA polymerase II transcription fac involving DNA or RNA binding proteins such as: helicases, tors' refers to proteins such as specific and non-specific RNA isomerases, histones and nucleases, for example diseases US 2011/0091454 A1 Apr. 21, 2011 39 where there is abnormal replication or transcription of DNA and especially diabetes long term consequences, neurological and RNA respectively. Antibodies and polynucleotides such diseases such as Creutzfeldt-Jakob disease, and Parkinson or as PCR primers and molecular probes designed to identify Rasmussen's encephalitis. Such proteins or protein encoding sequences may be used for 0572 Cell Growth and/or Maintenance Proteins: diagnosis of such diseases. 0573 The phrase “Cell growth and/or maintenance pro 0567 Examples of such diseases include, but are not lim teins’ refers to proteins involved in any biological process ited to, neurological diseases such as renitis pigmentoas Am. required for cell Survival, growth and maintenance, including J. Ophthalmol. 2003, 136(4):678-87 parkinsonism Proc. proteins involved in biological processes such as cell organi Natl. Acad. Sci. USA. 2003, 100(18): 10347-52), Alzheimer Zation and biogenesis, cell growth, cell proliferation, metabo J. Neurosci. 2003, 23(17):6914-27 and canavan diseases lism, cell cycle, budding, cell shape and cell size control, Brain Res Bull. 2003, 61 (4):427-35, cancerous diseases sporulation (sensu Saccharomyces), transport, ion homeosta such as leukemia Anticancer Res. 2003, 23(4):3419-26 or sis, autophagy, cell motility, chemi-mechanical coupling, lung cancer J. Pathol. 2003, 200(5):640-6, miopathy Neu membrane fusion, cell-cell fusion, and stress response. romuscul Disord. 2003, 13(7-8):559-67 and liverdiseases J. 0574 Pharmaceutical compositions including such pro Pathol. 2003, 200(5):553-60. teins or protein encoding sequences, antibodies directed 0568 Proteins Involved in Metabolism: against Such proteins or polynucleotides capable of altering 0569. The phrase “proteins involved in metabolism” refers expression of Such proteins, may be used to treat or prevent to proteins involved in the totality of the chemical reactions diseases such as cancer, degenerative diseases, for example and physical changes that occur in living organisms, compris neurodegenerative diseases or conditions associated with ing anabolism and catabolism; may be qualified to mean the aging, or alternatively, diseases wherein apoptosis which chemical reactions and physical processes undergone by a should have taken place, does not take place. Antibodies and particular Substance, or class of substances, in a living organ polynucleotides such as PCR primers and molecular probes ism. This group includes proteins involved in the reactions of designed to identify such proteins or protein encoding cell growth and maintenance Such as: metabolism resulting in sequences may be used for diagnosis of Such diseases, detec cell growth, carbohydrate metabolism, energy pathways, tion of pre-disposition to a disease, and determination of the electron transport, nucleobase, nucleoside, nucleotide and stage of a disease. nucleic acid metabolism, protein metabolism and modifica 0575 Examples of such diseases include, but are not lim tion, amino acid and derivative metabolism, protein targeting, ited to, ataxia-telangiectasia associated with ataxia-telang lipid metabolism, aromatic compound metabolism, one-car iectasia mutated deficiency Hande et al., (2001) Hum. Mol. bon compound metabolism, coenzymes and prosthetic group Genet., 10(5):519-28), osteoporosis associated with metabolism, Sulfur metabolism, phosphorus metabolism, osteonectin deficiency Delany et al., (2000) J. Clin. Invest. phosphate metabolism, oxygen and radical metabolism, 105(7):915-23, arthritis caused by membrane-bound matrix Xenobiotic metabolism, nitrogen metabolism, fat body metalloproteinase deficiency Holmbeck et al., (1999) Cell, metabolism (sensu Insecta), protein localization, catabolism, 99(1):81-92, defective stratum corneum and early neonatal biosynthesis, toxin metabolism, methylglyoxal metabolism, death associated with transglutaminase 1 deficiency Matsuki cyanate metabolism, glycolate metabolism, carbon utiliza et al., (1998) Proc. Natl. Acad. Sci. USA,95(3):1044-9), and tion and antibiotic metabolism. Alzheimer's disease associated with estrogen Simpkins et 0570 Pharmaceutical compositions including such pro al., (1997) Am. J. Med., 103(3A): 19S-25S). teins or protein encoding sequences, antibodies directed 0576 Variants of Proteins which Accumulate an Element/ against Such proteins or polynucleotides capable of altering Compound expression of Such proteins, may be used to treat diseases 0577 Variant proteins which their wild type version natu involving cell metabolism. Antibodies and polynucleotides rally binds a certain compound or element inside the cell, such as PCR primers and molecular probes designed to iden Such as for storage, may have therapeutic effect as secreted tify such proteins or protein encoding sequences may be used variants. For example, Ferritin, accumulates iron inside the for diagnosis of such diseases. cells. A secreted variant of this protein is expected to bind 0571 Examples of such metabolism-related diseases plasma iron, reduce its levels to thereby have therapeutic include, but are not limited to, multisystem mitochondrial effects in hemodisorders which are characterized by high disorder caused by mitochondrial DNA cytochrome Coxi levels of free-iron in the blood. dase II deficiency Campos Y., et al., (2001) Ann. Neurol. 0578 Autoantigens 50(3):409-13, conduction defects and ventricular dysfunc (0579. Autoantigens refer to “self proteins which evoke tion in the heart associated with heterogeneous connexin43 autoimmune response. Examples of autoantigens are listed in expression Gutstein D. E., et al., (2001) Circulation, 104 Table 15, below. Secreted splice variants of such autoantigens (10): 1194-9), atherosclerosis associated with growth sup can be used to treat such autoimmune disorders. Since pressor p27 deficiency Diez-Juan A., and Andres V. (2001) autoimmune disorders are occasionally accompanied by dif FASEB J., 15(11):1989-95), colitis associated with glu ferent autoimmune manifestations (including but not limited tathione peroxidase deficiency (Esworthy R. S., et al., (2001) to multiple endocrine syndromes, i.e., syndrome), the Am. J. Physiol. Gastrointest. Liver Physiol. 281 (3):G848 secreted variants of the present invention may treat these 55. Systemic lupus erythematosus associated with deoxyri multiple symptoms. Therapeutic mechanisms of Such vari bonuclease I deficiency Yasutomo K., et al., (2001) Nat. ants may include: (i) sequestration of auto-antibodies to Genet., 28(4):313-4, alcoholic pancreatitis Pancreas. 2003, thereby reduce their circulating levels; (ii) antigen specific 27(4):281-5), amyloidosis and diseases that are related to immunotherapy—based on the observation that prior sys amyloid metabolism, such as FMF, atherosclerosis, diabetes, temic administration of a protein antigen could inhibit the US 2011/0091454 A1 Apr. 21, 2011 40

Subsequent generation of the immune response to the same 0588 Diseases that May be Treated/Diagnosed Using the antigen (has been proved in mice models for Myasthenia Teaching of the Present Invention Gravis and type I Diabetes). 0589 Inflammatory Diseases 0580. In addition, any novel variant of autoantigens (not 0590. Examples of inflammatory diseases include, but are necessarily secreted) may be used for “specific immunoad not limited to, chronic inflammatory diseases and acute Sorption' leading to a specific immunodepletion of an anti inflammatory diseases. body when used in immunoadsorption columns. 0591 Inflammatory Diseases Associated with Hypersen 0581 Variants of autoantigens are also of a diagnostic sitivity value. The diagnosis of many autoimmune disorders is based 0592 Examples of hypersensitivity include, but are not on looking for specific autoantibodies to autoantigens known limited to, Types I-IV hypersensitivity, immediate hypersen to be associated with an autoimmune condition. Most of the sitivity, antibody mediated hypersensitivity, immune com diagnostic techniques are based on having a recombinant plex mediated hypersensitivity, T lymphocyte mediated form of the autoantigen and using it to screen for serum hypersensitivity and DTH. An example of type I or immediate autoantibodies. However these antibodies may bind the vari hypersensitivity is asthma. Examples of type II hypersensi ants of the present invention with a similar or augmented tivity include, but are not limited to, rheumatoid diseases, affinity. For example, TPO is a known autoantigen in thyroid rheumatoid autoimmune diseases, rheumatoid arthritis autoimmunity. It has been shown that its variant TPOzanelli Krenn V. et al., Histol Histopathol 2000 July; 15 (3):791), also take part in the autoimmune process and can bind the spondylitis, ankylosing spondylitis Jan VoSwinkel et al., same antibodies as TPO Biochemistry. 2001 Feb. 27:40(8): Arthritis Res 2001; 3 (3): 189, systemic diseases, systemic 2572-9. autoimmune diseases, systemic lupus erythematosus Erik 0582 The nucleic acid sequences of the present invention, son J. et al., Immunol Res 1998; 17 (1-2):49, sclerosis, the proteins encoded thereby and the cells and antibodies systemic sclerosis Renaudineau Y. et al., Clin Diagn Lab described hereinabove can be used in Screening assays, thera Immunol. 1999 March; 6 (2):156; Chan OT. et al., Immunol peutic or prophylactic methods of treatment, or predictive Rev 1999 June; 169:107), glandular diseases, glandular medicine (e.g., diagnostic and prognostic assays, including autoimmune diseases, pancreatic autoimmune diseases, dia those used to monitor clinical trials, and pharmacogenetics). betes, Type I diabetes Zimmet P. Diabetes Res Clin Pract 0583. More specifically, the nucleic acids of the present 1996 October 34 Suppl:S125, thyroid diseases, autoim invention can be used to: (i) express a protein of the invention mune thyroid diseases, Graves' disease OrgiaZZi J. Endo in a host cell in culture or in an intact multicellular organism crinol Metab ClinNorth Am 2000 June; 29 (2):339), thyroidi following, e.g., gene therapy; (ii) detect an mRNA, or (iii) tis, spontaneous autoimmune thyroiditis Braley-Mullen H. detect an alteration in a gene to which a nucleic acid of the and Yu S, J Immunol 2000 Dec. 15: 165 (12):7262), Hash invention specifically binds; or to modulate such a gene's imoto's thyroiditis Toyoda N. et al., Nippon Rinsho 1999 activity. August; 57 (8): 1810, myxedema, idiopathic myxedema 0584) The nucleic acids and proteins of the present inven Mitsuma T. Nippon Rinsho. 1999 August: 57 (8): 1759, tion can also be used to treat disorders characterized by either autoimmune reproductive diseases, ovarian diseases, ovarian insufficient or excessive production of those nucleic acids or autoimmunity Garza K M. et al., J Reprod Immunol 1998 proteins, a failure in a biochemical pathway in which they February: 37 (2):87, autoimmune anti-sperm infertility normally participate in a cell, or other aberrant or unwanted Diekman AB. et al., Am J ReprodImmunol. 2000 March; 43 activity relative to the wild type protein (e.g., inappropriate (3):134, repeated fetal loss Tincani A. et al., Lupus 1998; 7 enzymatic activity or unproductive protein folding). The pro Suppl 2:S107-9, neurodegenerative diseases, neurological teins of the invention are useful in Screening for naturally diseases, neurological autoimmune diseases, multiple Sclero occurring protein Substrates or other compounds (e.g., drugs) sis Cross A H. et al., J Neuroimmunol 2001 Jan. 1; 112 that modulate protein activity. The antibodies of the invention (1-2):1. Alzheimer's disease Oron L. et al., JNeural Transm can also be used to detect and isolate the proteins of the Suppl. 1997: 49:77, myasthenia gravis Infante A. J. And invention, to regulate their bioavailability, or otherwise Kraig E, Int Rev Immunol 1999; 18 (1-2):83, motor neuro modulate their activity. Exemplary uses, and the methods by pathies Kornberg A.J.J. ClinNeurosci. 2000 May: 7 (3):1911, which they can be achieved, are described in detail below. Guillain-Barre Syndrome, neuropathies and autoimmune 0585 Possible Utilities for Variants of Drug Targets neuropathies Kusunoki S. Am J Med. Sci. 2000 April; 319 0586 Finding a variant of a known drug target can be (4):234, myasthenic diseases, Lambert-Eaton myasthenic advantageous in cases where the known drug has a major side syndrome Takamori M. Am J Med. Sci. 2000 April; 319 effect, the therapeutic efficacy of the known drug is medium, (4):204, paraneoplastic neurological diseases, cerebellar a known drug has failed clinical trials due to one of the above. atrophy, paraneoplastic cerebellaratrophy, non-paraneoplas A drug which is specific to a new protein variant of the target tic stiffman syndrome, cerebellaratrophies, progressive cer or to the target only (without affecting the novel variant) is ebellar atrophies, encephalitis, Rasmussen's encephalitis, likely to have lower side effects as compared to the original amyotrophic lateral sclerosis, Sydeham chorea, Gilles de la drug, higher therapeutic efficacy, and broader or different Tourette Syndrome, polyendocrinopathies, autoimmune range of activities. polyendocrinopathies Antoine J C. and Honnorat J. Rev 0587. For example, COX3, which is a variant of COX1, is Neurol (Paris) 2000 January; 156 (1):23, neuropathies, dys known to bind COX inhibitors in different affinity than immune neuropathies Nobile-Orazio E. et al., Electroen COX1. This molecule is also associated with different physi cephalogr. Clin Neurophysiol Suppl 1999; 50:419, neu ological processes than COX1. Therefore, a compound spe romyotonia, acquired neuromyotonia, arthrogryposis cific to COX1 or compounds specific to COX3 would have multiplex congenita Vincent A. et al., Ann NY Acad. Sci. lower side effects (by not affecting the other variants), and 1998 May 13; 841:482), cardiovascular diseases, cardiovas higher therapeutic efficacy to larger populations. cular autoimmune diseases, atherosclerosis Matsuura E. et US 2011/0091454 A1 Apr. 21, 2011

al., Lupus. 1998; 7 Suppl 2:S135), myocardial infarction Clin Immunol Immunopathol 1990 March; 54 (3):382), bil Vaarala O. Lupus. 1998; 7 Suppl 2:S132, thrombosis Tin iary cirrhosis, primary biliary cirrhosis Jones D. E. Clin Sci cani A. et al., Lupus 1998; 7 Suppl 2:S107-9, granulomato (Colch) 1996 November; 91 (5):551), nephric diseases, sis, Wegener's granulomatosis, arteritis, Takayasu's arteritis nephric autoimmune diseases, nephritis, interstitial nephritis and Kawasaki syndrome Praprotnik S. et al., Wien Klin Kelly C J. J. Am Soc Nephrol 1990 August: 1 (2):140), Wochenschr 2000 Aug. 25: 112 (15-16):660), anti-factor VIII connective tissue diseases, ear diseases, autoimmune connec autoimmune disease Lacroix-Desmazes S. et al., Semin tive tissue diseases, autoimmune ear disease Yoo T.J. et al., Thromb Hemost. 2000; 26 (2): 157, vasculitises, necrotizing Cell Immunol 1994 August: 157 (1):249, disease of the inner Small vessel vasculitises, microscopic polyangiitis, Churg ear Gloddek B. et al., Ann NY Acad Sci 1997 Dec. 29: and Strauss syndrome, glomerulonephritis, pauci-immune 830:266, skin diseases, cutaneous diseases, dermal diseases, focal necrotizing glomerulonephritis, crescentic glomerulo bullous skin diseases, pemphigus Vulgaris, bullous pemphig nephritis Noel L. H. Ann Med Interne (Paris). 2000 May; 151 oid and pemphigus foliaceus. (3):178), antiphospholipid syndrome Flamholz R. et al., J 0594 Examples of delayed type hypersensitivity include, Clin Apheresis 1999; 14 (4): 171, heart failure, agonist-like but are not limited to, contact dermatitis and drug eruption. B-adrenoceptor antibodies in heart failure Wallukat G. et al., 0595 Autoimmune Diseases Am J. Cardiol. 1999 Jun. 17: 83 (12A):75H, thrombocy 0596. Examples of autoimmune diseases include, but are topenic purpura Moccia F. Ann Ital Med. Int. 1999 April not limited to, cardiovascular diseases, rheumatoid diseases, June; 14 (2): 114, hemolytic anemia, autoimmune hemolytic glandular diseases, gastrointestinal diseases, cutaneous dis anemia Efremov DG. et al., Leuk Lymphoma 1998 January; eases, hepatic diseases, neurological diseases, muscular dis 28 (3-4):285, gastrointestinal diseases, autoimmune dis eases, nephric diseases, diseases related to reproduction, con eases of the gastrointestinal tract, intestinal diseases, chronic nective tissue diseases and systemic diseases. inflammatory intestinal disease Garcia Herola A. et al., Gas 0597 Examples of autoimmune cardiovascular and blood troenterol Hepatol. 2000 January; 23 (1): 16. celiac disease diseases include, but are not limited to atherosclerosis Mat Landau Y E. and Shoenfeld Y. Harefuah 2000 Jan. 16; 138 suura E. et al., Lupus. 1998; 7 Suppl 2:S135), myocardial (2):122, autoimmune diseases of the musculature, myositis, infarction Vaarala O. Lupus. 1998; 7 Suppl 2:S132, throm autoimmune myositis, Sjogren's syndrome Feist E. et al., Int bosis Tincani A. et al., Lupus 1998; 7 Suppl 2:S107-9, Arch Allergy Immunol 2000 September; 123 (1):92), smooth Wegener's granulomatosis, Takayasu's arteritis, Kawasaki muscle autoimmune disease Zauli D. et al., Biomed Phar syndrome Praprotnik S. et al., Wien Klin Wochenschr 2000 macother 1999 June; 53 (5-6):234, hepatic diseases, hepatic Aug. 25: 112 (15-16):660), anti-factor VIII autoimmune dis autoimmune diseases, autoimmune hepatitis Manns MP J ease Lacroix-Desmazes S. et al., Semin Thromb Hemost. Hepatol 2000 August; 33 (2):326 and primary biliary cirrho 2000; 26 (2): 157, necrotizing small vessel vasculitis, micro sis Strassburg CP. et al., Eur J. Gastroenterol Hepatol. 1999 scopic polyangiitis, Churg and Strauss syndrome, pauci-im June; 11 (6):595. mune focal necrotizing and crescentic glomerulonephritis 0593 Examples of type IV or T cell mediated hypersen Noel L. H. AnnMed Interne (Paris). 2000 May; 151 (3):178), sitivity, include, but are not limited to, rheumatoid diseases, antiphospholipid syndrome Flamholz R. et al., J. Clin Aph rheumatoid arthritis Tisch R, McDevitt HO. Proc Natl Acad eresis 1999; 14(4): 171, antibody-induced heart failure Wal Sci USA 1994 Jan. 18; 91 (2):437), systemic diseases, sys lukat G. et al., Am J. Cardiol. 1999 Jun. 17: 83 (12A):75H, temic autoimmune diseases, systemic lupus erythematosus thrombocytopenic purpura Moccia F. Ann Ital Med. Int. Datta SK., Lupus 1998; 7 (9):591, glandular diseases, glan 1999 April-June; 14 (2):114: Semple J. W. et al., Blood 1996 dular autoimmune diseases, pancreatic diseases, pancreatic May 15: 87 (10):4245, autoimmune hemolytic anemia Efre autoimmune diseases, Type 1 diabetes Castano L. and Eisen mov DG. et al., Leuk Lymphoma 1998 January; 28 (3-4):285: barth G. S. Ann. Rev. Immunol. 8:647, thyroid diseases, Sallah S. et al., Ann Hematol 1997 March; 74 (3):139), car autoimmune thyroid diseases, Graves disease Sakata S. et diac autoimmunity in Chagas disease Cunha-Neto E. et al., al., Mol Cell Endocrinol 1993 March; 92 (1):77), ovarian J. Clin Invest 1996 Oct. 15: 98 (8): 1709) and anti-helper T diseases Garza K. M. et al., J Reprod Immunol 1998 Febru lymphocyte autoimmunity Caporossi A P. et al., Viral Immu ary: 37 (2):87, prostatitis, autoimmune prostatitis Alex nol 1998; 11 (1):9). ander R B. et al., Urology 1997 December; 50 (6):893), 0598. Examples of autoimmune rheumatoid diseases polyglandular syndrome, autoimmune polyglandular syn include, but are not limited to rheumatoid arthritis Krenn V. drome, Type I autoimmune polyglandular syndrome Hara T. et al., Histol Histopathol 2000 July; 15 (3):791; Tisch R, et al., Blood. 1991 Mar. 1; 77 (5):1127), neurological dis McDevitt HO. Proc Natl AcadSci units SA 1994 Jan. 18; 91 eases, autoimmune neurological diseases, multiple Sclerosis, (2):437) and ankylosing spondylitis Jan VoSwinkel et al., neuritis, optic neuritis Soderstrom M. et al., J Neurol Neu Arthritis Res 2001: 3 (3): 189). rosurg Psychiatry 1994 May: 57 (5):544, myasthenia gravis 0599 Examples of autoimmune glandular diseases Oshima M. et al., EurJImmunol 1990 December; 20 (12): include, but are not limited to, autoimmune diseases of the 2563, stiff-man syndrome Hiemstra HS. et al., Proc Natl pancreas, Type 1 diabetes Castano L. and Eisenbarth G. S. Acad Sci USA 2001 Mar. 27: 98 (7):3988), cardiovascular Ann. Rev. Immunol. 8:647; Zimmet P. Diabetes Res Clin diseases, cardiac autoimmunity in Chagas disease Cunha Pract 1996 October; 34 Suppl:S125, autoimmune thyroid Neto E. et al., J. Clin Invest 1996 Oct. 15: 98 (8): 1709), diseases, Graves' disease Orgiazzi J. Endocrinol Metab Clin autoimmune thrombocytopenic purpura Semple J.W. et al., North Am 2000 June; 29 (2):339; Sakata S. et al., Mol Cell Blood 1996 May 15: 87 (10):4245), anti-helper T lymphocyte Endocrinol 1993 March; 92 (1):77, spontaneous autoim autoimmunity Caporossi A P. et al., Viral Immunol 1998; 11 mune thyroiditis Braley-Mullen H. and Yu S, J Immunol (1):9), hemolytic anemia Sallah S. et al., Ann Hematol 1997 2000 Dec. 15: 165 (12):7262), Hashimoto's thyroiditis March; 74 (3):139), hepatic diseases, hepatic autoimmune Toyoda N. et al., Nippon Rinsho 1999 August: 57 (8): 1810), diseases, hepatitis, chronic active hepatitis Franco A. et al., idiopathic myxedema Mitsuma T. Nippon Rinsho. 1999 US 2011/0091454 A1 Apr. 21, 2011 42

August; 57 (8): 1759, ovarian autoimmunity Garza K M. et mune ear diseases Yoo T.J. et al., Cell Immunol 1994 August; al., J Reprod Immunol 1998 February: 37 (2):87, autoim 157 (1):249) and autoimmune diseases of the inner earGlod mune anti-sperm infertility, autoimmune prostatitis and Type dek B. et al., Ann NY Acad Sci 1997 Dec. 29: 830:266. I autoimmune polyglandular syndrome. 0608 Examples of autoimmune systemic diseases 0600 Examples of autoimmune gastrointestinal diseases include, but are not limited to, systemic lupus erythematosus include, but are not limited to, chronic inflammatory intesti Erikson J. et al., Immunol Res 1998; 17 (1-2):49) and sys nal diseases Garcia Herola A. et al., Gastroenterol Hepatol. temic Sclerosis RenaudineauY. et al., Clin Diagn Lab Immu 2000 January; 23 (1):16, celiac disease Landau Y E. and nol. 1999 March; 6 (2):156; Chan OT. et al., Immunol Rev Shoenfeld Y. Harefuah 2000 Jan. 16; 138 (2):122, colitis, 1999 June; 169:107). illeitis and Crohn's disease and ulcerative colitis. 0609 Infectious Diseases 0601 Examples of autoimmune cutaneous diseases 0610 Examples of infectious diseases include, but are not include, but are not limited to, autoimmune bullous skin dis limited to, chronic infectious diseases, Subacute infectious eases, such as, but are not limited to, pemphigus Vulgaris, diseases, acute infectious diseases, viral diseases, bacterial bullous pemphigoid and pemphigus foliaceus. diseases, protozoan diseases, parasitic diseases, fungal dis 0602 Examples of autoimmune hepatic diseases include, eases, mycoplasma diseases, and prion diseases. but are not limited to, hepatitis, autoimmune chronic active 0611 Graft Rejection Diseases hepatitis Franco A. et al., Clin Immunol Immunopathol 1990 0612. Examples of diseases associated with transplanta March; 54(3):382, primary biliary cirrhosis Jones D.E. Clin tion of a graft include, but are not limited to, graft rejection, Sci (Colch) 1996 November; 91 (5):551; Strassburg C. P. et chronic graft rejection, Subacute graft rejection, hyperacute al., Eur J. Gastroenterol Hepatol. 1999 June; 11 (6):595) and graft rejection, acute graft rejection, and graft versus host autoimmune hepatitis Manns M P J Hepatol 2000 August; disease. 33 (2):326). 0613 Allergic Diseases 0603 Examples of autoimmune neurological diseases 0614 Examples of allergic diseases include, but are not include, but are not limited to, multiple sclerosis Cross A H. limited to, asthma, hives, urticaria, pollen allergy, dust mite et al., JNeuroimmunol 2001 Jan. 1; 112 (1-2):1. Alzheimer's allergy, Venom allergy, cosmetics allergy, latex allergy, disease Oron L. et al., J Neural Transm Suppl. 1997:49:77), chemical allergy, drug allergy, insect bite allergy, animal dan myasthenia gravis Infante A.J. And Kraig E. Int Rev Immu derallergy, stinging plantallergy, poison ivy allergy and food nol 1999; 18 (1-2):83; Oshima M. et al., EurJImmunol 1990 allergy. December; 20 (12):2563), neuropathies, motor neuropathies 0615 Cancerous Diseases Kornberg A.J. J. Clin Neurosci. 2000 May: 7 (3):1911, Guil 0616 Examples of cancer include but are not limited to lain-Barre Syndrome and autoimmune neuropathies carcinoma, lymphoma, blastoma, sarcoma, and leukemia. Kusunoki S. Am J Med. Sci. 2000 April; 319 (4):234, myas Particular examples of cancerous diseases but are not limited thenia, Lambert-Eaton myasthenic syndrome Takamori M. to: Myeloid leukemia Such as Chronic myelogenous leuke Am J Med. Sci. 2000 April; 319 (4):204, paraneoplastic mia. Acute myelogenous leukemia with maturation. Acute neurological diseases, cerebellaratrophy, paraneoplastic cer promyelocytic leukemia, Acute nonlymphocytic leukemia ebellaratrophy and stiff-man syndrome Hiemstra HS. et al., with increased basophils, Acute monocytic leukemia. Acute Proc Natl Acad Sci units S A 2001 Mar. 27; 98 (7):3988), myelomonocytic leukemia with eosinophilia; malignantlym non-paraneoplastic stiff man syndrome, progressive cerebel phoma, such as Birkitt's Non-Hodgkin's; Lymphoctyic leu laratrophies, encephalitis, Rasmussen's encephalitis, amyo kemia, Such as acute lumphoblastic leukemia. Chronic lym trophic lateral sclerosis, Sydeham chorea, Gilles de la phocytic leukemia; Myeloproliferative diseases, such as Tourette syndrome and autoimmune polyendocrinopathies Solid tumors Benign Meningioma, Mixed tumors of salivary Antoine J.C. and Honnorat J. Rev Neurol (Paris) 2000 Janu gland, Colonic adenomas; Adenocarcinomas, such as Small ary; 156 (1):23, dysimmune neuropathies Nobile-Orazio E. cell lung cancer, Kidney, Uterus, Prostate, Bladder, Ovary, et al., Electroencephalogr. Clin Neurophysiol Suppl 1999; Colon, Sarcomas, Liposarcoma, myxoid, Synovial sarcoma, 50:419, acquired neuromyotonia, arthrogryposis multiplex Rhabdomyosarcoma (alveolar), Extraskeletel myxoid chono congenita Vincent A. et al., Ann NY Acad. Sci. 1998 May 13; drosarcoma, Ewing's tumor; other include Testicular and ova 841:482, neuritis, optic neuritis Soderstrom M. et al., J rian dysgerminoma, Retinoblastoma, Wilms tumor, Neuro NeurolNeurosurg Psychiatry 1994 May: 57 (5):544) multiple blastoma, Malignant melanoma, Mesothelioma, breast, skin, Sclerosis and neurodegenerative diseases. prostate, and ovarian. 0604 Examples of autoimmune muscular diseases 0.617 Thus, the nucleic acid sequences of the present include, but are not limited to, myositis, autoimmune myosi invention and the proteins encoded thereby and the cells and tis and primary Sjogren's syndrome Feist E. et al., IntArch antibodies described hereinabove can be used in, for example, Allergy Immunol 2000 September: 123 (1):92) and smooth screening assays, therapeutic or prophylactic methods of muscle autoimmune disease Zauli D. et al., Biomed Phar treatment, or predictive medicine (e.g., diagnostic and prog macother 1999 June; 53 (5-6):234. nostic assays, including those used to monitor clinical trials, 0605 Examples of autoimmune nephric diseases include, and pharmacogenetics). but are not limited to, nephritis and autoimmune interstitial 0618. More specifically, the nucleic acids of the invention nephritis Kelly C J. J. Am Soc Nephrol 1990 August: 1 can be used to: (i) express a protein of the invention in a host (2):140, glommerular nephritis. cell (in culture or in an intact multicellular organism follow 06.06 Examples of autoimmune diseases related to repro ing, e.g., gene therapy, given, of course, that the transcript in duction include, but are not limited to, repeated fetal loss question contains more than untranslated sequence); (ii) Tincani A. et al., Lupus 1998; 7 Suppl 2:S107-9. detect an mRNA, or (iii) detect an alteration in a gene to 0607 Examples of autoimmune connective tissue dis which a nucleic acid of the invention specifically binds; or to eases include, but are not limited to, ear diseases, autoim modulate Such a gene's activity. US 2011/0091454 A1 Apr. 21, 2011

0619. The nucleic acids and proteins of the invention can Pat. No. 5,223.409), plasmids Cullet al., Proc Natl AcadSci also be used to treat disorders characterized by either insuf USA 89:1865-1869, (1992) or on phage Scott and Smith, ficient or excessive production of those nucleic acids or pro Science 249:386-390, (1990); Devlin, Science 249:404–406, teins, a failure in a biochemical pathway in which they nor (1990); Cwirla et al., Proc. Natl. Acad. Sci. USA 87:6378 mally participate in a cell, or other aberrant or unwanted 6382, (1990); Felici, J. Mol. Biol. 222:301-310, (1991); and activity relative to the wild type protein (e.g., inappropriate U.S. Pat. No. 5,223,409. enzymatic activity or unproductive protein folding). The pro 0624 The screening assay can be a cell-based assay, in teins of the invention are especially useful in screening for which case the screening method includes contacting a cell naturally occurring protein Substrates or other compounds that expresses a protein of the invention with a test compound (e.g., drugs) that modulate protein activity. The antibodies of and determining the ability of the test compound to modulate the invention can also be used to detect and isolate the pro the protein's activity. The cell used can be a mammalian cell, teins of the invention, to regulate their bioavailability, or including a cell obtained from a human or from a human cell otherwise modulate their activity. These uses, and the meth line. ods by which they can be achieved, are described in detail 0625. Alternatively, or in addition to examining the ability below. of an agent to modulate expression or activity generally, one can examine the ability of an agent to interact with, for Screening Assays example, to specifically bind to, a nucleic acid or protein of 0620. The present invention provides methods (or the invention. For example, one can couple an agent (e.g., a 'screening assays”) for identifying agents (or “test com substrate) to a label (those described above, including radio pounds that bind to or otherwise modulate (i.e., stimulate or active or enzymatically active Substances, are Suitable), con inhibit) the expression or activity of a nucleic acid of the tact the nucleic acid or protein of the invention with the present invention or the protein it encodes. An agent may be, labeled agent, and determine whether they bind one another for example, a small molecule Such as a peptide, peptidomi (by detecting, for example, a complex containing the nucleic metic (e.g., a peptoid), an amino acid or an analog thereof, a acid or protein and the labeled agent). Labels are not, how polynucleotide orananalog thereof, a nucleotide oran analog ever, always required. For example, one can use a micro thereof, or an organic or inorganic compound (e.g., a hetero physiometer to detect interaction between an agent and a organic or organometallic compound) having a molecular protein of the invention, neither of which were previously weight less than about 10,000 (e.g., about 5,000, 1,000, or labeled McConnell et al., Science 257: 1906-1912, (1992). A 500) grams per mole and salts, esters, and other pharmaceu microphysiometer (also known as a cytosensor) is an analyti tically acceptable forms of such compounds. cal instrument that measures the rate at which a cell acidifies 0621 Agents identified in the screening assays can be its environment. The instrument uses a light-addressable used, for example, to modulate the expression or activity of potentiometric sensor (LAPS), and changes in the acidifica the nucleic acids or proteins of the invention in a therapeutic tion rate indicate interaction between an agent and a protein of protocol, or to discover more about the biological functions of the invention. Molecular interactions can also be detected the proteins. using fluorescence energy transfer (FET; see, e.g., U.S. Pat. 0622. The assays can be constructed to Screen for agents Nos. 5,631,169 and 4,868,103). An FET binding event can be that modulate the expression or activity of a protein of the conveniently measured through fluorometric detection means invention or another cellular component with which it inter well known in the art (e.g., by means of a fluorimeter). Where acts. For example, where the protein of the invention is an analysis in real time is desirable, one can examine the inter enzyme, the screening assay can be constructed to detect action (e.g., binding) between an agent and a protein of the agents that modulate either the enzyme’s expression or activ invention with Biomolecular Interaction Analysis BIA: see, ity or that of its substrate. The agents tested can be those e.g., Solander and Urbaniczky Anal. Chem. 63:2338-2345, obtained from combinatorial libraries. Methods known in the (1991) and Szabo et al., Curr. Opin. Struct. Biol. 5:699-705, art allow the production and screening of biological libraries; (1995). BIA allows one to detect biospecific interactions in peptoid libraries i.e., libraries of molecules that function as real time without labeling any of the interactants (e.g., BIA peptides even though they have a non-peptide backbone that core). confers resistance to enzymatic degradation; see, e.g., Zuck 0626. The screening assays can also be cell-free assays ermann et al., J. Med. Chem. 37:2678-85, (1994): spatially (i.e., soluble or membrane-bound forms of the proteins of the addressable parallel solid phase or solution phase libraries: invention, including the variants, mutants, and other frag synthetic libraries requiring deconvolution: "one-bead one ments described above, can be used to identify agents that compound libraries; and synthetic libraries. The biological bind those proteins or otherwise modulate their expression or and peptoid libraries can be used to test only peptides, but the activity). The basic protocol is the same as that for a cell other four are applicable to testing peptides, non-peptide oli based assay in that, in either case, one must contact the protein gomers or libraries of Small molecules Lam, Anticancer of the invention with an agent of interest for a sufficient time Drug Des. 12:145, (1997). Molecular libraries can be syn and under appropriate (e.g., physiological) conditions to thesized as described by DeWitt et al. Proc. Natl. Acad. Sci. allow any potential interaction to occur and then determine USA 90:6909, (1993) Erb et al. Proc. Natl. Acad. Sci. USA whether the agent binds the protein or otherwise modulates its 91:11422, (1994). Zuckermann et al. J. Med. Chem. expression or activity. 37:2678, (1994) Cho et al. Science 261:1303, (1993) and 0627 Those of ordinary skill in the art will, however, Gallop et al. J. Med. Chem. 37:1233, (1994). appreciate that there are differences between cell-based and 0623 Libraries of compounds may be presented in solu cell-free assays. For example, when membrane-bound forms tion see, e.g., Houghten, Biotechniques 13:412-421, of the protein are used, it may be desirable to utilize a solu (1992), or on beads Lam, Nature 354:82-84, (1991), chips bilizing agent (e.g., non-ionic detergents such as n-octylglu Fodor, Nature 364:555-556, (1993), bacteria or spores (U.S. coside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl US 2011/0091454 A1 Apr. 21, 2011 44

N-methylglucamide, decanoyl-N-methylglucamide, Triton R gation (Rivas and Minton, Trends Biochem Sci 18:284-7, X-100, Triton R. X-114. Thesit(R), Isotridecypoly(ethylene 1993); chromatography (e.g., gel filtration or ion-exchange glycol ether), 3-(3-cholamidopropyl)dimethylamminio-1- chromatography); electrophoresis see, e.g., Ausubel et al., propane sulfonate (CHAPS), 3-(3-cholamidopropyl)dim Eds. Current Protocols in Molecular Biology, J. Wiley & ethylamminio-2-hydroxy-1-propane sulfonate (CHAPSO), Sons, New York, N.Y., (1999); or immunoprecipitation see, or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sul e.g., Ausubel et al. (Supra); see also Heegaard, J. Mol. Rec fonate). ognit. 11:141-148, (1998) and Hage and Tweed, J. Chro 0628. In the assays of the invention, any of the proteins matogr. Biomed. Sci. Appl. 699:499-525, (1997). Fluores described herein or the agents being tested can be anchored to cence energy transfer (see above) can also be used, and is a solid phase or otherwise immobilized (assays in which one convenient because binding can be detected without purify of two substances that interact with one another are anchored ing the complex from Solution. Assays in which the entire to a solid phase are sometimes referred to as "heterogeneous' reaction of interest is carried out in a liquid phase are some assays). For example, a protein of the present invention can be times referred to as homogeneous assays. anchored to a microtiter plate, a test tube, a microcentrifuge 0633. The screening methods of the invention can also be tube, a column, or the like before it is exposed to an agent. Any designed as competition assays in which an agent and a Sub complex that forms on the solid phase is detected at the end of stance that is known to bind a protein of the present invention the period of exposure. For example, a protein of the present compete to bind that protein. Depending upon the order of invention can be anchored to a solid Surface, and the test addition of reaction components and the reaction conditions compound (which is not anchored and can be labeled, directly (e.g., whether the reaction is allowed to reach equilibrium), or indirectly) is added to the surface bearing the anchored agents that inhibit complex formation can be distinguished protein. Un-reacted (e.g., unbound) components can be from those that disrupt preformed complexes. removed (by, e.g., washing) under conditions that allow any 0634. In either approach, the order in which reactants are complexes formed to remain immobilized on the solid sur added can be varied to obtain different information about the face, where they can be detected (e.g., by virtue of a label agents being tested. For example, agents that interfere with attached to the protein or the agent or with a labeled antibody the interaction between a gene product and one or more of its that specifically binds an immobilized component and may, binding partners (by, e.g., competing with the binding part itself, be directly or indirectly labeled). ner), can be identified by adding the binding partner and the 0629. One can immobilize either a protein of the present agent to the reaction at about the same time. Agents that invention or an antibody to which it specifically binds to disrupt preformed complexes (by, e.g., displacing one of the facilitate separation of complexed (or bound) protein from components from the complex), can be added after a complex uncomplexed (or unbound) protein. Such immobilization can containing the gene product and its binding partner has also make it easier to automate the assay, and fusing the formed. proteins of the invention to heterologous proteins can facili 0635. The proteins of the invention can also be used as tate their immobilization. For example, proteins fused to glu “bait proteins’ in a two- or three-hybrid assay see, e.g., U.S. tathione-S-transferase can be adsorbed onto glutathione Pat. No. 5,283,317; Zervos et al., Cell 72:223-232, (1993); sepharose beads (Sigma Chemical Co., St. Louis, Mo.) or Madura et al., J. Biol. Chem. 268: 12046-12054, (1993); Bar glutathione derivatized microtiter plates, then combined with telet al. Biotechniques 14:920-924, (1993); Iwabuchi et al., the agent and incubated under conditions conducive to com Oncogene 8:1693-1696, (1993); and WO94/10300 to iden plex formation (e.g., conditions in which the salt and pH tify other proteins that bind to (e.g., specifically bind to) or levels are within physiological levels). Following incubation, otherwise interact with a protein of the invention. Such bind the Solid phase is washed to remove any unbound components ing proteins can activate or inhibit the proteins of the inven (where the solid phase includes beads, the matrix can be tion (and thereby influence the biochemical pathways and immobilized), the presence or absence of a complex is deter events in which those proteins are active). mined. Alternatively, complexes can be dissociated from a 0636. As noted above, the screening assays of the inven matrix, and the level of protein binding or activity can be tion can be used to identify an agent that inhibits the expres determined using standard techniques. sion of a protein of the invention by, for example, inhibiting 0630. Immobilization can be achieved with methods the transcription or translation of a nucleic acid that encodes known in the art. For example, biotinylated protein can be it. In these assays, one can contact a cell or cell free mixture prepared from biotin-NHS (N-hydroxy-succinimide) using with the agent and then evaluate mRNA or protein expression techniques known in the art (e.g., the biotinylation kit from relative to the levels that are observed in the absence of the Pierce Chemicals, Rockford, Ill.) and immobilized in the agent (a statistically significant increase in expression indi wells of streptavidin-coated tissue culture plates (also from cating that the agent stimulates mRNA or protein expression Pierce Chemical). and a decrease (again, one that is statistically significant) 06.31 The screening assays of the invention can employ indicating tat the agent inhibits mRNA or protein expression). antibodies that react with the proteins of the invention but do Methods for determining levels of mRNA or protein expres not interfere with their activity. These antibodies can be sion are known in the art and, here, would employ the nucleic derivatized to a solid surface, where they will trap a protein of acids, proteins, and antibodies of the present invention. the invention. Any interaction between a protein of the inven 0637. It should be noted that if desired, two or more of the tion and an agent can then be detected using a second anti methods described herein can be practiced together. For body that specifically binds the complex formed between the example, one can evaluate an agent that was first identified in protein of the invention and the agent to which it is bound. a cell-based assay in a cell free assay. Similarly, and the 0632 Cell-free assays can also be conducted in a liquid ability of the agent to modulate the activity of a protein of the phase, in which case any reaction product can be separated invention can be confirmed in vivo (e.g., in a transgenic (and thereby detected) by, for example: differential centrifu animal). US 2011/0091454 A1 Apr. 21, 2011

0638. The screening methods of the present invention can from an individual can be compared for identification pur also be used to identify proteins (in the event transcripts of the poses. Because greater numbers of polymorphisms occur in present invention encode proteins) that are associated (e.g., the noncoding regions, fewer sequences are necessary to dif causally) with drug resistance. One can then block the activity ferentiate individuals. The noncoding sequences disclosed of these proteins (with, e.g., an antibody of the invention) and herein can provide positive individual identification with a thereby improve the ability of a therapeutic agent to exert a panel of perhaps 10 to 1,000 primers which each yield a desirable effect on a cell or tissue in a subject (e.g., a human noncoding amplified sequence of 100 bases. If predicted cod patient). ing sequences are used, a more appropriate number of primers 0639 Monitoring the influence of therapeutic agents (e.g., for positive individual identification would be 500-2,000. drugs) or other events (e.g., radiation therapy) on the expres 0644. If a panel of reagents from the nucleic acids sion or activity of a biomolecular sequence of the present described herein is used to generate a unique identification invention can be useful in clinical trials (a desired extension database for an individual, those same reagents can later be of the screening assays described above). For example, agents used to identify tissue from that individual. Using the data that exert an effect by, in part, altering the expression or base, the individual, whether still living or dead, can subse activity of a protein of the invention ex vivo can be tested for quently be linked to even very Small tissue samples. their ability to do so as the treatment progresses in a Subject. 0645 DNA-based identification techniques, including Moreover, in animal or clinical trials, the expression or activ those in which small samples of DNA are amplified (e.g., by ity of a nucleic acid can be used, optionally in conjunction PCR) can also be used in forensic biology. Sequences ampli with that of other genes, as a “read out' or marker of the fied from tissues (such as hair or skin) or body fluids (such as phenotype of a particular cell. blood, saliva, or semen) found at a crime scene can be com pared to a standard (e.g., sequences obtained and amplified Detection Assays from a Suspect), thereby allowing one to determine whether 0640 The nucleic acid sequences of the invention can the suspect is the source of the tissue or bodily fluid. serve as polynucleotide reagents that are useful in detecting a 0646. The nucleic acids of the invention, when used as specific nucleic acid sequence. For example, one can use the probes or primers, can target specific loci in the human nucleic acid sequences of the present invention to map the genome. This will improve the reliability of DNA-based corresponding genes on a chromosome (and thereby discover forensic identifications because the more identifying markers which proteins of the invention are associated with genetic examined, the less likely it is that one individual will be disease) or to identify an individual from a biological sample mistaken for another. Moreover, tests that rely on obtaining (i.e., to carry out tissue typing, which is useful in criminal actual genomic sequence (which is possible here) are more investigations and forensic Science). The novel transcripts of accurate than those in which identification is based on the the present invention can be used to identify those tissues or patterns formed by restriction enzyme generated fragments. cells affected by a disease (e.g., the nucleic acids of the (0647. The nucleic acids of the invention can also be used to invention can be used as markers to identify cells, tissues, and study the expression of the mRNAs in histological sections specific pathologies, such as cancer), and to identify individu (i.e., they can be used in in situ hybridization). This approach als who may have or beat risk for a particular cancer. Specific can be useful when forensic pathologists are presented with methods of detection are described herein and are known to tissues of unknown origin or when the purity of a population those of ordinary skill in the art. of cells (e.g., a cell line) is in question. The nucleic acids can 0641. The nucleic acids of the present invention can be also be used in diagnosing a particular condition and in moni used to determine whether aparticular individual is the source toring a treatment regime. ofa biological sample (e.g., a blood sample). This is presently achieved by examining restriction fragment length polymor Predictive Medicine phisms (RFLPs; U.S. Pat. No. 5.272,057), and the sequences disclosed here are useful as additional DNA markers for 0648. The nucleic acids, proteins, antibodies, and cells RFLP. For example, one candigest a sample of an individual's described hereinabove are generally useful in the field of genomic DNA, separate the fragments (e.g. by Southern blot predictive medicine and, more specifically, are useful in diag ting), and expose the fragments to probes generated from the nostic and prognostic assays and in monitoring clinical trials. nucleic acids of the present invention (methods employing For example, one can determine whether a subject is at risk of restriction endonucleases are discussed further below). If the developing a disorder associated with a lesion in, or the mis pattern of binding matches that obtained from a tissue of an expression of a nucleic acid of the invention (e.g., a cancer unknown source, then the individual is the source of the Such as pancreatic cancer, breast cancer, or a cancer within the tissue. urinary system). In addition, the nucleic acids expressed in 0642. The nucleic acids of the present invention can also tumor tissues and not in normal tissues are markers that can be be used to determine the sequence of selected portions of an used to determine whether a subject has or is likely to develop individual's genome. For example, the sequences that repre a particular type of cancer. sent new genes can be used to prepare primers that can be used (0649. The “subject” referred to in the context of any of the to amplify an individual's DNA and Subsequently sequence it. methods of the present invention, is a vertebrate animal (e.g., Panels of DNA sequences (each amplified with a different set a mammal such as an animal commonly used in experimental of primers) can uniquely identify individuals (as every person studies (e.g. rats, mice, rabbits and guinea pigs); a domesti will have unique sequences due to allelic differences). cated animal (e.g., a dog or cat); an animal kept as livestock 0643 Allelic variation occurs to some degree in the coding (e.g., a pig, cow, sheep,goat, or horse); a non-human primate regions of these sequences, and to a greater degree in the (e.g. an ape, monkey, or chimpanzee); a human primate; an noncoding regions. Each of the sequences described herein avian (e.g., a chicken); an amphibian (e.g., a frog); or a reptile. can, to some degree, be used as a standard against which DNA The animal can be an unborn animal (accordingly, the meth US 2011/0091454 A1 Apr. 21, 2011 46 ods of the invention can be used to carry out genetic screening nucleic acid probe with which it can hybridize understringent or to make prenatal diagnoses). The Subject can also be a conditions (the characteristics of useful probes are known to human. those of ordinary skill in the art and are discussed elsewhere 0650. The methods related to predictive medicine can also herein). The mRNA can be immobilized on a surface (e.g., a be carried out by using a nucleic acid of the invention to, for membrane, such as nitrocellulose or other commercially example detect, in a tissue of a Subject: (i) the presence or available membrane) following gel electrophoresis. absence of a mutation that affects the expression of the cor responding gene (e.g., a mutation in the 5' regulatory region 0654 Alternatively, one or more nucleic acids (the target of the gene); (ii) the presence or absence of a mutation that sequence or the probe) can be distributed on a two-dimen alters the structure of the corresponding gene; (iii) an altered sional array (e.g., a gene chip). Arrays are useful in detecting level (i.e., a non-wildtype level) of mRNA of the correspond mutations because a probe positioned on the array can have ing gene (the proteins of the invention can be similarly used to one or more mismatches to a nucleic acid of the invention detect an altered level of protein expression); (iv) a deletion or (e.g., a destabilizing mismatch). For example, genetic muta addition of one or more nucleotides from the nucleic acid tions in any of nucleic acid sequences of the present invention sequences of the present invention; (v) a Substitution of one or can be identified in two-dimensional arrays containing light more nucleotides in the nucleic acid sequences of the present generated DNA probes Cronin et al., Human Mutation invention (e.g., a point mutation); (vi) a gross chromosomal 7:244-255, (1996). Briefly, when a light-generated DNA rearrangement (e.g., a translocation, inversion, or deletion); probe is used, a first array of probes is used to scan through or (vii) aberrant modification of a gene corresponding to the long stretches of DNA in a sample and a control to identify nucleic acid sequences of the present invention (e.g., modifi base changes between the sequences by making linear arrays cation of the methylation pattern of the genomic DNA). Simi of sequential overlapping probes. This step allows the iden larly, one can test for inappropriate post-translational modi tification of point mutations, and it can be followed by use of fication of any protein encoded. Abnormal expression or a second array that allows the characterization of specific abnormal gene or proteinstructures indicate that the Subject is mutations by using Smaller, specialized probe arrays comple at risk for the associated disorder. mentary to all variants or mutations detected. Each mutation 0651. A genetic lesion can be detected by, for example, array is composed of parallel probe sets, one complementary providing an oligonucleotide probe or primer having a to the wild-type gene and the other complementary to the sequence that hybridizes to a sense or antisense Strand of a mutant gene. Arrays are discussed further below; see also: nucleic acid sequence of the present invention, a naturally Kozal et al. Nature Medicine 2:753-759, (1996). occurring mutant thereof, or the 5' or 3' sequences that are 0655 The level of an mRNA in a sample can also be naturally associated with the corresponding gene, and expos evaluated with a nucleic acid amplification technique e.g., ing the probe or primer to a nucleic acid within a tissue of RT-PCR (U.S. Pat. No. 4,683.202), ligase chain reaction interest (e.g., a tumor). One can detect hybridization between LCR; Barany, Proc. Natl. Acad. Sci. USA 88:189-193, the probe or primer and the nucleic acid of the tissue by (1991); LCR can be particularly useful for detecting point standard methods (e.g., in situ hybridization) and thereby mutations), self Sustained sequence replication Guatelli et detect the presence or absence of the genetic lesion. Where al., Proc. Natl. Acad. Sci. USA 87: 1874-1878, (1990), tran the probe or primer specifically hybridizes with a new splice scriptional amplification system Kwoh et al., Proc. Natl. variant, the probe or primer can be used to detect a non-wild Acad. Sci. USA 86: 1173-1177, (1989), Q-Beta Replicase type splicing pattern of the mRNA. The antibodies of the Lizardi et al., Bio/Technology 6:1197. (1988), or rolling invention can be similarly used to detect the presence or circle replication (U.S. Pat. No. 5,854,033). Following ampli absence of a protein encoded by a mutant, mis-expressed, or fication, the nucleic acid can be detected using techniques otherwise deficient gene. Diagnostic and prognostic assays known in the art. Amplification primers are a pair of nucleic are described further below. acids that anneal to 5' or 3' regions of a gene (plus and minus 0652 Qualitative or quantitative analyses (which reveal Strands, respectively, or Vice-versa) at Some distance (possi the presence or absence of a Substance or its level of expres bly a short distance) from one another. For example, each sion or activity, respectively) can be carried out for any one of primer can consist of about 10 to 30 nucleotides and bind to the nucleic acid sequences of the present invention, or (where sequences that are about 50 to 200 nucleotides apart. Serial the nucleic acid encodes a protein) the proteins they encode, analysis of gene expression can be used to detect transcript by obtaining a biological sample from a Subject and contact levels (U.S. Pat. No. 5,695.937). Other useful amplification ing the sample with an agent capable of specifically binding a techniques (useful in, for example, detecting an alteration in nucleic acid represented by the nucleic acid sequences of the a gene) include anchor PCR, real-time PCR or RACE PCR. present invention or a protein those nucleic acids encode. The 0656 Mutations in the gene sequences of the invention conditions in which contacting is performed should allow for can also be identified by examining alterations in restriction specific binding. Suitable conditions are known to those of enzyme cleavage patterns. For example, one can isolate DNA ordinary skill in the art. The biological sample can be a tissue, from a sample cell or tissue and a control, amplify it (if a cell, or a bodily fluid (e.g., blood or serum), which may or necessary), digest it with one or more restriction endonu may not be extracted from the Subject (i.e., expression can be cleases, and determine the length(s) of the fragment(s) pro monitored in vivo). duced (e.g., by gel electrophoresis). If the size of the fragment 0653 More specifically, the expression of a nucleic acid obtained from the sample is different from the size of the sequence can be examined by, for example, Southern or fragment obtained from the control, there is a mutation in the Northern analyses, polymerase chain reaction analyses, or DNA in the sample tissue. Sequence specific ribozymes (see, with probe arrays. For example, one can diagnose a condition for example, U.S. Pat. No. 5,498.531) can be used to detect associated with expression or mis-expression of a gene by specific mutations by development or loss of a ribozyme isolating mRNA from a cell and contacting the mRNA with a cleavage site. US 2011/0091454 A1 Apr. 21, 2011 47

0657. Any sequencing reaction known in the art (includ antibody that specifically binds the protein of interest. The ing those that are automated) can also be used to determine antibodies described above (e.g., monoclonal antibodies, whether there is a mutation, and, if so, how the mutant differs detectably labeled antibodies, intact antibodies and frag from the wild type sequence. Mutations can also be identified ments thereof) can be used. The methods can be carried out by using cleavage agents to detect mismatched bases in RNA/ in-vitro (e.g., one can perform an enzyme linked immunosor RNA or RNA/DNA duplexes Myers et al., Science 230: bent assay (ELISA), an immunoprecipitation, an immunof 1242, (1985); Cotton et al., Proc. Natl. Acad. Sci. USA luorescence analysis, an enzyme immunoassay (EIA), a 85:4397, (1988); Saleeba et al., Methods Enzymol. 217:286 radioimmunoas say (RIA), or a Western blot analysis) or in 295, (1992). Mismatch cleavage reactions employ one or Vivo (e.g., one can introduce a labelled antibody that specifi more proteins that recognize mismatched base pairs in cally binds to a protein of the present invention into a subject double-stranded DNA (so called “DNA mismatch repair and then detect it by a standard imaging technique). Alterna enzymes; e.g., the mutY enzyme of E. coli cleaves A at G/A tively, the sample can be labeled and then contacted with an mismatches and the thymidine DNA glycosylase from HeLa antibody. For example, one can biotinylate the sample, con cells cleaves T at G/T mismatches see Hsu et al., Carcino tact it with an antibody (e.g., an antibody positioned on an genesis 15:1657-1662, (1994) and U.S. Pat. No. 5.459,039. antibody array) and then detect the bound sample (e.g., with 0658 Alterations in electrophoretic mobility can also be avidin coupled to a fluorescent label). As with methods to used to identify mutations. For example, single strand con detect nucleic acids, appropriate control studies can be per formation polymorphism (SSCP) can be used to detect dif formed in parallel with those designed to detect protein ferences in electrophoretic mobility between mutant and wild expression. type nucleic acids Orita et al., Proc. Natl. Acad. Sci. USA 0664. The diagnostic molecules disclosed herein can be 86:2766, (1989); see also Cotton Mutat. Res. 285:125-144, assembled as kits. Accordingly, the invention features kits for (1993); and Hayashi, Genet. Anal. Tech. Appl. 9:73-79, detecting the presence of the biomolecular sequences of the (1992). Single-stranded DNA fragments of sample and con present invention in a biological sample. The kit can include trol nucleic acids are denatured and allowed to renature. The a probe (e.g., a nucleic acid sequence or an antibody), a secondary structure of single-stranded nucleic acids varies standard and, optionally, instructions for use. More specifi according to sequence, and the resulting alteration in electro cally, antibody-based kits can include a first antibody (e.g., in phoretic mobility enables the detection of even a single base Solution or attached to a solid Support) that specifically binds change. The sensitivity of the assay is enhanced when RNA a protein of the present invention and, optionally, a second, (rather than DNA) is used because RNA's secondary structure different antibody that specifically binds to the first antibody is more sensitive to a change in sequence. See also Keen et al., and is conjugated to a detectable agent. Oligonucleotide Trends Genet. 7:5, (1991). The movement of mutant or wild based kits can include an oligonucleotide (e.g., a labeled type fragments through gels containing a gradient of denatur oligonucleotide) that hybridizes with one of the nucleic acids ant is also informative. of the present invention understringent conditions or a pair of 0659 When denaturing gradient gel electrophoresis oligonucleotides that can be used to amplify a nucleic acid DGGE: Myers et al., Nature 313:495, (1985) is used, DNA sequence of the present invention. The kits can also include a can be modified so it will not completely denature (this can be buffering agent, a preservative, a protein-stabilizing agent, or done by, for example by adding a GC clamp of approximately a component necessary for detecting any included label (e.g., 40 bp of high-melting GC-rich DNA by PCR). A temperature an enzyme or Substrate). The kits can also contain a control gradient can be used in place of a denaturing gradient to sample or a series of control samples that can be assayed and identify differences in the mobility of control and sample compared to the test sample contained. Each component of DNA Rosenbaum and Reissner, Biophys. Chem. 265:12753, the kit can be enclosed within an individual container, and all (1987). of the various containers can be within a single package. It 0660 Point mutations can also be detected by selective will be appreciated that the diagnostic kits of the present oligonucleotide hybridization, selective amplification, or invention may also include additional diagnostic reagents, selective primer extension Point et al., Nature 324:163, Such as diagnostic reagents for detecting the wild-type gene (1986); Saiki et al., Proc. Natl. Acad. Sci. USA 86:6230, product or known variants thereof. This combination of diag (1989) or by chemical ligation of oligonucleotides as nostic markers is likely to establish a more accurate diagno described in Xu et al., Nature Biotechnol. 19:148, (2001). S1S. Allele specific amplification technology can also be used 0665. The detection methods described hereincan identify see, e.g., Gibbs et al., Nucleic Acids Res. 17:2437-2448, a Subject who has, or is at risk of developing, a disease, (1989); Prossner, Tibtech. 11:238, (1993); and Barany, Proc. disorder, condition, or syndrome (the term “disease' is used Natl. Acad. Sci. USA 88:189, (1991). to encompass all deviations from a normal state) associated 0661 When analysis of a gene or protein is carried out in with aberrant or unwanted expression or activity of a biomo a cell or tissue sample, the cell or tissue can be immobilized lecular sequence of the present invention. The detection on a Support, typically a glass slide, and then contacted with methods also have prognostic value (e.g., they can be used to a probe that can hybridize to the nucleic acid or protein of determine whether or not it is likely that a subject will respond interest. positively (i.e., be effectively treated with) to an agent (e.g., a 0662. The detection methods of the invention can be car nucleic acid, protein, Small molecule or other drug)). Samples ried out with appropriate controls (e.g., analyses can be con can also be obtained from a subject during the course of ducted in parallel with a sample known to contain the target treatment to monitor the treatment’s efficacy at a cellular sequence and a target known to lack it). level. 0663 Various approaches can be used to determine pro 0666. The present invention also features methods of tein expression or activity. For example, one can evaluate the evaluating a sample by creating a gene expression profile for amount of protein in a sample by exposing the sample to an the sample that includes the level of expression of one or more US 2011/0091454 A1 Apr. 21, 2011 48 of biomolecular sequences of the present invention. The Sam can have a density of at least 10, 50, 100, 200, 500, 1,000, ple's profile can be compared with that of a reference profile 2,000, or 10,000 or more addresses/cm, or densities between (such as the profile of a wild-type gene product), either of these. In some embodiments, the plurality of addresses which can be obtained by the methods described herein (e.g., includes at least 10, 100, 500, 1,000, 5,000, 10,000, or 50,000 by obtaining a nucleic acid from the sample and contacting addresses, while in other embodiments, the plurality of the nucleic acid with those on an array). As with other detec addresses can be equal to, or less than, those numbers. tion methods, profile-based assays can be performed prior to 0672 Regardless of whether the array contains nucleic the onset of symptoms (in which case they can be diagnostic), acids (as probes or targets) or proteins (as probes or targets), prior to treatment (in which case they can be predictive) or the Substrate can be two-dimensional (formed, e.g., by a glass during the course of treatment (in which case they serve as slide, a wafer (e.g., silica or plastic), or a mass spectroscopy monitors) see, e.g., Golub et al., Science 286:531, (1999). plate) or three-dimensional (formed, e.g., by a gel or pad). 0667 As described hereinabove, the screening methods of Addresses in addition to the addresses of the plurality can be the invention can be used to identify candidate therapeutic disposed on the array. agents, and those agents can be evaluated further by examin 0673. At least one address of the plurality can include a ing their ability to alter the expression of one or more of the nucleic acid capture probe that hybridizes specifically to one proteins of the invention. For example, one can obtain a cell or more of the nucleic acid sequences of the present invention. from a subject, contact the cell with the agent, and Subse In certain embodiments, a subset of addresses of the plurality quently examine the cell's expression profile with respect to a will be occupied by a nucleic acid capture probe for one of the reference profile (which can be, for example, the profile of a nucleic acid sequences of the present invention; each address normal cell or that of a cell in a physiologically acceptable in the subset can bear a capture probe that hybridizes to a condition). The agent is evaluated favorably if the expression different region of a selected nucleic acid. In other embodi profile in the Subject's cell is, following exposure to the agent, ments, the probe at each address is unique, overlapping, and more similar to that of a normal cell or a cell in a physiologi complementary to a different variant of a selected nucleic cally acceptable condition. A control assay can be performed acid (e.g., an allelic variant, or all possible hypothetical vari with, for example, a cell that is not exposed to the agent. ants). If desired, the array can be used to sequence the selected 0668 Expression profiles (obtained by evaluating either nucleic acid by hybridization (see, e.g., U.S. Pat. No. 5,695, nucleic acid or protein expression) are also useful in evaluat 940). Alternatively, the capture probe can be a protein that ing Subjects. One can obtain a sample from a subject (either specifically binds to a protein of the present invention or a directly or indirectly from a caregiver), create an expression fragment thereof (e.g., a naturally-occurring interaction part profile, and, optionally, compare the Subject's expression pro ners of a protein of the invention or an antibody described file to one or more reference profiles and/or select a reference herein). In some instances (e.g., in the event of an autoim profile most similar to that of the subject. A variety of routine mune disease), it is significant that a subject produces anti statistical measures can be used to compare two reference bodies, and the arrays described herein can be used to detect profiles. One possible metric is the length of the distance those antibodies. More generally, an array that contains some vector that is the difference between the two profiles. Each of or all of the proteins of the present invention can be used to the Subject and reference profile is represented as a multi detect any Substance to which one or more those proteins bind dimensional vector, wherein each dimension is a value in the (e.g., a natural binding partner, an antibody, or a synthetic profile. molecule). 0669. The result, which can be communicated to the sub ject, a caregiver, or another interested party, can be the Sub 0674) An array can be generated by methods known to ject's expression profile perse, a result of a comparison of the those of ordinary skill in the art. For example, an array can be Subject's expression profile with another profile, a most simi generated by photolithographic methods (see, e.g., U.S. Pat. lar reference profile, or a descriptor of any of these. Commu Nos. 5,143,854: 5,510,270; and 5,527,681), mechanical nication can be mediated by a computer network (e.g., in the methods (e.g., directed-flow methods as described in U.S. form of a computer transmission Such as a computer data Pat. No. 5,384.261), pin-based methods (e.g., as described in signal embedded in a carrier wave). U.S. Pat. No. 5.288,514), and bead-based techniques (e.g., as 0670 Accordingly, the invention also features a computer described in PCT US/93/04145). Methods of producing pro medium having executable code for effecting the following tein-based arrays are described in, for example, De Wildt etal. steps: receive a Subject expression profile; access a database Nature Biotech. 18:89-994, (2000), Lueking et al. Anal. of reference expression profiles; and either i) select a match Biochem. 270:103-111, (1999), Ge Nucleic Acids Res. ing reference profile most similar to the Subject expression 28:e3, I-VII, (2000), MacBeath and Schreiber Science 289: profile, or ii) determine at least one comparison score for the 1760-1763, (2000), and WO 99/51773A1. Addresses in similarity of the Subject expression profile to at least one addition to the address of the plurality can be disposed on the reference profile. The subject expression profile and the ref array. erence expression profile each include a value representing 0675. The arrays described above can be used to analyze the expression of any of the biomolecular sequences of the the level of expression of one or more of the biomolecular present invention. For example, one can contact an array with sequences of the present invention. a sample and detect binding between a component of the sample and a component of the array. In the event nucleic Arrays and Uses Thereof acids are analyzed, one can amplify the nucleic acids obtained 0671 The present invention also encompasses arrays that from a sample prior to their application to the array. The array include a Substrate having a plurality of addresses, at least one can also be used to examine tissue-specific gene expression. of which includes a capture probe that specifically binds or For example, the nucleic acids or proteins of the invention (all hybridizes to a nucleic acid represented by any one of the or a subset thereof) can be distributed on an array that is then biomolecular sequences of the present invention. The array exposed to nucleic acids or proteins obtained from a particu US 2011/0091454 A1 Apr. 21, 2011 49 lar tissue, tumor, or cell type. If a sufficient number of diverse et al., J. Mass Spectrom. 35:258-264, 2000, and James, AIDS samples are analyzed, clustering (e.g., hierarchical cluster Treatment News Archive 209, 1994). As mentioned herein ing, k-means clustering, Bayesian clustering and the like) can above, the biomolecular sequences of the present invention be used to identify other genes that are co-regulated with may be used as markers alone or with other markers to estab those of the invention. The array can be used not only to lish an earlier and more accurate diagnosis of the disease. determine tissue specific expression, but also to ascertain the 0681. The biomolecular sequences of the present inven level of expression of a battery of genes. tion, can also serve as pharmacodynamic markers, which 0676 Array analysis of the nucleic acids or proteins of the provide an indicia of a therapeutic result. As pharmacody invention can be used to study the effect of cell-cell interac namic markers are not directly related to the disease for which tions or therapeutic agents on the expression of those nucleic the drug is being administered, their presence (or levels of acids or proteins. For example, nucleic acid or protein that has expression) indicates the presence or activity of a drug in a been obtained from a cell that has been placed in the vicinity Subject (i.e., the pharmacodynamic marker may indicate the of a tissue that has been perturbed in some way can be concentration of a drug in a biological tissue, as the gene or obtained and exposed to the probes of an array. Thus, one can protein serving as the marker is either expressed or tran use the methods of the invention to determine the effect of one scribed (or not) in the body in relationship to the level or cell type on another (i.e., the response (e.g., a change in the activity of the drug). One can also monitor the distribution of type or quantity of nucleic acids or proteins expressed) to a a drug with a pharmacodynamic marker (e.g., these markers biological stimulus can be determined). Similarly, nucleic can be used to determine whether a drug is taken up by a acid or protein that has been obtained from a cell that has been particular cell type). The presence or amount of pharmaco treated with an agent can be obtained and exposed to the dynamic markers can be related to the drug per se or to a probes of an array. In this scenario, one can determine how the metabolite produced from the drug. Thus, these markers can therapeutic agent affects the expression of any of the biomo indicate the rate at which a drug is broken down in vivo. lecular sequences of the present invention. Appropriate con Pharmacodynamic markers can be particularly sensitive (e.g., trols (e.g., assays using cells that have not received a biologi even a small amount of a drug may activate Substantial tran cal stimulus or a potentially therapeutic treatment) can be Scription or translation of a marker), and they are therefore performed in parallel. Moreover, desirable and undesirable useful in assessing drugs that are administered at low doses. responses can be detected. If an event (e.g., exposure to a Examples regarding the use of pharmacodynamic markers biological stimulus or therapeutic compound) has an unde are known in the art and include: U.S. Pat. No. 6,033,862; sirable effect on a cell, one can either avoid the event (by, e.g., Hattis et al. Env. Health Perspect. 90: 229-238, (1991); prescribing an alternative therapy) or take steps to counteract Schentag, Am. J. Health-Syst. Pharm. 56 Suppl. 3: S21-S24, or neutralize it. (1999); and Nicolau, Am. J. Health-Syst. Pharm. 56 Suppl. 3: 0677. In more straightforward assays, the arrays described S16-S20, (1991). here can be used to monitor the expression of one or more of 0682. The biomolecular sequences of the present inven the biomolecular sequences of the present invention, with tion, are also useful as pharmacogenomic markers, which can respect to time. Such analyses allow one to characterize a provide an objective correlate to a specific clinical drug disease process associated with the examined sequence. response or Susceptibility in a particular Subject or class of 0678. The arrays are also useful for ascertaining the effect subjects see, e.g., McLeod et al., Eur. J. Cancer 35: 1650 of the expression of a gene on the expression of other genes in 1652, (1999). The presence or amount of the pharmacoge the same cell or in different cells (e.g., ascertaining the effect nomic marker is related to the predicted response of a subject of the expression of any one of the biomolecular sequences of to a specific drug (or type of drug) prior to administration of the present invention on the expression of other genes). If the drug. By assessing one or more pharmacogenomic mark altering the expression of one gene has a deleterious effect on ers in a subject, the drug therapy that is most appropriate for the cell (due to its effect on the expression of other genes) one the subject, or which is predicted to have a greater likelihood can, again, avoid that effect (by, e.g., selecting an alternate of Success, can be selected. For example, based on the pres molecular target or counteracting or neutralizing the effect). ence or amount of RNA or protein associated with a specific tumor marker in a Subject, an optimal drug or treatment Markers regime can be prescribed for the subject. 0679 The molecules of the present invention are also use 0683 More generally, pharmacogenomics addresses the ful as markers of: (i) a cell or tissue type; (ii) disease; (iii) a relationship between an individual's genotype and that indi pre-disease state; (iv) drug activity, and (v) predisposition for vidual’s response to a foreign compound or drug. Differences disease. in the way individual Subjects metabolize therapeutics can 0680 Using the methods described herein, the presence or lead to severe toxicity or therapeutic failure because metabo amount of the biomolecular sequences of the present inven lism alters the relation between dose and blood concentration tion, can be detected and correlated with one or more biologi of the pharmacologically active drug. Thus, a physician cal states (e.g., a disease state or a developmental state). When would consider the results of pharmacogenomic studies when used in this way, the compositions of the invention serve as determining whether to administer a composition of the Surrogate markers; they provide an objective indicia of the present invention and how to tailor a therapeutic regimen for presence or extent of a disease (e.g., cancer). Surrogate mark the subject. ers are particularly useful when a disease is difficult to assess 0684 Pharmacogenomics deals with clinically significant with standard methods (e.g., when a Subject has a small tumor hereditary variations in the response to drugs due to altered or when pre-cancerous cells are present). It follows that Sur drug disposition and abnormal action in affected persons. rogate markers can be used to assess a disease before a poten See, e.g., Eichelbaum et al., Clin. Exp. Pharmacol. Physiol. tially dangerous clinical endpoint is reached. Other examples 23:983-985, (1996), and Linder et al., Clin. Chem. 43:254 of Surrogate markers are known in the art (see, e.g., Koomen 266, (1997). In general, two types of pharmacogenetic con US 2011/0091454 A1 Apr. 21, 2011 50 ditions can be differentiated. Genetic conditions transmitted 0689. The computer readable storage medium further as a single factor can: (i) alter the way drugs act on the body includes sequence annotations (as described in Examples 10 (altered drug action) or (ii) the way the body acts on drugs and 22 of the Examples section). (altered drug metabolism). These pharmacogenetic condi 0690. The computer readable storage medium can further tions can occur either as rare genetic defects or as naturally include information pertaining to generation of the data and/ occurring polymorphisms. or potential uses thereof. 0685. One approach that can be used to identify genes that 0691 As used herein, a “computer-readable medium’ predict drug response, known as “a genome-wide associa refers to any medium that can be read and accessed directly by tion, relies primarily on a high-resolution map of the human a machine e.g., a digital or analog computer; e.g., a desktop genome consisting of already known gene-related markers PC, laptop, mainframe, server (e.g., a web server, network (e.g., a “bi-allelic' gene marker map that consists of 60,000 server, or server farm), a handheld digital assistant, pager, 100,000 polymorphic or variable sites on the human genome, mobile telephone, or the like. Computer-readablemedia each of which has two variants.) Such a high-resolution include: magnetic storage media, such as floppy discs, hard genetic map can be compared to a map of the genome of each disc storage medium, and magnetic tape; optical storage of a statistically significant number of patients taking part in media such as CD-ROM; electrical storage media such as a Phase II/III drug trial to identify markers associated with a RAM, ROM, EPROM, EEPROM, flash memory, and the like: particular observed drug response or side effect. Alterna and hybrids of these categories such as magnetic/optical Stor tively, a high resolution map can be generated from a combi age media. nation of known and newly uncovered single nucleotide poly 0692. A variety of data storage structures are available to morphisms (SNPs; a common alteration that occurs in a those of ordinary skill in the art and can be used to create a single nucleotide base in a stretch of DNA, see Example 22) computer-readablemedium that has recorded one or more (or in the human genome. For example, a SNP may occur once all) of the nucleic acids and/or amino acid sequences of the per every 1000 bases of DNA. While a SNP may be involved present invention. The data storage structure will generally in a disease process, the vast majority may not be disease depend on the means chosen to access the stored information. associated. Given a genetic map based on the occurrence of In addition, a variety of data processor programs and formats Such SNPs, individuals can be grouped into genetic catego can be used to store the sequence information of the present ries depending on a particular pattern of SNPs in their indi invention on machine or computer-readable medium. The vidual genome. In Such a manner, treatment regimens can be sequence information can be represented in a word process tailored to groups of genetically similar individuals, taking ing text file, formatted in commercially-available software into account traits that may be common among Such geneti such as WordPerfect and Microsoft Word, or represented in a cally similar individuals. file using a form of encoding of characters such as ASCII or 0686. Two alternative methods, the “candidate gene EBCDIC, stored in a database application, such as DB2, approach” and "gene expression profiling.” can be used to Sybase, Oracle, or the like. One of ordinary skill in the art can identify pharmacogenomic markers. According to the first readily adapt any number of data processor structuring for method, if a gene that encodes a drug's target is known, all mats (e.g., text file or database) to obtain machine or com common variants of that gene can be fairly easily identified in puter-readable medium having recorded thereon the the population, and one can determine whether having one sequence information of the present invention. version of the gene versus another is associated with a par 0693. The sequence information and annotations are ticular drug response. In the second approach, the gene stored in a relational database (such as Sybase or Oracle) that expression of an animal dosed with a drug (e.g., a composi can have a first table for storing sequence (nucleic acid and/or tion of the invention) can reveal whether gene pathways amino acid sequence) information. The sequence information related to toxicity have been activated. can be stored in one field (e.g., a first column) of a table row 0687 Information generated using one or more of the and an identifier for the sequence can be stored in another approaches described above can be used in designing thera field (e.g., a second column) of the table row. The database peutic or prophylactic treatments that are less likely to fail or can have a second table (to, for example, store annotations). to produce adverse side effects when a subject is treated with The second table can have a field for the sequence identifier, a therapeutic composition. a field for a descriptor or annotation text (e.g., the descriptor can refer to a functionality of the sequence), a field for the initial position in the sequence to which the annotation refers, Informatics and a field for the ultimate position in the sequence to which the annotation refers. Examples for annotation to nucleic acid 0688. The biomolecular sequences of the present inven sequences and amino acid sequences are provided in tion can be provided in a variety of media to facilitate their use. For example, one or more of the sequences (e.g., Subsets Examples 10 and 14-20 of the Examples section. of the sequences expressed in a defined tissue type) can be provided as a manufacture (e.g., a computer-readable storage Pharmaceutical Compositions medium such as a magnetic, optical, optico-magnetic, chemi 0694. The nucleic acids, fragments thereof, hybrid cal or mechanical information storage device). The manufac sequences of which they are a part, and gene constructs con ture can provide a nucleic acid or amino acid sequence in a taining them; proteins, fragments thereof, chimeras, and anti form that will allow examination of the manufacture in ways bodies that specifically bind thereto; and cells, including that are not applicable to a sequence that exists in nature or in those that are engineered to express the nucleic acids or purified form. The sequence information can include full proteins of the invention) can be incorporated into pharma length sequences, fragments thereof, polymorphic sequences ceutical compositions. These compositions typically also including single nucleotide polymorphisms (SNPs), epitope include a solvent, a dispersion medium, a coating, an antimi sequence, and the like. crobial (e.g., an antibacterial or antifungal) agent, an absorp US 2011/0091454 A1 Apr. 21, 2011

tion delaying agent (when desired. Such as aluminum 0702. The toxicity and therapeutic efficacy of any given monostearate and gelatin), or the like, compatible with phar compound can be determined by standard pharmaceutical maceutical administration (see below). Active compounds, in procedures carried out in cell culture or in experimental ani addition to those of the present invention, can also be included mals. For example, one of ordinary skill in the art can rou in the composition and may enhance or Supplement the activ tinely determine the LD50 (the dose lethal to 50% of the ity of the present agents. population) and the ED50 (the dose therapeutically effective 0695. The composition will be formulated in accordance in 50% of the population). The dose ratio between toxic and with their intended route of administration. Acceptable routes therapeutic effects is the therapeutic index. Compounds that include oral or parenteral routes (e.g., intravenous, intrader exhibit high therapeutic indices are preferred. While com mal, transdermal (e.g., Subcutaneous or topical), or transmu pounds that exhibit toxic side effects may be used, care should cosal (i.e., across a membrane that lines the respiratory or be taken to design a delivery system that targets such com anogenital tract). The compositions can be formulated as a pounds to the site of affected tissue in order to minimize Solution or Suspension and, thus, can include a sterile diluent potential damage to uninfected cells and, thereby, reduce side (e.g., water, Saline solution, a fixed oil, polyethylene glycol, effects. glycerine, propylene glycol or another synthetic solvent); an 0703. The data obtained from the cell culture assays and antimicrobial agent (e.g., benzyl alcohol or methyl parabens; animal studies described hereinabove can be used to formu chlorobutanol, phenol, ascorbic acid, thimerosal, and the late a range of dosage for use in humans (prefarably a dosage like); an antioxidant (e.g., ascorbic acid or sodium bisulfite); within a range of circulating concentrations that include the a chelating agent (e.g., ethylenediaminetetraacetic acid); or a ED50 with little or no toxicity). The dosage may vary within buffer (e.g., an acetate-, citrate-, or phosphate-based buffer). this range depending upon the formulation and the route of When necessary, the pH of the solution or suspension can be administration. For any compound used in the method of the adjusted with an acid (e.g., hydrochloric acid) or a base (e.g., invention, the therapeutically effective dose can be estimated Sodium hydroxide). Proper fluidity (which can ease passage initially from cell culture assays. A dose may beformulated in through a needle) can be maintained by a coating Such as animal models to achieve a circulating plasma concentration lecithin, by maintaining the required particle size (in the case range that includes the IC50 (i.e., the concentration of the test of a dispersion), or by the use of Surfactants. compound which achieves a half-maximal inhibition of 0696. The compositions of the invention can be prepared symptoms) as determined in cell culture. Such information as sterile powders (by, e.g., vacuum drying or freeze-drying), can be used to more accurately determine useful doses in which can contain the active ingredient plus any additional humans. Levels in plasma may be measured, for example, by desired ingredient from a previously sterile-filtered solution. high performance liquid chromatography. 0697 Oral compositions generally include an inert diluent 0704. A therapeutically effective amount of a protein of or an edible carrier. For example, the active compound can be the present invention can range from about 0.001 to 30 mg/kg incorporated with excipients and used in the form of tablets, body weight (e.g., about 0.01 to 25 mg/kg, about 0.1 to 20 troches, or capsules (e.g., gelatin capsules). Oral composi mg/kg, or about 1 to 10 (e.g., 2-9, 3-8, 4-7, or 5-6) mg/kg). tions can be prepared using fluid carries and used as mouth The protein can be administered one time per week for washes. The tablets etc. can also containabinder (e.g., micro between about 1 to 10 weeks (e.g., 2 to 8 weeks, 3 to 7 weeks, crystalline cellulose, gum tragacanth, or gelatin); an excipient or about 4, 5, or 6 weeks). However, a single administration (e.g., starch or lactose), a disintegrating agent (e.g., alginic can also be efficacious. Certain factors can influence the acid, Primogel, or corn starch); a lubricang (e.g., magnesium dosage and timing required to effectively treat a Subject. Stearate or Sterotes); a glidant (e.g., colloidal silicon dioxide); These factors include the severity of the disease, previous a Sweetening agent (e.g., Sucrose or saccharine); or a flavoring treatments, and the general health or age of the Subject. agent (e.g., peppermint, methyl salicylate, or orange flavor 0705. When the active ingredient is an antibody, the dos ing. age can be about 0.1 mg/kg of body weight (generally 10-20 0698. For administration by way of the respiratory system, mg/kg). If the antibody is to act in the brain, a dosage of 50 the compositions can be formulated as aerosol sprays (e.g., mg/kg to 100 mg/kg is usually appropriate. Generally, par from a pressured container or dispenser that contains a Suit tially human antibodies and fully human antibodies have a able propellant (e.g., a gas Such as carbon dioxide), or a longer half-life within the human body than other antibodies. nebulizer. The ability of a composition to cross a biological Accordingly, lower dosages and less frequent administration barrier can be enhanced by agents known in the art. For are often possible with these types of antibodies. Modifica example, detergents, bile salts, and fusidic acid derivatives tions such as lipidation can be used to stabilize antibodies and can facilitate transport across the mucosa (and therefore, be to enhance uptake and tissue penetration e.g., into the brain; included in nasal sprays or Suppositories). see Cruikshank et al., J. Acquired Immune Deficiency Syn 0699 For topical administration, the active compounds dromes and Human Retrovirology 14:193, (1997). are formulated into ointments, salves, gels, or creams accord 0706. As noted above, the present invention encompasses ing to methods known in the art. agents (e.g., Small molecules) that modulate expression or 0700 Controlled release can also be achieved by using activity of a nucleic acid represented by any of biomolecular implants and microencapsulated delivery systems (see, e.g., sequences of the present invention. Exemplary doses of these the materials commercially available from Alza Corporation agents include milligram or microgram amounts of the Small and Nova Pharmaceuticals, Inc.; see also U.S. Pat. No. 4,522, molecule per kilogram of Subject or sample weight (e.g., 811 for the use of liposome-based suspensions). about 1-500 mg/kg; about 100 mg/kg; about 5 mg/kg; about 1 0701. The pharmaceutical compositions of the invention mg/kg, or about 50 ug/kg). Appropriate doses of a small can be formulated in dosage units (i.e., physically discrete molecule depend upon the potency of the small molecule with units containing a predetermined quantity of the active com respect to the expression or activity to be modulated. When pound) for uniformity and ease of administration. one or more of these small molecules is to be administered to US 2011/0091454 A1 Apr. 21, 2011 52 an animal (e.g., a human) to modulate expression or activity system. The pharmaceutical compositions of the invention of nucleic acid or protein of the invention, a physician, Vet can be included in a container, pack, or dispenser together erinarian, or researcher may prescribe a relatively low dose at with instructions for administration. first, Subsequently increasing the dose until an appropriate response is obtained. In addition, it is understood that the Methods of Treatment specific dose level for any particular animal subject will 0710. The present invention provides for both prophylac depend upon a variety of factors including the activity of the tic and therapeutic methods of treating a subject at risk of (or specific compound employed, the age, body weight, general Susceptible to) a disorder or having a disorder associated with health, gender, and diet of the Subject, the time of adminis aberrant or unwanted expression or activity of a nucleic acid tration, the route of administration, the rate of excretion, any or protein of the invention. “Treatment' encompasses the drug combination, and the degree of expression or activity to application or administration of a therapeutic agent to a be modulated. patient, or to an isolated tissue or cell line (e.g., one obtained 0707 Pharmaceutical compositions of the present inven from the patient to be treated), with the purpose of curing or tion may also include atherapeutic moiety Such as a cytotoxin lessening the severity of the disease or a symptom associated (i.e., an agent that is detrimental to a cell), a therapeutic agent, with the disease. or a radioactive ion can be conjugated to the biomolecular 0711. Whether carried out prophylactically or therapeuti sequences of the present invention or related compositions, cally, the methods of the invention can be specifically tailored described hereinabove (e.g., antibodies, antisense molecules, or modified, based on knowledge obtained from the field of ribozymes etc.). The cytotoxin can be, for example, taxol. pharmacogenomics (see above). cytochalasin B, gramicidin D, ethidium bromide, emetine, 0712. Thus, the invention provides a method for prevent mitomycin, etoposide, tenoposide, Vincristine, vinblastine, ing in a subject, a disease which onset or progression depends colchicin, doxorubicin, daunorubicin, dihydroxy anthracin on the expression and/or activity of the biomolecular dione, mitoxantrone, mithramycin, actinomycin D, 1-dehy sequences of the present invention or variants or homologs drotestosterone, glucocorticoids, procaine, tetracaine, thereof. Such diseases include cellular proliferative and/or lidocaine, propranolol, puromycin, maytansinoids (e.g., differentiative disorders, disorders associated with bone maytansinol; see U.S. Pat. No. 5,208,020), CC-1065 (see metabolism, immune disorders, cardiovascular disorders, U.S. Pat. Nos. 5,475,092, 5,585,499, and 5,846,545) and liver disorders, viral diseases, pain or metabolic disorders. analogs or homologs thereof. Therapeutic agents include 0713 Examples of cellular proliferative and/or differen antimetabolites (e.g., methotrexate, 6-mercaptopurine, tiative disorders include cancer (e.g., carcinoma, sarcoma, 6-thioguanine, cytarabine, 5-fluorouracil decarbazine), alky metastatic disorders or hematopoietic neoplastic disorders lating agents (e.g., mechlorethamine, thioepa chlorambucil, Such as leukemias and lymphomas). A metastatic tumor can CC-1065, melphalan, carmustine (BSNU) and lomustine arise from a multitude of primary tumor types, including but (CCNU), cyclothosphamide, busulfan, dibromomannitol, not limited to those of prostate, colon, lung, breast or liver. streptozotocin, mitomycin C, and cis-dichlorodiamine plati 0714. The terms “cancer,” “hyperproliferative,” and “neo num(II) (DDP) cisplatin), anthracyclines (e.g., daunorubicin plastic are used in reference to cells that have exhibited a (formerly daunomycin) and doxorubicin), antibiotics (e.g., capacity for autonomous growth (i.e., an abnormal state or dactinomycin (formerly actinomycin), bleomycin, mithra condition characterized by rapid cellular proliferation). mycin, and anthramycin (AMC)), and anti-mitotic agents Hyperproliferative and neoplastic disease states can be cat (e.g., Vincristine, vinblastine, taxol and maytansinoids). egorized as pathologic (i.e., characterizing or constituting a Radioactive ions include, but are not limited to iodine, disease state), or can be categorized as non-pathologic (i.e., yttrium and praseodymium. deviating from normal but not associated with a disease state). 0708. Other therapeutic moieties include, but are not lim The term is meant to include all types of cancerous growths or ited to, toxins such as abrin, ricin A, pseudomonas exotoxin, oncogenic processes, metastatic tissues or malignantly trans or diphtheria toxin; a protein Such as tumor necrosis factor, formed cells, tissues, or organs, irrespective of histopatho Y-interferon, B-interferon, nerve growth factor, platelet logic type or stage of invasiveness. “Pathologic hyperprolif derived growth factor, tissue plasminogen activator; or, bio erative” cells occur in disease states characterized by logical response modifiers such as, for example, lymphok malignant tumor growth. Examples of non-pathologic hyper ines, interleukin-1 (IL-1), interleukin-2 (IL-2), interleukin-6 proliferative cells include proliferation of cells associated (IL-6), granulocyte macrophase colony stimulating factor with wound repair. (GM-CSF), granulocyte colony stimulating factor (G-CSF), 0715. The terms “cancer or “neoplasms include malig or other growth factors. nancies of the various organ systems, such as affecting lung, 0709. The nucleic acid molecules of the invention can be breast, thyroid, lymphoid, gastrointestinal, and genitourinary inserted into Vectors and used as gene therapy vectors. Gene tract, as well as adenocarcinomas, which include malignan therapy vectors can be delivered to a subject by, for example, cies such as most colon cancers, renal-cell carcinoma, pros intravenous injection, local administration (see U.S. Pat. No. tate cancer and/or testicular tumors, non-Small cell carcinoma 5.328,470) or by stereotactic injection (see e.g., Chen et al., of the lung, cancer of the Small intestine and cancer of the Proc. Natl. Acad. Sci. USA 91:3054-3057, 1994). The phar esophagus. maceutical preparation of the gene therapy vector can include 0716. The term “carcinoma’ refers to malignancies of epi the gene therapy vector in an acceptable diluent, or can com thelial or endocrine tissues including respiratory system car prise a slow release matrix in which the gene delivery vehicle cinomas, gastrointestinal system carcinomas, genitourinary is imbedded. Alternatively, where the complete gene delivery system carcinomas, testicular carcinomas, breast carcinomas, vector can be produced intact from recombinant cells (e.g. prostatic carcinomas, endocrine system carcinomas, and retroviral vectors), the pharmaceutical preparation can melanomas. Exemplary carcinomas include those forming include one or more cells which produce the gene delivery from tissue of the cervix, lung, prostate, breast, head and US 2011/0091454 A1 Apr. 21, 2011

neck, colon and ovary. The term also includes carcinosarco 0719 AS discussed, diseases associated (e.g., causally mas (e.g., which include malignant tumors composed of car associated) with increased expression or activity of a protein cinomatous and sarcomatous tissues). An "adenocarcinoma of the present invention (as determined, for example, by the in refers to a carcinoma derived from glandular tissue or in vivo or ex vivo analyses described above), can be treated with which the tumor cells form recognizable glandular structures. techniques in which one inhibits the expression or activity of The term "sarcoma' is art recognized and refers to malignant the nucleic acid or its gene products. For example, a com tumors of mesenchymal derivation. As used herein, the term pound (e.g., an agent identified using an assay described "hematopoietic neoplastic disorder(s)' includes diseases above) that exhibits negative modulatory activity with respect involving hyperplastic/neoplastic cells of hematopoietic ori to a nucleic acid of the invention (the expression or over gin. A hematopoietic neoplastic disorder can arise from expression of which is causally associated with a disease) can myeloid, lymphoid or erythroid lineages, or precursor cells be used to prevent and/or ameliorate that disease or one or thereof. Preferably, the diseases arise from poorly differenti more of the symptoms associated with it. The compound can ated acute leukemias (e.g., erythroblastic leukemia and acute be a peptide, phosphopeptide, Small organic or inorganic megakaryoblastic leukemia). Additional exemplary myeloid disorders include, but are not limited to, acute promyeloid molecule, or antibody (e.g., a polyclonal, monoclonal, leukemia (APML), acute myelogenous leukemia (AML) and humanized, anti-idiotypic, chimeric or single chain antibod chronic myelogenous leukemia (CML) (see Vaickus, Crit. ies, and Fab, F(ab')2 and Fab expression library fragments, Rev. in Oncol./Hemotol. 11:267-97, 1991); lymphoid malig ScFV molecules, and epitope-binding fragments thereof). nancies include, but are not limited to acute lymphoblastic 0720. Further, antisense, ribozyme, and triple-helix mol leukemia (ALL) which includes B-lineage ALL and T-lin ecules (see above) that inhibit expression of the target gene eage ALL, chronic lymphocytic leukemia (CLL), prolym (e.g., a gene of the invention) can also be used to reduce the phocytic leukemia (PLL), hairy cell leukemia (HLL) and level of target gene expression, thus effectively reducing the Waldenstrom's macroglobulinemia (WM). Additional forms level of target gene activity. If necessary, to achieve a desir of malignantlymphomas include, but are not limited to non able level of gene expression, molecules that inhibit gene Hodgkin lymphoma and variants thereof, peripheral T cell expression can be administered with nucleic acid molecules lymphomas, adult T cell leukemia/lymphoma (ATL), cutane that encode and express target gene polypeptides exhibiting ous T-cell lymphoma (CTCL), large granular lymphocytic normal target gene activity. Of course, where the assays of the leukemia (LGF), Hodgkin's disease and Reed-Sternberg dis invention indicate that expression or over expression is desir CaSC. able, the nucleic acid can be introduced into cells via gene 0717 The leukemias, including B-lymphoid leukemias, therapy methods with little or no treatment with inhibitory T-lymphoid leukemias, undifferentiated leukemias, erythro agents (this can be done to combat not only under expression, leukemia, megakaryoblastic leukemia, and monocytic leuke but over secretion of a gene product). mias are encompassed with and without differentiation; chronic and acute lymphoblastic leukemia, chronic and acute 0721 Aptamer molecules (nucleic acid molecules having lymphocytic leukemia, chronic and acute myelogenous leu a tertiary structure that permits them to specifically bind to kemia, lymphoma, myelo dysplastic syndrome, chronic and protein ligands; see, e.g., Osborne et al., Curr. Opin. Chem. acute myeloid leukemia, myelomonocytic leukemia; chronic Biol. 1: 5-9, (1997) and Patel Curr. Opin. Chem. Biol. 1:32 and acute myeloblastic leukemia, chronic and acute myelog 46, (1997) are also useful therapeutics. Since nucleic acid enous leukemia, chronic and acute promyelocytic leukemia, molecules can usually be more conveniently introduced into chronic and acute myelocytic leukemia, hematologic malig target cells than therapeutic proteins may be, aptamers offer a nancies of monocyte-macrophage lineage. Such as juvenile method by which protein activity can be specifically chronic myelogenous leukemia; secondary AML, antecedent decreased without the introduction of drugs or other mol hematological disorder; refractory anemia; aplastic anemia; ecules that may have pluripotent effects. reactive cutaneous angioendotheliomatosis; fibrosing disor 0722. As noted above, the nucleic acids of the invention ders involving altered expression in dendritic cells, disorders and the proteins they encode can be used as immunothera including systemic sclerosis, E-M syndrome, epidemic toxic peutic agents (to, e.g., elicit an immune response against a oil syndrome, eosinophilic fasciitis localized forms of scle protein of interest). However, in Some circumstances, unde roderma, keloid, and fibrosing colonopathy; angiomatoid sirable effects occur when a subject is injected with a protein malignant fibrous histiocytoma; carcinoma, including pri or an epitope that stimulate antibody production. In those mary head and neck squamous cell carcinoma; sarcoma, circumstances, one can instead generate an immune response including kaposi's sarcoma; fibroadanoma and phyllodes with an anti-idiotypic antibody see, e.g., Herlyn, Ann. Med. tumors, including mammary fibroadenoma; Stromal tumors; 31:66-78, 1991 and Bhattacharya-Chatterjee and Foon, Can phyllodes tumors, including histiocytoma; erythroblastosis: cer Treat. Res. 94:51-68, (1998). Effective anti-idiotypic and neurofibromatosis. antibodies stimulate the production of anti-anti-idiotypic 0718. Examples of disorders involving the heart or “car antibodies, which specifically bind the protein in question. diovascular disorders' include, but are not limited to, a dis Vaccines directed to a disease characterized by expression of ease, disorder, or state involving the cardiovascular system, the nucleic acids of the present invention can also be gener e.g., the heart, the blood vessels, and/or the blood. A cardio ated in this fashion. In other circumstances, the target antigen vascular disorder can be caused by an imbalance in arterial is intracellular. In these circumstances, antibodies (including pressure, a malfunction of the heart, oran occlusion of a blood fragments, single chain antibodies, or other types of antibod vessel, e.g., by a thrombus. Examples of Such disorders ies described above) can be internalized within a cell by include hypertension, atherosclerosis, coronary artery spasm, delivering them with, for example, a lipid-based delivery congestive heart failure, coronary artery disease, Valvular system (e.g., Lipofectin M or liposomes). Single chain anti disease, arrhythmias, and cardiomyopathies. bodies can also be administered by delivering nucleotide US 2011/0091454 A1 Apr. 21, 2011 54 sequences that encode them to the target cell population (see, preferably adenoviruses, AAV, lentiviruses, or retroviruses. A e.g., Marasco et al., Proc. Natl. Acad. Sci. USA 90:7889 viral construct such as a retroviral construct includes at least 7893, 1993). one transcriptional promoter/enhancer or locus-defining ele 0723. Alternatively, treatment of diseases associated with ment(s), or other elements that control gene expression by over expression or activity of a wild-type variant of the bio other means Such as alternate splicing, nuclear RNA export, molecular sequences of the present invention can be effected or post-translational modification of messenger. Such vector by upregulating expression or activity of the polypeptides of constructs also include a packaging signal, long terminal the present invention in cases where they have an activity repeats (LTRs) or portions thereof, and positive and negative which antagonizes that of the wild-type protein (e.g., soluble Strand primer binding sites appropriate to the virus used, receptor which antagonizes the activity of the wild type receptor as described hereinabove). Upregulating expression unless it is already present in the viral construct. In addition, of the polypeptides of the present invention in a Subject may Such a construct typically includes a signal sequence for be effected via the administration of at least one of the exog secretion of the peptide from a host cell in which it is placed. enous polynucleotide sequences of the present invention Preferably the signal sequence for this purpose is a mamma ligated into a nucleic acid expression construct designed for lian signal sequence or the signal sequence of the polypeptide expression of coding sequences in eukaryotic cells (e.g., variants of the present invention. Optionally, the construct mammalian cells). Accordingly, the exogenous polynucle may also include a signal that directs polyadenylation, as well otide sequence may be a DNA or RNA sequence encoding the as one or more restriction sites and a translation termination polypeptides of the present invention or active portions sequence. By way of example, such constructs will typically thereof. include a 5' LTR, a tRNA binding site, a packaging signal, an 0724. It will be appreciated that the nucleic acid construct origin of second-strand DNA synthesis, and a 3' LTR or a can be administered to the individual employing any Suitable portion thereof. Other vectors can be used that are non-viral, mode of administration, described hereinbelow (i.e., in-vivo Such as cationic lipids, polylysine, and dendrimers. gene therapy). Alternatively, the nucleic acid construct is 0728. It will be appreciated that the present methodology introduced into a suitable cell via an appropriate gene deliv may also be effected by specifically upregulating the expres ery vehicle/method (transfection, transduction, homologous sion of the splice variants of the present invention endog recombination, etc.) and an expression system as needed and enously in the Subject. Agents for upregulating endogenous then the modified cells are expanded in culture and returned to expression of specific splice variants of a given gene include the individual (i.e., ex-vivo gene therapy). antisense oligonucleotides, which are directed at splice sites 0725 Preferably, the promoter utilized by the nucleic acid of interest, thereby altering the splicing pattern of the gene. construct of the present invention is active in the specific cell This approach has been successfully used for shifting the population transformed. Examples of cell type-specific and/ balance of expression of the two isoforms of Bcl-X Taylor or tissue-specific promoters include promoters, such as albu (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante min that is liver specific Pinkert et al., (1987) Genes Dev. (2001) J. Biol. Chem. 276:1641 1-16417): IL-5R Karras 1:268-277, lymphoid specific promoters Calame et al., (2000) Mol. Pharmacol. 58:380-387); and c-myc Giles (1988) Adv. Immunol. 43:235-275; in particular promoters (1999) Antisense Acid Drug Dev. 9:213-220. of T-cell receptors Winoto et al., (1989) EMBO J.8:729-733 0729. For example, interleukin 5 and its receptor play a and immunoglobulins; Banerji et al. (1983) Cell 33729 critical role as regulators of hematopoiesis and as mediators 740, neuron-specific promoters such as the neurofilament in Some inflammatory diseases such as allergy and asthma. promoter Byrne et al. (1989) Proc. Natl. Acad. Sci. USA Two alternatively spliced isoforms are generated from the 86:5473-5477, pancreas-specific promoters Edlunch et al. IL-5R gene, which include (i.e., long form) or exclude (i.e., (1985) Science 230:912-916 or mammary gland-specific short form) exon 9. The long form encodes for the intact promoters such as the milk whey promoter (U.S. Pat. No. membrane-bound receptor, while the shorter form encodes 4,873.316 and European Application Publication No. 264, for a secreted soluble non-functional receptor. Using 2'-O- 166). MOE-oligonucleotides specific to regions of exon 9, Karras 0726 Examples of suitable constructs include, but are not and co-workers (Supra) were able to significantly decrease the limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, PZeoSV2 (+/-), expression of the wild type receptor and increase the expres pDisplay, pEF/myc/cyto, pCMV/myc/cy to each of which is sion of the shorter isoforms. Design and synthesis of oligo commercially available from Invitrogen Co. (www.invitro nucleotides which can be used according to the present inven gen.com). Examples of retroviral vector and packaging sys tion are described hereinbelow and by Sazani and Kole (2003) tems are those sold by Clontech, San Diego, Calif., including Progress in Moleclular and Subcellular Biology 31:217-239. Retro-X vectors plNCX and pLXSN, which permit cloning 0730. Alternatively or additionally, upregulation may be into multiple cloning sites and the trasgene is transcribed effected by administering to the subject at least one of the from CMV promoter. Vectors derived from Mo-MuLV are polypeptides of the present invention (e.g., recombinant or also included such as pBabe, where the transgene will be synthetic) or an active portion thereof, as described herein transcribed from the 5'LTR promoter. above. However, since the bioavailability of large polypep 0727 Currently preferred in vivo nucleic acid transfer tides is relatively small due to high degradation rate and low techniques include transfection with viral or non-viral con penetration rate, administration of polypeptides is preferably structs, such as adenovirus, lentivirus, Herpes simplex I virus, confined to Small peptide fragments (e.g., about 100 amino or adeno-associated virus (AAV) and lipid-based systems. acids). Useful lipids for lipid-mediated transfer of the gene are, for 0731. It will be appreciated that the treatment methods of example, DOTMA, DOPE, and DC-Chol Tonkinson et al., the present invention may be combined with other therapeutic Cancer Investigation, 14(1): 54-65 (1996). The most pre modalities (e.g., radiotherapy, chemotherapy) to increase ferred constructs for use in gene therapy are viruses, most therapeutic efficacy. US 2011/0091454 A1 Apr. 21, 2011

0732. Additional objects, advantages, and novel features be elucidated. Accumulative EST information coming from of the present invention will become apparent to one ordi heterogeneous tissues and cell-types, can be used as a con narily skilled in the art upon examination of the following siderable source to understanding some of the events inherent examples, which are not intended to be limiting. Additionally, to carcinogenesis. each of the various embodiments and aspects of the present 0736. Although a large number of current bioinformatics invention as delineated hereinabove and as claimed in the tools are used to predict tissue specific genes in general and claims section below finds experimental Support in the fol cancer specific genes in particular, all fail to consider alter lowing examples. natively spliced variants Boguski and Schuler (1995) Nat. Genet. 10:369-71, Audic and Clayerie (1997) Genome Res. EXAMPLES 7:986-995: Huminiecki and Bicknell (2000) Genome Res. 0733 Reference is now made to the following examples, 10:1796-1806; Kawamoto et al. (2000) Genome Res. which together with the above descriptions, illustrate the 10:1817-1827. Alternative splicing is also overlooked by wet invention in a non limiting fashion. laboratory methods such as SAGE and microarray experi 0734 Generally, the nomenclature used herein and the ments which have been widely used to study gene expression, laboratory procedures utilized in the present invention however remain to be linked to alternative splicing modeling include molecular, biochemical, microbiological and recom see Background section and Valculescu et al. (1995) Science binant DNA techniques. Such techniques are thoroughly 270:484-487; Caronetal. (2001) Science 291: 1289-1292 and explained in the literature. See, for example, “Molecular Schena et al. (1995) Science 270:467-470). Cloning: A laboratory Manual Sambrook et al., (1989); 0737. A computational-based approach was developed to “Current Protocols in Molecular Biology” Volumes I-III identify alternatively spliced transcripts, which are expressed Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in a temporal and/or spatial pattern. Examples 1-4 below in Molecular Biology”, John Wiley and Sons, Baltimore, Md. describe the identification of cancer specific alternatively (1989); Perbal, “A Practical Guide to Molecular Cloning, spliced isoforms, which were identified according to the John Wiley & Sons, New York (1988); Watsonet al., “Recom teachings of the present invention. binant DNA, Scientific American Books, New York; Birren 0738 Experimental Procedures and Reagents et al. (eds) "Genome Analysis: A Laboratory Manual Series'. 0739 DATA and LEADS alternative splicing modeling Vols. 1-4, Cold Spring Harbor Laboratory Press, New York GenBank version 125 with genomic build #25 from National (1998); methodologies as set forth in U.S. Pat. Nos. 4,666, Center for Biotechnology Information (NCBI) was used as an 828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; “Cell input to the LEADS platform as described Shoshan et al. Biology: A Laboratory Handbook'', Volumes I-III Cellis, J. (2001) Proc. SPIE Microarrays: Optical Technologies and E., ed. (1994): “Current Protocols in Immunology Volumes Informatics 4266:86-95; Matloubian (2000) Nat. Immunol. I-III Coligan J. E., ed. (1994); Stites et al. (eds), “Basic and 1:298-304: David et al. (2002) J. Biol. Chem. 277:18084 Clinical Immunology” (8th Edition), Appleton & Lange, 18090; Sorek et al. (2002) Genome Res. 12:1060-7. Uni Norwalk, Conn. (1994); Mishell and Shiigi (eds), “Selected Gene Build #146 and libraryQuest.txt were obtained from Methods in Cellular Immunology'. W.H. Freeman and Co., NCBI and Cancer Genome Anatomy Project (CGAP) in New York (1980); available immunoassays are extensively National Cancer Institute (NCI), respectively. described in the patent and scientific literature, see, for 0740 EST tissue information EST information was example, U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752: available in web form from Library Browser or Library 3.850,578; 3,853.987; 3,867,517; 3,879,262; 3,901,654; Finder in NCBI or in the flat file libraryQuest.txt. The file 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876: listed 53 tissue sources, 5 histological states (cancer, multiple 4,879,219; 5,011,771 and 5,281.521: “Oligonucleotide Syn histology, normal, pre-cancer, and uncharacterized histol thesis” Gait, M.J., ed. (1984): “Nucleic Acid Hybridization” ogy), 6 types of tissue preparations (bulk, cell line, flow Hames, B. D., and Higgins S.J., eds. (1985): “Transcription Sorted, microdissected, multiple preparation, and uncharac and Translation’ Hames, B.D., and Higgins S.J., eds. (1984); terized), and brief descriptions on each library. 5318 libraries “Animal Cell Culture' Freshney, R.I., ed. (1986): “Immobi were from bulk tissue preparation including 5000 lized Cells and Enzymes' IRL Press, (1986): “A Practical ORESTES libraries Camargo et al. (2001) Proc. Natl. Acad. Guide to Molecular Cloning Perbal, B., (1984) and “Meth Sci. USA 98: 12103-12108}, 329 from cell lines, 37 flow ods in Enzymology” Vol. 1-317, Academic Press: “PCR Pro sorted, 66 microdissected, 5 multiple preparation, and 1121 tocols: A Guide To Methods And Applications'. Academic were from uncharacterized preparations. Excluding Press, San Diego, Calif. (1990); Marshak et al., “Strategies ORESTES libraries, 507 libraries were designated as “non for Protein Purification and Characterization—A Laboratory normalized and 100 were designated normalized or sub Course Manual CSHL Press (1996); all of which are incor tracted indicating the pretreatment of mRNA before cDNA porated by reference as if fully set forth herein. Other general library construction. A small number of libraries were derived references are provided throughout this document. The pro from the same original sample. These were not considered cedures therein are believed to be well known in the art and separately. Library counts of ESTs rather than direct EST are provided for the convenience of the reader. All the infor counts were used to provide semi-quantitative measurements mation contained therein is incorporated herein by reference. of expression level, since EST counts in some cases reflect the prevalence of ESTs in one or a few particular libraries, and Example I library counts provide better indications across different tis Identification of Alternatively Spliced Expressed Sue types when both normalized and non-normalized libraries Sequences—Background were analyzed. Such tissue information analyses are limited to those tissues with a sufficient number of libraries. The 0735. The etiology of many kinds of cancers, especially inclusion of normalized cDNA libraries allowed the exami those involving multiple genes or sporadic mutations, is yet to nation of genes expressed at low levels. US 2011/0091454 A1 Apr. 21, 2011 56

0741. The ESTs from pooled tissue or uncharacterized tissue were considered as non-conforming in order to main TABLE 2-continued tain the robustness of the results. In addition, 139.243 ESTs that had no library information were considered non-con Alternative Alternative forming in investigating tissue- or cancer-specific alternative donor site Cluster acceptor site Cluster 4 760 4 799 splicing events. 5 435 5 SO8 0742 Results—Human EST and mRNA sequences 6 and above 566 6 and above 710 aligned against genomic sequences and clustered through Compugen’s LEADS platform were used to identify intron Total 9068 Total 9667 boundaries and alternative splicing sites Shoshan et al. (2001) Proc. SPIE Microarrays: Optical Technologies and 0745 Distribution analysis—As described hereinabove a Informatics 4266:86-95; Matloubian (2000) Nat. Immunol. valid donor-acceptor concatenation must be Supported by at 1:298-304: David et al. (2002) J. Biol. Chem. 277:18084 least one mRNA or by ESTs from at least two different 18090; Soreket al. (2002) Genome Res. 12:1060-7. libraries. 8254 clusters were found to have both alternatively 0743. 20.301 clusters with 2.0 million ESTs contained at spliced donor and acceptor sites. When the lower bound on least one mRNA sequence, in general agreement with Uni the number of EST libraries supporting each donor-acceptor Gene build #148 with 20.876 such clusters. The remaining concatenation was increased to three, 13.402 alternatively EST sequences, which were clustered to unknown regions of spliced donor sites were shown to be included in 6892 clusters known genes or to unknown genes were not analyzed. Table and 15,015 alternatively spliced acceptor sites where shown 1 below provides some statistics about EST and mRNA clus to be included in 7570 clusters, while 6111 clusters com tering. 125,115 introns, and 213,483 exons were aligned prised both alternatively spliced donor and acceptor sites. either withan mRNA, or with ESTs from at least two libraries Example 3 if there was no RNA aligned to the gene segment. This was effected to exclude possible genomic contamination in Tissue Distribution of ESTs and Libraries Following expressed sequences, or other EST technology associated LEADS Alternative Splicing Modeling faults. 0746 Cluster analysis performed to identify alternatively spliced ESTs (see Example 2) was further used for tissue TABLE 1. information extraction. Table 3 below lists ten tissue types with the largest numbers of ESTs along with those from EST Cluster RNA Cluster pooled or uncharacterized tissues. 1 963 1 6527 2-3 1457 2-3 6372 TABLE 3 4-7 1532 4-7 6204 8-15 1655 8-15 1915 Number of ESTs Number of Libraries 16-31 1879 16-31 226 32-63 2SOO 32-63 40 Tissue Normal Cancer Total Normal Cancer Total 64-127 3481 64 and above 17 128-2SS 3240 Total 2O3O1 Brain 93O24 878O3 18O827 30 25 55 256-511 1406 Lung 35455 85596 121051 92 1S6 248 512-1023 422 Placenta 86571. 27291. 113862 259 3 262 1024-above 1766 Uterus 30.052 71521 1 O1573 99 107 2O6 Colon 23796 74998 98794 274 445 719 Total 2O3O1 Kidney 42628 46811 89439 9 S4 63 Skin 32436 43O85 75521 8 10 18 Prostate 4.0312 2.7963 68275 131 135 266 Mammary gland 26509 36638 63147 305 665 970 Example 2 Head 12354 SO167 62S21 62 800 862 and neck Pooled 1786.18 992 17961O 15 1 16 Cluster Distribution of Alternatively Spliced Donor Uncharacterized 761.93 9721 85914 778 106 884 and Acceptor Sites 0744 Alternative splice events include exon skipping, 0747 Evidently, ESTs derived from lung, uterus, colon, alternative 5' or 3' splicing, and intron retention, which can be kidney, mammary gland, head and neck were obtained aminly described by the following simplification: a single exon con from cancerous libraries. The distribution of ESTs in normal nects to at least two other exons in either the 3' end (donor site) and cancer libraries in each case was taken into a consider or the 5' end (acceptor site), as shown in FIG.3. Table 2 below ation and used as a parameter for scoring the differential lists some statistics of alternative splicing events based on this expression annotation. simplification. Example 4 Identification of Putative Cancer Specific Alterna TABLE 2 tively Spliced Transcripts Alternative Alternative 0748 Alternative splicing events restricted to cancertis donor site Cluster acceptor site Cluster Sues were identified by looking for any donor-acceptor con 1 3690 1 3751 catenations exclusively supported by ESTs from cancertis 2 2269 2 2388 Sues. Table 4 below lists six examples for such. An interesting 3 1348 3 1511 example is the NONO gene (GenBank Accession No: BC003129), represented by 1496 ESTs. The NONO gene has US 2011/0091454 A1 Apr. 21, 2011 57 been previously suggested to code for a possible splicing 1.4, 2000 Nov. 15). 58118 SWISS-Prot proteins have been factor Dong B, Horowitz, D S. Kobayashi R. Krainer A R. assigned with at least one GO node by the following sources: Nucleic Acids Res (1993) 25: 21 (17):4085-92. It's newly 15534 proteins were assigned with at least a functional GO discovered restricted expression to cancer tissues Suggests node by conversion of EC (enzyme nomenclature) to GO that alternative splicing of multiple genes may be regulated node. MGI has assigned 5984 SwissProt proteins with GO during carcinogenesis. nodes (http://www.mgi.org). 31869 SwissProt proteins were

TABLE 4 Uni Non-specific mRNA Gene Total Specific E Possible EST ID PoS. E R Type E R C N R function BCOO3129 1722O7 123, 237 1496 8 d+ 15 1 46 20 3 Splicing factor candidate NM 018035 279851 220,301 584 2 d- 7 O 21 9 2 No known function ALS1936S 21938 474, 513 162 3 a- 8 3 6 1 O Oxysterol binding BF341,144 155596 507,542 148 1 a 6 0 7 4 1 BCL2 adeno wir-us E1B interacting ABOO9357 7510 1372, 1452 2OS 6 a. 7 4 2 4 2 MAPKKK 7 NM 002382 42712 57,84 165 7 a- 8 1 7 3 6 MAX protein

One mRNA/EST containing both splicingjunctions identifies assigned a GO node using SwissProt keyword correspon the cluster. Type indicates the type of transcript, which dence and 33048 SwissProt proteins were assigned GO node was shown to be cancer specific. The following symbols were by InterPro scanning (http://www.ebi.ac.uk/interpro/). The used, (d) donor site; (a) acceptor site; (+) proximal exon; nonredundant protein database was constructed from GenPep (-) distal exon. Total indicates the number of ESTs or file from NCBI, along with proteins collected from the Sac mRNAs which were used for analysis. Specific/non-spe charomyces genome database (SGD) Dwight et al. (2002) cific’ indicates total library number which was used for Nucleic Acids Res. 30:69-72 and the Drosophila genome analysis. All mRNA sequences under specific’ were from database (Flybase) The Flybase consortium 2002 Nucleic cancer tissues. Position—identifies splicing boundaries on Acids Res. 30:106-108, with a total number of 670130. the sequence. E. EST; R RNA; C Cancer; N-Normal. Example 6 Example 5 Generation of Progressive Sequence Clusters Ontological Annotation of Proteins—Data Collection Background 0752. A two-stage strategy was used to build a detailed homology map between all proteins in the comprehensive 0749 Recent progress in genomic sequencing, computa protein database (Example 5). In a first stage, all protein pairs tional biology and ontology development has presented an with an E score lower than 0.01 using Blastp with default opportunity to investigate broad biological systems parameters were cataloged. Table 5 lists the distribution of 0750 A gene ontology system was developed and specifi Blastp results. cally used to annotate human proteins. Examples 5-9 below describe the development of an ontology engine, a computa TABLE 5 tional platform for annotation and resultant annotations of human proteins. E escore Percentage 0751 Gene Ontology (GO) and gene association files 10-10-10-2 17.58 10-20-10-10 13.81 were obtained from the Gene Ontology Consortium http:// 10-30-10-20 11.02 www.geneontology.org/. InterPro Scan from http://www.ebi. 10-40-10-30 12.91 ac.uk/interprof, and enzyme database from http://expasy.pro 10-50-10-40 10.24 teome.org.au/enzyme?. The following databases and versions 10-60-10-50 5.81 10-70-10-60 3.64 were used, GenBank release 122.0, SwissProt release 39.0, 10-80-10-70 2.65 Enzyme database Release 26.0, InterPro database as of Apr. 10–90-10-80 2.86 6, 2001, NCBI LocusLink data as of Mar. 6, 2001, MEDLINE 10-100-10-90 2.53 databases as of Apr. 6, 2001, and the following files from 10-110-10-100 2.18 10-120-10-110 1.58 Gene Ontology Consortium: gene association.fb (version 10-130-10-120 1...SO 1.26, 2001 Feb. 19), gene association.mgi (version 1.19, 10-140-10-130 1.13 2001 Mar. 1), gene association.sgd (version 1.251, 2001 10-150-10-140 1.01 Mar. 13), gene association.pombase (version 1.2, 2000 Jul. 10-160-10-150 1.01 22), ec2go (version 1.2, 2000 Oct. 23), and Swp2go (version US 2011/0091454 A1 Apr. 21, 2011 58

whole text information space were retained for further stud TABLE 5-continued ies. This cutoff threshold was used to eliminate rare words, wrong spellings, and sometimes even the base pair sequence E escore Percentage present in either the definition lines or abstracts. In addition, 10-170-10-160 O.92 an upper limit of word frequency (common words Such as 10-178-10-170 O.90 and, gene, protein) and a lower limit of word frequency O.OO 6.72 were defined through repeated training process and manual review. The words within the upper and the lower limits were 0753. In the second stage, all homologous protein pairs considered as predictive. Since the correlation between the were aligned through Needlman-Wunsch algorithm with a GO nodes and specific words is positive by nature, negative global alignment to obtain the percentage of identical amino sentences with words such as not and its variants, such as acids between the two proteins. BLOSUM62 was used as the unlikely or unresponsive were excluded from consider Substitution matrix. The percentage of identity was defined as ation. the number of amino acids aligned with nonnegative scores 0756 Genes with GO annotation from other sources such divided by the number of amino acids in both aligned and as GO consortium, InterPro Scanning or keyword mappings unaligned length of two proteins in the global alignment. were used as training data to obtain the correlation between Table 6 shows a percent identity distribution of protein pairs specific words and specific GO nodes. following global alignment. Evidently, the majority of pro 0757. The following formula was used. S-log(P(m.g)/P tein pairs (i.e., 68.5%) exhibited identity levels in the range of (m)P(g)), wherein S is the LOD score for word m—GOg 10-50%. combination, wherein P(mg) is the frequency of term mand GO node g co-occurrence among all word and GO combina TABLE 6 tions, P(m) is the frequency of occurrence of term mamong Identity Level Percentage all word occurrences, and P(g) is the frequency of occurrence of GO node gamong all GO occurrences. 3. ... 0758. In order to predict GO node for any specific gene 20-30% 1994 which is linked to one to a few dozen words, the sums of LOD 30-40% 10.94 scores from all these words for each possible GO were cal t 7. culated and sorted, and used for further GO annotation. Mul 60-70% 724 tiple MeSH terms—GO correlations were tested and were 70-80% 6.70 found to be no more informative than the single MeSH 3. term—GO correlation, and therefore they were not used. (0759 Results Table 7 below, lists general statistics of text information from publicly available sequence databases.

TABLE 7

MeSH Definition tel Title Abstract line Number of proteins 110608 106190 113073 516952 Number of articles 71703 773.14 82654 na Number of unique words* 40011 18175 26630 25915 Average number of words per article 19. OS 2.70 11.65 6.56 or per definition line

Example 7 0760 A predictive probabilistic model was then applied to create possible GO annotations based on the associated text Text Mining information. Definition lines of sequence records, MeSH term annotations, titles and abstracts from sequence related (0754 Correlations between presence of specific MeSH publications were modeled separately. terms, or specific English words in available text information 0761 The frequency of association of a specific term with and Gene Ontology assignments in the training data were a specific GO node in the training data was examined. Param obtained. The correlations were then used to predict Gene eters such as boundaries of the frequency of MeSH terms and Ontology for unassigned genes. other words were optimized through the training process, 0755 Method Non-characters in titles and abstracts, and using self-validation and cross validation methods. LOD in definition line of gene records were eliminated and words (logarithm of odds) scores, defined as the logarithm of the were stemmed through the Lingua::stem module from www. ratio between the association frequency of any term-GO pair cpan.org. Due to the standardized and curated nature of and the calculated frequency of the random combination of MeSH terms, MeSH terms were not parsed or stemmed. The this pair, were used to indicate the relatedness of certain terms frequency of each word in all the available text information with certain GO node. These LOD scores were found to be was calculated. Words that occurred at least 5 times over the correlative with the accuracy of GO prediction, as shown in US 2011/0091454 A1 Apr. 21, 2011 59

FIG. 5. Text information from titles of MEDLINE records provides statistics on the number of input gene ontology appeared to have more predictive power, in particular at lower annotations and the number of output annotations following LOD scores, than text information from other categories. This processing. Suggests that the title tended to Summarize the gist of an article in a straightforward manner. MeSH terms had similar TABLE 9 predictive capabilities as the abstracts, possibly because the Input MeSH terms were derived from the abstracts, and thus had similar information contents. GO Consortium annotation, Enzyme conversion, Text mining 0762 Based on text information, a significant number of InterPro mapping, etc. ProLoc Output proteins were predicted to be associated with one or more GO nodes. Table 8 below, lists the number of proteins with pre Cellular Component 44702 522179 574607 dicted GO node from four types of text information in the Molecular Function 85626 S26083 580767 three categories of GO. These predicted GO annotations were Biological Process 69726 S25842 578.636 incorporated in GO process to increase the accuracy of homology-based GO annotation and to generate de novo 0767. Over 85% of proteins were annotated with one or annotations. more GO nodes in each of three GO categories. Table 10 below, analyses the number of proteins annotated at different TABLE 8 homology levels, showing that GO annotations were achieved throughout the homology spectra. MeSH Definition tel Title Abstract Line Total TABLE 10 Cellular Component 57845 5.2094 57597 S14191 S21396 Molecular Function 57845 54152 57632 S16319 S23384 Cellular Molecular Biological Biological Process 57845 5397O 57631 S16402 S2338S Component Function Process Text 32257 34137 3O149 10-2-10-10 87967 71717 74277 0763 To further enhance the accuracy and coverage of GO 10-10-10-50 122992 70O88 793.18 annotation process, a computational platform for predicting 1O-O-O.O 98.059 55132 59051 3596-7596 111130 97209 108334 cellular localization, ProLoc (Einat Hazkani-Covo, Erez 75%-90% 38509 68.282 67429 Levanon, Galit Rotman, Dan Graur and Amit Novik, Supra) 90%-99% 38991 98.576 90352 was used to predict the cellular localization of individual Input GO 44702 85626 69726 proteins based on their inherent features such as specific localization signatures, protein domains, amino acid compo sition, pI, and protein length. Only protein sequences that Example 9 begin with methionine underwent ProLoc analysis. Thus, 88997 out of 93110 proteins in SwissProt version 39 were Statistical Validation of Ontological Annotations analyzed, and 78111 proteins have one to three GO predic 0768 Gene ontology annotations, which were assigned tions in cellular component category. according to the teachings of the present invention, were assessed by automatic cross-validation. One fifth of input of Example 8 input GO annotations were withheld during the GO annota tion process and the resultant annotations were compared Gene Ontology Assignment with the withheld GO nodes. For each protein, the GO node with the lowest error score was examined. Table 11 below, 0764 Progressive single-linkage clusters with 1% resolu lists the coverage and accuracy of Such representative test. tion were generated to assign GO annotations (i.e., nodes) to proteins (see Example 6). Protein clustering and annotation TABLE 11 assignment were effected at each level of homology. The resolution was 1% for global alignment identity (i.e., cluster Total Predicted GO Accurate GO ing was first effected at 98%, then at 97% and so forth). The Cellular Component 7431 7186 4642 resolution was 10 fold for the Escore of a BlastPhomology Molecular Function 12999 12864 101.38 pair. For example, clustering was performed at 10, then at Biological Process 10811 10690 808O 10' and so forth. 0765. To examine clustering efficiency and homology 0769 Evidently, sample coverage ranged from 96% to transitivity, all homology pairs clustered with at least 90% 99% and the reproducibility was between 65% and 80%. The identity were examined. At this level, there were a total of lower reproducibility of GO annotations in the “cellular com 57,004 clusters containing 263,259 protein members. Among ponent” category, as compared with that in the other two GO these clusters, 23.231 clusters contained at least three protein categories was consistent with the notion that a short amino members (see FIGS. 6a-c). The lowest homology pairs had an acid segment such as a signal peptide affects significantly identity of 46% while being clustered at 90% or higher iden protein localization. The presence or absence of Such small tity levels. amino acid segments could not be completely captured in 0766 Clusters containing proteins with preassociated or sequence similarity comparisons. Detailed analysis of the predicted ontological annotations were analyzed and best validation of data indicated that the accuracy of the annota annotations for individual proteins in the clusters were tions correlated with the homology levels (data not shown). selected through an error weight calculation. Table 9 below, Manual validation of assigned annotations was performed on US 2011/0091454 A1 Apr. 21, 2011 60 a total of 500 annotations and about 85%-93% of annotations The GenBank accession numbers of these expressed were found to be correct. The higher percentage of accuracy sequences are listed only for the first transcript in the contig, in the manual examination over the automatic cross-valida e.g. “HEST BC006216.BE674469,BE798748.NM032716” tion resulted from the incomplete annotation of input GO. in Example 10b. The rest of the transcripts derived from the same contig, are indicated by an ifEST field marked with “the Example 10 same'. 0785 Expressed sequences, marked with “PropGyXXX, Description of Data e.g., “PropGy933 in Example 10d, and expressed sequences, (0770. Example 10a-e below describe the data table in marked with “GeneID XXX', e.g., “GeneID1007Forward” “Summary table' file, on the attached CD-ROM3. The data in Example 10e, are proprietary sequences which do not table shows a collection of annotations of biomolecular appear in GenBank database. These sequences are deposited sequences, which were identified according to the teachings in the nucleotide sequence file “PropG seqs' in the attached of the present invention. CD-ROM2. (0771) Each feature in the data table is identified by “if”. 0786 Data pertaining to differentially expressed alterna 0772 Each transcript in the data table is identified by: tively spliced sequences is presented in the following format: (0773 (i) A Serial number, e.g. “251470 in Example 10a, 0787 *, * “#TAA CD represents the coordinates of the “445259'-'445262 in Example 10b. I differentially expressed sequence segment. A single number 0774 (ii) An internal arbitrary transcript accession num represents a differentially expressed edge, corresponding to ber, e.g. “N62228 4” in Example 10a, “BE674469 0, the specific junction between 2 exons. “TAA CD repre “BE674469 0 124”, “BE674469 1, “BE674469 1 sented by a pair of numbers represents the start and end 124” in Example 10b. positions of a differentially expressed sequence node. For 0775. The first number of the internal transcript accession example, "iTAA CD 269296 in Example 10a indicates that number is shared by all transcripts which belong to the same the transcript identified as N62228 4 contains a differen contig, and represent alternatively spliced variants of each tially expressed segment, located between the nucleotides at other, e.g. “BE674469 in “BE674469 0”, “BE674469 0 positions 269 and 296. 124”, “BE674469 1”, “BE674469 1 124” in Example 0788 *, * “#TAA TIS” contains information pertaining 10b. to specific tissue(s), in which the respective transcript is pre 0776 The second number of the internal transcript acces dicted to be expressed differentially. Tumor tissues are indi sion number is an internal serial transcript number of a spe cated accordingly. For example, "iTAA TIS lung Tumor cific contig, e.g. * 0” or “ 1” in “BE674469 0. indicates that transcript BE674469 0 in Example 10b is “BE674469 0 124”, “BE674469 1, “BE674469 1 predicted to be differentially expressed in lung tumor tissues. 124” in Example 10b. 0789 *, ** “#DN. represents information pertaining to 0777. The third number of the internal transcript accession transcripts, which contain altered functional interpro number is optional, and represents the GenBank database domains. The Interpro domain is either lacking in this protein version used for clustering, assembly and annotation pro (as compared to another expression product of the gene) or cesses. Unless otherwise mentioned, GenBank database ver scored low (i.e., includes sequence alteration within the sion 126 was used. “124' indicates the use of GenBank ver domain when compared to another expression product of the sion 124, as in “BE674469 1 124” of Example 10b. gene). This domain alteration can have a functional conse (0778 “PropG” following the internal accession number quence in which the altered protein product can either gain a indicates an EST sequence data from a proprietary source, function, lose of function (e.g., acting, at times, as dominant e.g., Examples 3d and 3e. negative inhibitor of the respective protein) or obtain a func (0779) “han” represents the use of GenBank version 125. tion which is different than that of the wild-type protein, as This version was used in the annotation of lung and colon described hereinabove (see the definition for “functionally cancer specific expressed sequences. altered biomolecular sequences' in the Terminology section). 0780 “lab’ indicates expressed sequences which differen 0790. This field lists the description of the functional tial pattern of expression has been confirmed in the labora domain(s), which is altered in the respective splice variants tory. e.g., “HDN EGF-like domain in Example 10a. 0781 Transcript accession number identifies each 0791) Functional annotations of transcripts based on Gene sequence in the nucleotide sequence data files “Transcripts Ontology (GO) are indicated by the following format. nucleotide seqs part1. “Transcripts nucleotide seqs 0792 *, * “HGO P', annotations related to Biological part2”, “Transcripts nucleotide seqs part3’ and “Tran Process, Scripts nucleotide seqs part4’ on CD-ROMs 1 and 2, and in 0793 *, ** “HGO F, annotations related to Molecular the respective amino acid sequences data file "Protein. Seqs' Function, and on CD-ROM2. Of note, some nucleotide sequence data files 0794 *, ** “HGO C. annotations related to Cellular of the above, do not have respective amino acid sequences in Component. the amino acid sequence file “Protein.seqs' attached on CD 0795 For each category the following features are option ROM2. ally addressed: 0782. Additional lines of the file contain the following 0796 “HGOPR represents internal arbitrary accession information: number of the predicted protein corresponding to the func 0783 “*” indicates optional fields: “*” indicates repeat tionally annotated transcript. This internal accession number able features. identifies the protein in the amino acid sequence file “Protein. 0784) “#EST’ represents a list of GenBank accession seqs' in the attached CD-ROM2, together with the internal numbers of all expressed sequences (ESTs and RNAs) clus arbitrary transcript accession number. For example, "iGOPR tered to a contig, from which a respective transcript is derived. human 281192 in Example 10a, is a protein sequence US 2011/0091454 A1 Apr. 21, 2011

encoded by transcript N62228 4, which appears in the 0806 #EN NRG2 HUMAN #GO P #GOPR human amino acid sequence file “Protein.seqs' in the attached CD 281192 #GO Acc 7165 #GO Desc signal transduction #CL ROM2 and is identified by both numbers, “N62228 4' and 2 #DB sp#EN NRG2 HUMAN “human 281192. 0797 “#GO Acc” represents the accession number of the Example 10b assigned GO entry, corresponding to the following “HGO Desc’ field. 0807. 445259 BE674469 0 #EST BC006216, 0798 “HGO Desc” represents the description of the BE674469,BE798748,NM032716 iTAA CD O 2537 assigned GO entry, corresponding to the mentioned “HGO #TAA TIS lung, #TAA CD 0.2537 #TAA TIS lung Tumor Acc’ field. For example, "iGO Acc 7165 iGO Desc signal (0808 445260 BE674469 0 124 #124EST BC006216, transduction' in Example 10a, means that the respective tran BE674469,BE798748.NM 032716 #SA Lung Tumor #RA Script is assigned to GO entry number 7165, corresponding to lung cancer signal transduction pathway. 0809 445261 BE674469 1 #EST the same #TAA CDO 2537 iTAA TIS lung, #TAA CD 0.2537 iTAA TIS lung 0799) “HCL represents the confidence level of the GO Tumor assignment, when #CL1 is the highest and iCL5 is the lowest 0810) 445262 BE674469 1 124 #124EST BC006216, possible confidence level. BE674469,BE798748.NM 032716 #SA Lung Tumor #RA 0800 “HDB' marks the database on which the GO assign lung cancer ment relies on. The “sp”, as in Example 10a, relates to Swis sProt Protein knowledgebase, available from http://www.ex pasy.ch/sprot/. “InterPro’, as in Example 10c, refers to the Example 10c InterPro combined database, available from http://www.ebi. 0811 314251 HUMM7BA 0 #EST BF804381, ac.uk/interprof, which contains information regarding protein BF805793.BF805830BG978076, HUMM7BA #GO C families, collected from the following databases: SwissProt #GOPR human 313276 #GO Acc 16459 #GO Desc myo (http://www.ebi.ac.uk/swissprot/). Prosite (http://www.ex sin #CL 2 #DB interpro #EN IPR001609 #GO F #GOPR pasy.ch/prosite?), Pfam (http://www.sanger.ac.uk/Software/ human 313281 HGO Acc 3774 HGO Desc motor HCL 1 Pfam/), Prints (http://www.bioinfman.ac.uk/dbbrowser/ #DB sp#ENQ14786 #GO F#GOPR human 313281 #GO PRINTS/), Prodom (http://prodes.toulouse.inra.fr/prodom/), Acc 5524 #GO Desc ATP binding #CL 1 #DB sp #EN Smart (http://smart.embl-heidelberg.de/) and Tigrfams Q14786 #GO P #GOPR human 313281 #GO Acc 5983 (http://www.tigr.org/TIGRFAMs/). #GO Desc starch catabolism HCL 4 HDB sp #EN Q14786 (0801) “HEN represents the accession of the entity in the #SA colon, colonic, gut HRA colon normal database(HDB), corresponding to the best hit of the predicted protein. For example, “HDB sp #EN NRG2 HUMAN” in Example 10d Example 10a means that the GO assignment in this case was based on SwissProt database, while the closest homologue to 0812 723873 AA157684 TO PropC #EST the assigned protein is depicted in SwissProt entry "NRG2 0813 AA157684AA157764AK057980.BF355351, HUMAN, corresponding to protein named “Pro-neuregulin PropCy933 2' (http://www.expasy.org/cgi-bin/niceprot.pl?O14511). 0814) #GO C #GO Acc 0016021 #GO Desc “integral “#DB interpro #ENIPRO01609” in Example 10c means that membrane protein’ #GO F GO assignment in this case was based on InterPro database, 0815 #GO Acc 0005978 #GO Desc “glycogen biosyn while the best hit of the assigned protein is to protein family thesis' HGO PiGO Acc 0003707 HGO Desc “steroid hor depicted in SwissProt accession number “IPR001609, cor mone receptor responding to “Myosin head (motor domain)” protein family (http://www.ebi.ac.uk/interpro/IEntry?ac=IPR001609). Example 10e 0802. The following two fields correspond to the hierar chical assignment of the differentially expressed sequences to 0816 723928 GeneID1007Forward TO PropC #EST a specific tissue(s), based on the EST content and EST librar 0817 AC018755CDS1AC018755mRNA1AW403840, ies origin within the contig. AY040820CDS0, BF359557, BF896787, BF898989, (0803 *, ** “HSA' indicates that tissue assignment BF899932BF900235, BF905509,BI518761,BIT56629, requires a contig, containing at least 3 ESTs, where at least BI822428,BI 906477,BI906754BM550096. BM922784, 80% thereof are assigned to a selected tissue. GeneID1007Forward,GeneID285Forward. Prop Gy1006 (0804) *, ** “HRA indicates that tissue assignment #GO C #GO Acc 0005887 #GO Desc “integral plasma requires a contig derived from at least two different EST membrane protein’ #GO F #GO Acc 0007267 #GO Desc libraries, originally constructed from a specific tissue. “cell-cell signaling'#GO P #GO Acc 0005530 HGO Desc “lectin Example 10a Example 11 0805 251470 N62228 4#EST the same #TAA CD 269 Description of the Sequence Files on the Enclosed 296 #TAA TIS ovary, #TAA CD 269296 #TAA TIS ovary CD-ROMS 1-2 Tumor, HTAA CD 269296 HTAA TIS skin Tumor, HTAA CD59 269 iFTAA TIS ovary, #TAA CD59 269 iFTAA TIS 0818. The sequences in the sequence files “Transcripts ovary Tumor, #TAA CD59269 iFTAA TIS skin Tumor #DN nucleotide seqs part1. “Transcripts nucleotide seqs EGF-like domain HGO F #GOPR human 281192 HGO part2”, “Transcripts nucleotide seqs part3. “Transcripts Acc 3823 #GO Descantibody iCL 2 #DB sp nucleotide seqs part4”, “protein seqs', 'PropG seqs' of US 2011/0091454 A1 Apr. 21, 2011 62 the enclosed CD ROMs 1-2 are in Fast A text format. Each Example 12 transcript sequence starts with “s' mark, followed by the Description of CD-ROM1-3 Content transcript internal accession number. The proprietary PropG 0822. The CD-ROMs enclosed herewith contain the fol EST sequences starts with">' mark, followed by the internal lowing files: sequence accession. An example of the sequence file is pre 0823 CD-ROM1 (2 files): sented below. 0824 1. “Transcripts nucleotide seqs part1, containing nucleotide sequences of all the transcripts based on genomic Example 11a production of GenBank version 126. 0825 2. “GC new.txt, includes a title of the invention 08.19 and reference numbers. 0826 CD-ROM2 (4 files): 0827. 1. “Transcripts nucleotide seqs part2', containing >R422780 nucleotide sequences of all the transcripts based on expressed (SEQ ID NO: 41) production of GenBank version 126 (in cases where no TGTTTTAGAAATCTCATGATTCCCAGGAAAAAAATTTTAAATTGTGA genomic data Support was available). TACAGGTTTGACAGCCTTTTAGTCAAATAAGTTAAAACACACACGCA 0828 2. “Transcripts nucleotide seqs part3.new, con taining nucleotide sequences of all the transcripts based on AACTCATTTACT CACTTTGCCATTATAATTCAATCACAAAGAAATTT GenBank versions 124, 125, and transcripts containing ProDG proprietary sequences. TGGCCAGGCGTGGTGGTTACGCCTGTAATCCCAGCACTTTGGGAGGC 0829. 3. “Protein.seqs, containing all the amino acid CGAGGCAGGTGGATCACGAGGTCAGGGGATCAAGATCATCCTGGCTA sequences encoded by the transcripts based on GenBank Ver sions 126. ACATGTGAAACCCCGTCTCTATTAAAAATAAAAAATTAGCCTGGTGT 0830. 4. “PropG seqs, containing the proprietary EST GGTGGCGGGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGCA Sequences. 0831 5. “Transcripts nucleotide seqs part4, containing GAATGGCGTGAACT CAGGAGGCGGAGCTTGCAGTGAGCCGAGATCGC nucleotide sequences of transcripts based on GenBank ver sion 131, as described in Example 21. GCCACTGCACTCCAGCCTGGATGACAGAGCGAGACTCCATCTCAAAA 0832 CD-ROM3 (1 file): AAAAAA 0833 1. “Summary table', containing all the annotation information, as described in Example 10. Example 11b Example 13 In-Vitro Confirmation of Differentially Expressed 0820 Transcripts Experimental Procedures and Reagents 0834. In-vitro confirmation of in-silico obtained differen >GeneID3 Reverse #TY RNA #DE ProDGy sequence #DT tially expressed polynucleotide sequences was effected uti 18- JUN-1OOO EDR 5 ELN 348 lizing laboratorial methodologies, based on nucleotide (SEQ ID NO: 1) hybridization including northern analysis, RT-PCR and real GTGGTTATTACAGCATGGTTCCCAGCCTTACAGTGTCTAAGTGCTTC time PCR. TCTTGTGTCCTGTAGATGTTGTGAAAAAGAAAAAAACAAAAAATACA 0835 RNA preparation Total RNA was isolated from the indicated cell lines or tumor tissues using the Tri-Reagent CCACACTGTACTTTTTCCCCCTGCCCCCGTTACTGCCGGTGATTATT (Molecular Research Center Inc.) following the manufactur er's recommendations. Poly(A) RNA was purified from total ATTAAAAATTAGTTTTTTTCACATCATTATATCTGGCTTCCTATAAA RNA using oligo(dT)s Dynabeads (Dynal). CAACAGCCTTAATTCACGTCAAGACTCCCTTTGGGGAATTCATTTTAT 0836 Northern blotting 20 ug is of total RNA or 2 g of poly(A) RNA were electrophoresed on 1% agarose gels con TAAAAATTGGTGTCTGGATACTTCCCTGTACATGCATAAATATGCAT taining formaldehyde, and blotted onto Nytran Super Charge GCATGTACAGAAAGACTGTATGTGTGTGCCTTGCACACACACCCATA membranes (Shcleicher & Schuell). Hybridization was car ried out using a DNA probe (SEQID NO:3) in EZ-Hybrid CCTCTCAGAAAAAGTGTTT ization Solution (Biological Industries, Beit Haemek, Israel) at 68° C. for 18 hrs. The membranes were rinsed twice with Example 11c 2xSSC, 0.1% SDS at room temperature, followed by two washes with 0.1xSSC, 0.1% SDS at 50° C. Autoradiograms 0821 were obtained by exposing the membranes to X-ray films. 0837 RT-PCR analysis Prior to RT reactions, total RNA was digested with DNase (DNA-freeTM, Ambion) in the pres >ProDGy1339 #OS Homo sapiens #DE ProDGy ence of RNasin. Reverse transcription was carried out on 2 Jug sequence #DT 26-JUL-2002 #TY EST FEDR 5 #AC is of total RNA, in a 20 Jul reaction, using 2.5 units of Super ProDGy1339 HLN 132 (SEQ ID NO: 2) script II Reverse Transcriptase (Bibco/BRL) in the buffer CAGAAAGCCCAGAGTAGTCCCTGTAAGAAGCTGAGGGGCGCATACC Supplied by the manufacturer, with 10 umol of oligo(dT)s (Promega), and 30 units of Rnasin (Promega). RT reactions TCTGGGGTTTGGGTTCCCTTCAGGGAAGCGAAGGGAGATGACCTCT were standardized by PCR with GAPDH-specific primers, for 20 cycles. The calibrated reverse transcriptase samples TTCCAGGCTGGGGACCAAGAGGGCTCCCTAGAAGATATTA were then analyzed with gene-specific primers either at 35 cycles, or at lower cycles (15 and 20 cycles). PCR products of