USOO6989232B2 (12) United States Patent (10) Patent No.: US 6,989,232 B2 Burgess et al. (45) Date of Patent: Jan. 24, 2006

(54) AND NUCLEIC ACIDS Adams et al, Nature 377 (Suppl), 3 (1995).* ENCODING SAME Nakayama et al, Genomics 51(1), 27(1998).* Mahairas et al. Accession No. B45150 (Oct. 21, 1997).* (75) Inventors: Catherine E. Burgess, Wethersfield, Wallace et al, Methods Enzymol. 152: 432 (1987).* CT (US); Pamela B. Conley, Palo Alto, GenBank Accession No.: CAB01233 (Jun. 20, 2001). CA (US); William M. Grosse, SWALL (SPTR) Accession No.: Q9N4G7 (Oct. 1, 2000). Branford, CT (US); Matthew Hart, San GenBank Accession No.: Z77666 (Jun. 20, 2001). Francisco, CA (US); Ramesh Kekuda, GenBank Accession No. XM 038002 (Oct. 16, 2001). Stamford, CT (US); Richard A. GenBank Accession No. XM 039746 (Oct. 16, 2001). Shimkets, West Haven, CT (US); GenBank Accession No.: AAF51854 (Oct. 4, 2000). Kimberly A. Spytek, New Haven, CT GenBank Accession No.: AAF52569 (Oct. 4, 2000). (US); Edward Szekeres, Jr., Branford, GenBank Accession No.: AAF53188 (Oct. 4, 2000). CT (US); James E. Tomlinson, GenBank Accession No.: AAF55 108 (Oct. 5, 2000). Burlingame, CA (US); James N. GenBank Accession No.: AAF58048 (Oct. 4, 2000). Topper, Los Altos, CA (US); Ruey-Bin GenBank Accession No.: AAF59281 (Oct. 4, 2000). Yang, San Mateo, CA (US) GenBank Accession No.: AE003598 (Oct. 4, 2000). GenBank Accession No.: AE003619 (Oct. 4, 2000). (73) Assignees: Millennium Pharmaceuticals, Inc., GenBank Accession No.: AE003636 (Oct. 4, 2000). Cambridge, MA (US); Curagen GenBank Accession No.: AE003706 (Oct. 5, 2000). Corporation, New Haven, CT (US) GenBank Accession No.: AE003808 (Oct. 4, 2000). GenBank Accession No.: AE003842 (Oct. 4, 2000). (*) Notice: Subject to any disclaimer, the term of this SWALL (SPTR) Accession No.: Q9V419 (May 1, 2000). patent is extended or adjusted under 35 SWALL (SPTR) Accession No.: Q9V7K1 (May 1, 2000). U.S.C. 154(b) by 339 days. SWALL (SPTR) Accession No.: Q9VFF2 (May 1, 2000). SWALL (SPTR) Accession No.: Q9VK90 (May 1, 2000). (21) Appl. No.: 09/939,853 SWALL (SPTR) Accession No.: Q9VLW2 (May 1, 2000). (22) Filed: Aug. 27, 2001 SWALL (SPTR) Accession No.: Q9VNR8 (May 1, 2000). GenBank Accession No.: U72744 (Feb. 1, 1997). (65) Prior Publication Data Alderborn, et al. (2000). “Determination of single-nucle US 2004/0039163 A1 Feb. 26, 2004 otide polymorphisms by real-time pyrophosphate DNA sequencing.” Genome Res. 10(8): 1249-1258. Related U.S. Application Data Alessi, et al. (1997). “3-Phosphoinositide-dependent pro (60) Provisional application No. 60/228,191, filed on Aug. 25, tein kinase-1 (PDK1): structural and functional homology 2000, provisional application No. 60/267,300, filed on Feb. 8, 2001, provisional application No. 60/269,961, filed on with the Drosophila DSTPK61 kinase.” Curr. Biol. 7(10): Feb. 20, 2001, and provisional application No. 60/277,337, 776-789. filed on Mar. 20, 2001. Alessi, et al. (1997). “Characterization of a 3-phosphoi nositide-dependent kinase which phosphorylates (51) Int. Cl. and activates protein kinase Balpha.” Curr. Biol. 7(4): CI2O I/68 (2006.01) 261-269. GenBank Accession No.: AAC50357 (Sep. 12, 2000). (52) U.S. Cl...... 435/6; 435/252.3; 435/320.1; GenBank Accession No. U30473 (Sep. 12, 2000). 435/325; 435/252.33; 514/44; 536/23.1; SWALL (SPTR) Accession No.: Q13239 (Nov. 1, 1996). 536/23.5 GenBank Accession No.: AAC48618 (Aug. 22, 1996). (58) Field of Classification Search ...... 536/23.1, 536/23.5, 23.2; 514/44; 435/6, 320.1, 325, (Continued) 435/252.3, 252.33 See application file for complete Search history. Primary Examiner James Martinell (74) Attorney, Agent, O Firm-Millennium (56) References Cited Pharmaceuticals, Inc. U.S. PATENT DOCUMENTS (57) ABSTRACT 4,350,764 A * 9/1982 Baxter et al...... 435/69.4 Disclosed herein are nucleic acid Sequences that encode novel polypeptides. Also disclosed are polypeptides encoded FOREIGN PATENT DOCUMENTS by these nucleic acid Sequences, and antibodies, which EP 1033401 6/2000 immunospecifically-bind to the polypeptide, as well as EP 1033405 6/2000 derivatives, variants, mutants, or fragments of the aforemen EP 1074617 7/2001 tioned polypeptide, polynucleotide, or antibody. The inven WO WO OO/O9552 2/2000 tion further discloses therapeutic, diagnostic and research methods for diagnosis, treatment, and prevention of disor OTHER PUBLICATIONS derS involving any one of these novel human nucleic acids GenBank Accession No. AC026539 (Apr. 27, 2000).* and proteins. Pharmacia P-L Biochemicals 1984 Product Reference Guide (pp. 36–37).* 38 Claims, No Drawings US 6,989.232 B2 Page 2

OTHER PUBLICATIONS Hurd and Saxton (1996). “Kinesin mutations cause motor neuron disease phenotypes by disrupting fast axonal trans GenBank Accession No .: AAC50646 (Aug. 22, 1996). port in Drosophila.” Genetics 144(3): 1075–1085. GenBank Accession No .: U23028 (Aug. 22, 1996). SWALL (SPTR) Accession No.: Q9H875 (Mar. 1, 2000). GenBank Accession No .: AL035424 (Nov. 23, 1999). SWALL (SPTR) Accession No.: Q9H955 (Mar. 1, 2000). GenBank Accession No ..: CAB39994 (Nov. 23, 1999). GenBank Accession No.: AKO01921 (Feb. 22, 2000). SWALL (SPTR) Accession No.: Q9COH6 (Oct. 16, 2001). GenBank Accession No.: AK021895 (Sep. 29, 2000). SWALL (SPTR) Accession No.: Q9Y3J5 (Oct. 16, 2001). GenBank Accession No.: AK023057 (Sep. 29, 2000). Burn, et al. (1996). “Generation of a transcriptional map for GenBank Accession No.: AK023964 (Sep. 29, 2000). a 700-kb region Surrounding the polycystic kidney disease GenBank Accession No ... BAB14382 (Sep. 29, 2000). type 1 (PKD1) and tuberous sclerosis type 2 (TSC2) disease GenBank Accession No ... BAB14742 (Sep. 29, 2000). genes on human chromosome 16p3.3.” Genome Res. 6(6): SWALL (SPTR) Accession No.: Q55909 (Nov. 1, 1997). 525-537. GenBank Accession No .: BAA10672 (Jul. 4, 2001). GenBank Accession : AKO03661 (Jul. 5, 2001). GenBank Accession No .: AKO25645 (Sep. 29, 2000). GenBank Accession : AKO05100 (Jul. 5, 2001). GenBank Accession No ... BAB15201 (Sep. 29, 2000). GenBank Accession : AKO07036 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9H6O3 (May 1, 2001). GenBank Accession .: AKO10359 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CQZ1 (Jun. 1, 2001). GenBank Accession : AKO18438 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CVQ1 (Jun. 1, 2001). GenBank Accession : AKO18708 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CW36 (Jun. 1, 2001). GenBank Accession : AKO20837 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CWV6 (Jun. 1, 2001). GenBank Accession : BAB22923 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CXA5 (Jun. 1, 2001). GenBank Accession : BAB23818 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9CY32 (Jun. 1, 2001). GenBank Accession ... BAB24835 (Jul. 5, 2001). SWALL (SPTR) Accession No.: Q9D1E2 (Jun. 1, 2001). GenBank Accession : BAB26879 (Jul. 5, 2001). SWALL (SPTR) Accession No. : Q9D1Z9 (Jun. 1, 2001). GenBank Accession : BAB31212 (Jul. 5, 2001). GenBank Accession No .: O15438 (Aug. 20, 2001). GenBank Accession : BAB31359 (Jul. 5, 2001). GenBank Accession N .: BAA28146 (Feb. 13, 1999). GenBank Accession ... BAB32223 (Jul. 5, 2001). GenBank Accession N .: AAA36741 (Sep. 15, 1994). SWALL (SPTR) AccC ssion No.: Q9SN75 (May 1, 2000). GenBank Accession N : M25631 (Sep. 15, 1994). GenBank Accession OO : AL132955 (Nov. 30, 1999). GenBank Accession N : CAA76658 (May 12, 1999). GenBank Accession ..: CAB61989 (Nov. 30, 1999). GenBank Accession N : AAD01430 (Jun. 22, 1999). GenBank Accession ... AAB46616 (Feb. 10, 1997). GenBank Accession N : AAA61178 (Jan. 14, 1995). GenBank Accession : P33527 (Mar. 1, 2001). GenBank Accession N ..: CAA28370 (Mar 21, 1995). GenBank Accession : D64005 (Oct. 8, 1999). GenBank Accession N : P07996 (Aug. 20, 2001). GenBank Accession : AAB17690 (Oct. 30, 1996). GenBank Accession N : AAF02111 (Jan. 24, 2001). GenBank Accession : U19516 (Oct. 30, 1996). GenBank Accession N : AC009755 (Jan. 24, 2001). GenBank Accession GenBank Accession N : AAC12836 (Apr. 5, 2000). : AAB83979 (Nov. 6, 1997). GenBank Accession No : AC004238 (Apr. 5, 2000). GenBank Accession No.: AAB83983 (Nov. 6, 1997). SWALL (SPTR) Accession No.: O64760 (Aug. 1, 1998). GenBank Accession No.:O AF022853 (Nov. 8, 1997). SWALL (SPTR) Accession No.: Q9SRU3 (May 1, 2001). SWALL (SPTR) Accession No.: Q9UQ99 (May 1, 2000). GenBank Accession No .: AL050318 (Jul. 20, 2001). GenBank Accession No.: AJ251892 (Dec. 22, 1999). GenBank Accession No ..: CAB75365 (Jul. 20, 2001). GenBank Accession No.: CAB64381 (Dec. 22, 1999). SWALL (SPTR) Accession No.: Q9H135 (May 1, 2001). SWALL (SPTR) Accession No.: Q9U111 (May 1, 2000). GenBank Accession No ... AAC27662 (Jul. 28, 1998). Gubb and Garcia-Bellido (1982). “A genetic analysis of the GenBank Accession No .: AB054031 (Jul 27, 2001). determination of cuticular polarity during development in GenBank Accession No ... BAB32487 (Jul 27, 2001). Drosophila melanogaster.” J. Embryol. Exp. Morphol. 68: SWALL (SPTR) Accession No.: Q99PT4 (Jun. 1, 2001). 37-57. SWALL (SPTR) Accession No.: Q99PU2 (Jun. 1, 2001). GenBank Accession No.: AAD55929 (Sep. 21, 1999). GenBank Accession No ... BAB32495 (Jun. 27, 2001). GenBank Accession No.: AF148265 (Sep. 21, 1999). GenBank Accession No .: Z92844 (Nov. 23, 1999). SWALL (SPTR) Accession No.: Q9RPT3 (May 1, 2000). Moore and Endow (1995). “Kinesin proteins: a phylum of GenBank Accession No .: AAB59366 (Aug. 3, 1993). motors for microtubule-based motility.” Bioessays 18(3): GenBank Accession No ..: CAA32889 (Mar. 31, 1995). 207-219. GenBank Accession No .: BAA28955 (Jun. 20, 1998). GenBank Accession No.: AAD09622 (Jul 14, 1999). GenBank Accession No .: AB036840 (Jul 15, 2000). GenBank Accession No.: AF111168 (Jul 14, 1999). GenBank Accession No .: AB036841 (Jul 15, 2000). GenBank Accession No.: O95432 (May 1, 1999). GenBank Accession No ... BAB00617 (Jul 15, 2000). GenBank Accession No.: AB033062 (Nov. 11, 1999). GenBank Accession No ... BAB00618 (Jul 15, 2000). GenBank Accession No.: AB046863 (Feb. 22, 2001). SWALL (SPTR) Accession No.: Q9NDQ8 (Oct. 1, 2000). GenBank Accession No.: AB051474 (Feb. 7, 2001). SWALL (SPTR) Accession No.: Q9NDQ9 (Oct. 1, 2000). GenBank Accession No .: BAA86550 (Nov. 11, 1999). Hoyt (1994). “Cellular roles of kinesin and related proteins.” GenBank Accession No ... BAB13469 (Feb. 22, 2001). Curr. Opin. Cell Biol. 6(1): 63–68. GenBank Accession No.: BAB21778 (Feb. 7, 2001). GenBank Accession No.: AAF17300 (Dec. 14, 1999). SWALL (SPTR) Accession No.: Q9ULI4 (May 1, 2000). GenBank Accession No.: AF108229 (Dec. 14, 1999). SWALL (SPTR) Accession No.: Q9HCC9 (Mar. 1, 2000). SWALL (SPTR) Accession No.: Q9U541 (May 1, 2000). GenBank Accession No.: AAF81717 (Jul. 5, 2000). US 6,989.232 B2 Page 3

GenBank Accession No.: AAF81719 (Jul. 5, 2000). GenBank Accession No.: AAC48313 (Apr. 17, 1998). GenBank Accession No.: AF252281 (Jul. 4, 2000). SWALL (SPTR) Accession No.: Q23832 (Nov. 1, 1996). GenBank Accession No.: AF252283 (Jul. 4, 2000). GenBank Accession No.: U42213 (Apr. 17, 1998). GenBank Accession No.: BAA 13758 (Feb. 7, 1999). GenBank Accession No.: AAHO1130 (Jul 12, 2001). Online Inheritance in Man OMIM: 116806 (Jun. 16, 1994). GenBank Accession No.: AAHO5999 (Jul 12, 2001). Online Inheritance in Man OMIM: 127600 (Jun. 4, 1986). GenBank Accession No.: BC005999 (Jul 12, 2001). Online Inheritance in Man OMIM: 171190 (Jun. 29, 1988). SWALL (SPTR) Accession No.: Q9BQ24 (Jun. 1, 2001). Online Inheritance in Man OMIM: 175100 (Jun. 2, 1986). GenBank Accession No.: AAB80938 (Oct. 13, 1997). Online Inheritance in Man OMIM: 176705 (Aug. 6, 1991). GenBank Accession No.: AF022908 (Oct. 13, 1997). Online Inheritance in Man OMIM: 176807 (Feb. 7, 1992). SWALL (SPTR) Accession No.: O35379 (Jan. 1, 1998). Online Inheritance in Man OMIM: 178000 (Jun. 2, 1986). SWALL (SPTR) Accession No.: O76007 (Nov. 1, 1998). Online Inheritance in Man OMIM: 184757 (Jun. 21, 1994). GenBank Accession No.: AJO11654 (Oct. 14, 1998). Online Inheritance in Man OMIM: 232200 (Jun. 3, 1986). GenBank Accession No.:O CAA09726 (Oct. 14, 1998). Online Inheritance in Man OMIM: 300011 (Feb. 4, 1996). GenBank Accession No.:O AAF36018 (Nov. 3, 2001). Online Inheritance in Man OMIM: 300189 (May 18, 1999). GenBank Accession No.:O AC024201 (Nov. 3, 2001). Online Inheritance in Man OMIM:311800 (Jun. 24, 1986). GenBank Accession No.:O AAC72122 (Sep. 26, 2001). Online Inheritance in Man OMIM: 314250 (Jun. 4, 1986). GenBank Accession No.: AC005278 (Sep. 26, 2001). Online Inheritance in Man OMIM: 314580 (Jun. 4, 1986). SWALL (SPTR) Accession No.: Q97.VS7 (May 1, 1999). Online Inheritance in Man OMIM: 601014 (Jan. 23, 1996). Wijnholds, et al. (1997). “Increased sensitivity to anticancer Online Inheritance in Man OMIM: 601462 (Oct. 9, 1996). drugs and decreased inflammatory response in mice lacking Online Inheritance in Man OMIM: 603030 (Sep. 11, 1998). the multidrug resistance-associated protein.” Nat. Med. Online Inheritance in Man OMIM: 604050 (Jul 22, 1999). 3(11): 1275–1279. Online Inheritance in Man OMIM: 604054 (Jul 22, 1999). GenBank Accession No.: AL133463 (Mar. 15, 2001). GenBank Accession N o.: AAC25416 (Jul. 2, 1998). GenBank Accession No.: CAC16127 (Mar. 15, 2001). SWALL (SPTR) Accession No.: O88563 (Jul 15, 1999). SWALL (SPTR) Accession No.: Q9H599 (Mar. 1, 2001). GenBank Accession N o.: AAA82756 (Aug. 31, 1995). SWALL (SPTR) GenBank Accession No.: 074339 (Nov. 1, SWALL (SPTR) Accession No.: Q60898 (Nov. 1, 1996). 1998). GenBank Accession N o.: U29056 (Dec. 4, 1995). GenBank Accession No.: AL031174 (Feb. 1, 2000). GenBank Accession N o.: AP003044 (Jan. 26, 2001). GenBank Accession No.: CAA20110 (Feb. 1, 2000). GenBank Accession N o. BAB19328 (Jan. 26, 2001). Klein, et al. “Selection for genes encoding Secreted proteins SWALL (SPTR) Accession No.: Q9FP22 (May 1, 2000). and receptors' Proc. Natl. Acad. Sci. USA 93:7108–7113. GenBank Accession N o.: AAC25186 (Dec. 12, 2000). International Search Report for PCT US 01/26510, mailed GenBank Accession N o.: AF068754 (Dec. 12, 2000). Sep. 12, 2002. GenBank Accession N o.: AAD10191 (Jan. 28, 1999). Nagase, et al. (1999). "Prediction of the coding sequences of GenBank Accession N o. AF102777 (Jan. 28, 1999). unidentified human genes. XV. The complete Sequences of SWALL (SPTR) Accession No.: Q97.1T6 (May 30, 2000). 100 new CDNA clones from brain which code for large GenBank Accession N o.: AL158075 (Jun. 21, 2001). proteins in vitro.” DNA Research. 6:337–345. GenBank Accession N o.: AL050320 (Apr. 4, 2001). International Search Report for PCT/US01/26510. Mailed GenBank Accession N o.: CAC36074 (Apr. 4, 2001). on Mar. 5, 2003. SWALL (SPTR) Accession No.: Q9BQL4 (Jun. 1, 2000). SWALL (SPTR) Accession No.: Q9U3B7 (May 1, 2000). * cited by examiner US 6,989,232 B2 1 2 PROTEINS AND NUCLEC ACIDS NOV14, NOV15 and NOV16 nucleic acids and polypep ENCODING SAME tides. These nucleic acids and polypeptides, as well as variants, derivatives, homologs, analogs and fragments RELATED APPLICATIONS thereof, will hereinafter be collectively designated as “NOVX nucleic acid or polypeptide sequences. This application claims priority from Provisional Appli In one aspect, the invention provides an isolated NOVX cations U.S. Ser. No. 60/228,191, filed Aug. 25, 2000, U.S. nucleic acid molecule encoding a NOVX polypeptide that Ser. No. 60/267,300, filed Feb. 8, 2001, U.S. Ser. No. includes a nucleic acid Sequence that has identity to the 60/269,961, filed Feb. 20, 2001, and U.S. Ser. No. 60/277, nucleic acids disclosed in SEQ ID NOS: 1, 8, 10, 12, 18, 20, 337, filed Mar. 20, 2001, each of which is incorporated by 26, 28, 34, 36, 42, 44, 50, 52, 54, 60, 62, 64, 70, 72, 74, 76, reference in its entirety. 82, 89, 91, 99 and 101. In some embodiments, the NOVX FIELD OF THE INVENTION nucleic acid molecule will hybridize under Stringent condi tions to a nucleic acid Sequence complementary to a nucleic The invention generally relates to novel nucleic acids and acid molecule that includes a protein-coding Sequence of a polypeptides encoded thereby. 15 NOVX nucleic acid sequence. The invention also includes an isolated nucleic acid that encodes a NOVX polypeptide, BACKGROUND OF THE INVENTION or a fragment, homolog, analog or derivative thereof. For Eukaryotic cells are subdivided by membranes into mul example, the nucleic acid can encode a polypeptide at least tiple functionally distinct compartments that are referred to 80% identical to a polypeptide comprising the as organelles. Each organelle includes proteins essential for sequences of SEQ ID NOS: 2, 9, 11, 19, 27, 35, 43, 51, 53, its proper function. These proteins can include Sequence 61, 63, 65, 71, 73, 75, 83,90, 92, 100 and 102. The nucleic motifs often referred to as Sorting Signals. The Sorting acid can be, for example, a genomic DNA fragment or a Signals can aid in targeting the proteins to their appropriate cDNA molecule that includes the nucleic acid Sequence of cellular organelle. In addition, Sorting Signals can direct any of SEQ ID NOS: 1, 8, 10, 12, 18, 20, 26, 28, 34, 36, 42, Some proteins to be exported, or Secreted, from the cell. 25 44, 50, 52, 54, 60, 62, 64, 70, 72, 74, 76, 82, 89,91, 99 and One type of Sorting Signal is a signal Sequence, which is 101. also referred to as a signal peptide or leader Sequence. The Also included in the invention is an oligonucleotide, e.g., Signal Sequence is present as an amino-terminal extension on an oligonucleotide which includes at least 6 contiguous a newly Synthesized polypeptide chain. A signal Sequence nucleotides of a NOVX nucleic acid (e.g., SEQ ID NOS: 1, 8, 10, 12, 18, 20, 26, 28, 34, 36, 42, 44, 50, 52, 54, 60, 62, can target proteins to an intracellular organelle called the 64, 70, 72, 74,76, 82, 89,91, 99 and 101) or a complement endoplasmic reticulum ("ER”). of Said oligonucleotide. The Signal Sequence takes part in an array of protein Also included in the invention are Substantially purified protein and protein-lipid interactions that result in translo NOVX polypeptides (SEQ ID NOS: 2, 9, 11, 19, 27, 35,43, cation of a polypeptide containing the Signal Sequence 35 51,53, 61, 63, 65, 71, 73, 75, 83,90, 92, 100 and 102). In through a channel in the ER. After translocation, a certain embodiments, the NOVX polypeptides include an membrane-bound enzyme, named a signal peptidase, liber amino acid Sequence that is Substantially identical to the ates the mature protein from the Signal Sequence. amino acid Sequence of a human NOVX polypeptide. The ER functions to Separate membrane-bound proteins The invention also features antibodies that immunoselec and Secreted proteins from proteins that remain in the 40 tively bind to NOVX polypeptides, or fragments, homologs, cytoplasm. Once targeted to the ER, both Secreted and analogs or derivatives thereof. membrane-bound proteins can be further distributed to In another aspect, the invention includes pharmaceutical another cellular organelle called the Golgi apparatus. The compositions that include the rapeutically - or Golgi directs the proteins to other cellular organelles Such as prophylactically-effective amounts of a therapeutic and a vesicles, lySOSomes, the plasma membrane, mitochondria 45 pharmaceutically-acceptable carrier. The therapeutic can be, and microbodies. e.g., a NOVX nucleic acid, a NOVX polypeptide, or an Secreted and membrane-bound proteins are involved in antibody specific for a NOVX polypeptide. In a further many biologically diverse activities. Examples of known aspect, the invention includes, in one or more containers, a Secreted proteins include human insulin, interferon, therapeutically- or prophylactically-effective amount of this interleukins, transforming GENX-beta, human growth 50 pharmaceutical composition. hormone, erythropoietin, and lymphokines. Only a limited In a further aspect, the invention includes a method of number of genes encoding human membrane-bound and producing a polypeptide by culturing a cell that includes a Secreted proteins have been identified. NOVX nucleic acid, under conditions allowing for expres The invention generally relates to nucleic acids and 55 sion of the NOVX polypeptide encoded by the DNA. If polypeptides encoded by them. More specifically the inven desired, the NOVX polypeptide can then be recovered. tion relates to nucleic acids encoding cytoplasmic, nuclear, In another aspect, the invention includes a method of membrane bound, and Secreted polypeptides, as well as detecting the presence of a NOVX polypeptide in a Sample. vectors, host cells, antibodies, and recombinant methods for In the method, a Sample is contacted with a compound that producing these nucleic acids and polypeptides. 60 Selectively binds to the polypeptide under conditions allow ing for formation of a complex between the polypeptide and SUMMARY OF THE INVENTION the compound. The complex is detected, if present, thereby The invention is based in part upon the discovery of identifying the NOVX polypeptide within the sample. nucleic acid Sequences encoding novel polypeptides. The The invention also includes methods to identify specific novel nucleic acids and polypeptides are referred to herein 65 cell or tissue types based on their expression of a NOVX. as NOVX, or NOV1, NOV2, NOV3, NOV4, NOV5, NOV6, Also included in the invention is a method of detecting the NOV7, NOV8, NOV9, NOV10, NOV11, NOV12, NOV13, presence of a NOVX nucleic acid molecule in a sample by US 6,989,232 B2 3 4 contacting the sample with a NOVX nucleic acid probe or associated with altered levels of a NOVX polypeptide, a primer, and detecting whether the nucleic acid probe or NOVX nucleic acid, or both, in a Subject (e.g., a human primer bound to a NOVX nucleic acid molecule in the Subject). The method includes measuring the amount of the Sample. NOVX polypeptide in a test sample from the subject and In a further aspect, the invention provides a method for comparing the amount of the polypeptide in the test Sample modulating the activity of a NOVX polypeptide by contact to the amount of the NOVX polypeptide present in a control ing a cell sample that includes the NOVX polypeptide with a compound that binds to the NOVX polypeptide in an sample. An alteration in the level of the NOVX polypeptide amount Sufficient to modulate the activity of Said polypep in the test Sample as compared to the control Sample tide. The compound can be, e.g., a Small molecule, Such as indicates the presence of or predisposition to a disease in the a nucleic acid, peptide, polypeptide, peptidomimetic, Subject. Preferably, the predisposition includes, e.g., the carbohydrate, lipid or other organic (carbon containing) or diseases and disorders disclosed above and/or other patholo inorganic molecule, as further described herein. gies and disorders of the like. Also, the expression levels of Also within the scope of the invention is the use of a the new polypeptides of the invention can be used in a therapeutic in the manufacture of a medicament for treating 15 method to Screen for various cancers as well as to determine or preventing disorders or Syndromes including, e.g., those the Stage of cancers. described for the individual NOVX nucleotides and polypeptides herein, and/or other pathologies and disorders In a further aspect, the invention includes a method of of the like. treating or preventing a pathological condition associated The therapeutic can be, e. g., a NOVX nucleic acid, a with a disorder in a mammal by administering to the Subject NOVX polypeptide, or a NOVX-specific antibody, or a NOVX polypeptide, a NOVX nucleic acid, or a NOVX biologically-active derivatives or fragments thereof. Specific antibody to a Subject (e.g., a human Subject), in an For example, the compositions of the present invention amount Sufficient to alleviate or prevent the pathological will have efficacy for treatment of patients suffering from the 25 condition, in preferred embodiments, the disorder, includes, diseases and disorders disclosed above and/or other patholo gies and disorders of the like. The polypeptides can be used e.g., the diseases and disorders disclosed above and/or other as immunogens to produce antibodies Specific for the pathologies and disorders of the like. invention, and as vaccines. They can also be used to Screen for potential agonist and antagonist compounds. For In yet another aspect, the invention can be used in a example a cDNA encoding NOVX may be useful in gene method to identity the cellular receptorS and downstream therapy, and NOVX may be useful when administered to a effectors of the invention by any one of a number of Subject in need thereof. By way of non-limiting example, the techniques commonly employed in the art. These include but compositions of the present invention will have efficacy for are not limited to the two-hybrid system, affinity treatment of patients Suffering from the diseases and disor purification, co-precipitation with antibodies or other derS disclosed above and/or other pathologies and disorders 35 Specific-interacting molecules. of the like. The invention further includes a method for screening for Unless otherwise defined, all technical and Scientific a modulator of disorders or Syndromes including, e.g., the terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this diseases and disorders disclosed above and/or other patholo 40 gies and disorders of the like. The method includes contact invention belongs. Although methods and materials similar ing a test compound with a NOVX polypeptide and deter or equivalent to those described herein can be used in the mining if the test compound binds to said NOVX practice or testing of the present invention, Suitable methods polypeptide. Binding of the test compound to the NOVX and materials are described below. All publications, patent polypeptide indicates the test compound is a modulator of 45 applications, patents, and other references mentioned herein activity, or of latency or predisposition to the aforemen are incorporated by reference in their entirety. In the case of tioned disorders or Syndromes. conflict, the present Specification, including definitions, will Also within the scope of the invention is a method for control. In addition, the materials, methods, and examples Screening for a modulator of activity, or of latency or are illustrative only and not intended to be limiting. predisposition to an disorders or Syndromes including, e.g., 50 the diseases and disorders disclosed above and/or other Other features and advantages of the invention will be pathologies and disorders of the like by administering a test apparent from the following detailed description and claims. compound to a test animal at increased risk for the afore mentioned disorders or Syndromes. The test animal DETAILED DESCRIPTION OF THE expresses a recombinant polypeptide encoded by a NOVX 55 nucleic acid. Expression or activity of NOVX polypeptide is INVENTION then measured in the test animal, as is expression or activity of the protein in a control animal which recombinantly The present invention provides novel nucleotides and expresses NOVX polypeptide and is not at increased risk for polypeptides encoded thereby. Included in the invention are the disorder or syndrome. Next, the expression of NOVX 60 the novel nucleic acid Sequences and their polypeptides. The polypeptide in both the test animal and the control animal is sequences are collectively referred to as “NOVX nucleic compared. A change in the activity of NOVX polypeptide in acids” or “NOVX polynucleotides” and the corresponding the test animal relative to the control animal indicates the encoded polypeptides are referred to as “NOVX polypep test compound is a modulator of latency of the disorder or tides” or “NOVX proteins.” Unless indicated otherwise, Syndrome. 65 “NOVX” is meant to refer to any of the novel sequences In yet another aspect, the invention includes a method for disclosed herein. Table A provides a Summary of the NOVX determining the presence of or predisposition to a disease nucleic acids and their encoded polypeptides. US 6,989,232 B2

TABLE A Sequences and Corresponding SEQ ID Numbers SEO Internal SEO ID ID NO NOVX Identification NO (nt) (aa) Homology NOV1 24CSO17 1. 2 Kinesin like protein; Overlaps genomic clone with KIAA1236-like protein, predicted secreted NOV2 24CSO59; CG56403- 8 9 Novel Nuclear Protein-like 01: 146556340 protein NOV3 24SC113; CG56383-01 10, 12 11 LIM-domain-containing Prickle like, secreted-like protein NOV4 24SC128; CG56824- 18, 20 19 hypothetical protein similar 01: 13374351; to Y71F9B.2 PROTEIN 3374350; 13374349 Caenorhabditis elegans-like protein NOV5 24SC239; 13374166; 26, 28 27 CG8441 PROTEIN-like protein 3374167; 13374355; 3374356; 13374357; 3374358; 13374359; 3374360; 13374361; 3374362 NOV6 24SC3OO 34, 36 35 eEIF-2B epsilon subunit-like protein NOV7 24SC526; 13374363; 42, 44 43 heat shock factor binding 3374364; 13374365; protein 1-like protein 3374366 NOV8 24SC714; 13373973; 50 51 putative secreted protein-like 3.373974 protein NOV9 6CSO60; 13374352; 52, 54 53 Kelch-like protein-like 3374353; 13374354 protein NOV10 OO340173; 1373975; 60, 62, 64 61, hypothetical 22.2 kDa protein 373976; 1373977; 63, SLRO305-like protein; 373978 65 Transmembrane NOV11 8793.8450; 70 71 transposase-like protein NOV12 87917235; 13373979; 72 73 Novel Leucine Zipper CG92OO2-01 Containing Type II membrane like protein-like protein NOV13 87919652; 74, 76 75 PO7948 tyrosine-protein kinase LYN-like protein NOV14 87935554; 82 83 O15438 canalicular multispecific organic anion ransporter 2-like protein; multidrug resistance NOV15a 10O399281 89 90 novel intracellular hrombospondin domain containing protein-like protein NOV15b CG57356-01; 91 92 novel intracellular 15951875.4 hrombospondin domain containing protein-like protein NOV16a 101.33OO77 99 100 FYVE finger-containing phosphoinositide kinase-like protein NOV16b CG57248-01; 101 102 FYVE finger-containing 1OO391903 phosphoinositide kinase-like protein

NOVX nucleic acids and their encoded polypeptides are presence of mutations in the new genes. Specific uses are useful in a variety of applications and contexts. The various 55 described for each of the Sixteen genes, based on the tissues NOVX nucleic acids and polypeptides according to the in which they are most highly expressed. Uses include invention are useful as novel members of the protein fami developing products for the diagnosis or treatment of a lies according to the presence of domains and Sequence variety of diseases and disorders. relatedness to previously described proteins. Additionally, For example, NOV1 is homologous to a kinesin-like Superfamily of proteins. Thus, the NOV1 nucleic acids, NOVX nucleic acids and polypeptides can also be used to 60 polypeptides, antibodies and related compounds according identify proteins that are members of the family to which the to the invention will be useful in therapeutic and diagnostic NOVX polypeptides belong. applications implicated in, for example, cancer (e.g. renal The NOVX genes and their corresponding encoded pro and/or gastric cancer), neurodegenerative diseases, diseases teins are useful for preventing, treating or ameliorating of vesicular transport, and infectious diseases, and/or other medical conditions, e.g., by protein or gene therapy. Patho 65 pathologies, diseases and disorders. logical conditions can be diagnosed by determining the Also, NOV2 is homologous to the Novel Nuclear Protein amount of the new protein in a Sample or by determining the like family of proteins. Thus NOV2 nucleic acids, US 6,989,232 B2 7 8 polypeptides, antibodies and related compounds according will be useful in therapeutic and diagnostic applications to the invention will be useful in therapeutic and diagnostic implicated in, for example, ACTH deficiency; Convulsions, applications implicated in, for example, cancer and/or other familial febrile, 1, Duane Syndrome; congenital Adrenal pathologies, diseases and disorders. hyperplasia due to 11-beta-hydroxylase deficiency; Further, NOV3 is homologous to a family of LIM glucocorticoid-remediable Aldosteronism; congenital domain-containing Prickle-like proteins. Thus, the NOV3 Hypoaldosteronism due to CMO I deficiency; congenital nucleic acids and polypeptides, antibodies and related com Hypoaldosteronism due to CMO II deficiency; Susceptibility pounds according to the invention will be useful in thera to Nijmegen breakage Syndrome, Low renin hypertension; peutic and diagnostic applications implicated in, for Anemia, Ataxia-telangiectasia, Autoimmume disease, example, dystonia-parkinsonism Syndrome, dyskeratosis, Immunodeficiencies, kidney cancer, proliferative disease, hereditary benign intraepithelial; developmental disorders, immune-mediated disease, allergy, asthma, and psoriasis diseases of cytoskeletal function, cancer (e.g. gastric uterine, and/or other pathologies, diseases and disorders. lung and/or renal cancer), neurodegenerative diseases (e.g. NOV11 is homologous to a transposase-like protein fam Alzheimer's disease, multiple Sclerosis and stroke) and/or ily of proteins. Thus, the NOV11 nucleic acids, other pathologies, diseases and disorders. 15 polypeptides, antibodies and related compounds according Also, NOV4 is homologous to the hypothetical protein to the invention will be useful in, for example, potential similar to Y71F9B.2 PROTEIN–Caenorhabditis elegans therapeutic applications Such as the following: (i) a protein like family of proteins. Thus, NOV4 nucleic acids, therapeutic, (ii) a Small molecule drug target, (iii) an anti polypeptides, antibodies and related compounds according body target (therapeutic, diagnostic, drug targeting/ to the invention will be useful in therapeutic and diagnostic cytotoxic antibody). (iv) a nucleic acid useful in gene applications implicated in, for example; heart disease, therapy (gene delivery/gene ablation), (v) an agent promot Stroke, autoimmune disease, infectious disease, and cancer ing tissue regeneration in vitro and in Vivo, and (vi) a (e.g. renal and/or breast cancer) and/or other pathologies, biological defense weapon, and/or transposase-related diseases and disorders. pathologies, diseases and disorders. Additionally, NOV5 is homologous to the CG8441 25 Also, NOV12 is homologous to the Novel Leucine Zipper PROTEIN-like family of proteins. Thus NOV5 nucleic Containing Type II membrane like protein-like family of acids, polypeptides, antibodies and related compounds proteins. Thus NOV12 nucleic acids, polypeptides, antibod according to the invention will be useful in therapeutic and ies and related compounds according to the invention will be diagnostic applications implicated in, for example, cancer useful in therapeutic and diagnostic applications implicated (e.g. breast and/or ovarian cancer) and/or other pathologies, in, for example, prostate cancer, lung cancer, diabetes, diseases and disorders. abnormal wound healing, congenital Slow-channel myOS Also, NOV6 is homologous to the eEIF-2B epsilon thenic Syndrome, asthma, IBD, contact hyperSensitivity, subunit-like family of proteins. Thus NOV6 nucleic acids, infection disease, allorejection, autoimmunity, inflammation polypeptides, antibodies and related compounds according and/or other pathologies, diseases and disorders. to the invention will be useful in therapeutic and diagnostic 35 Further, NOV13 is homologous to a family of P07948 applications implicated in, for example, cancer (e.g. breast tyrosine-protein kinase LYN-like proteins. Thus, the and/or ovarian cancer) and/or other pathologies, diseases NOV13 nucleic acids and polypeptides, antibodies and and disorders. related compounds according to the invention will be useful Further, NOV7 is homologous to members of the heat in therapeutic and diagnostic applications implicated in, for Shock factor binding protein l-like family of proteins. Thus, 40 example, breast cancer, diabetes and/or other pathologies, the NOV7 nucleic acids, polypeptides, antibodies and diseases and disorders. related compounds according to the invention will be useful Also, NOV14 is homologous to the O15438 canalicular in therapeutic and diagnostic applications implicated in, for multispecific organic anion transporter 2-like family of example; cancer (e.g. breast and/or ovarian cancer) and/or 45 proteins. Thus, NOV14 nucleic acids, polypeptides, anti other pathologies, diseases and disorders. bodies and related compounds according to the invention Still further, NOV8 is homologous to the putative secreted will be useful in therapeutic and diagnostic applications protein-like protein family of proteins. Thus, NOV8 nucleic implicated in, for example, detoxification, drug resistance, acids and polypeptides, antibodies and related compounds multidrug resistance, inflammatory disease, cancer, liver according to the invention will be useful in therapeutic and 50 disease and/or other pathologies, diseases and disorders. diagnostic applications implicated in, for example, cancer Additionally, NOV15 is homologous to the novel intrac (e.g. liver, lung, ovarian and/or colon cancer), inflammatory ellular thrombospondin domain containing protein-like fam diseases and/or other pathologies, diseases and disorders. ily of proteins. Thus NOV15 nucleic acids, polypeptides, Additionally, NOV9 is homologous to the Kelch-like antibodies and related compounds according to the invention protein-like family of proteins. Thus, NOV9 nucleic acids 55 will be useful in therapeutic and diagnostic applications and polypeptides, antibodies and related compounds accord implicated in, for example, Systemic lupus erythematosus, ing to the invention will be useful in therapeutic and autoimmune disease, asthma, emphysema, Scleroderma, diagnostic applications implicated in Menkes disease, allergy, ARDS; fertility, breast cancer, liver differentiation, myoglobinuria/hemolysis due to PGK deficiency, and hypogonadism; angiogenesis, Vascularization in CNS tissue Wieacker-Wolff syndrome, neurological disorders, 60 undergoing repair/regeneration, CNS-related cancers, dis development-related pathologies and/or other various eases of the thyroid gland, immunological disease, diseases pathologies, diseases and disorders. of the thyroid gland and pancreas as well as other metabolic NOV10a, NOV10b and NOV10c are homologous to a and neuroendocrine diseases and/or other pathologies, dis hypothetical 22.2 kDa protein SLR0305-like protein family eases and disorders. of proteins and the Type IIIb plasma membrane-like family 65 Also, NOV16a and NOV16b are homologous to the of proteins. Thus, the NOV10 nucleic acids, polypeptides, FYVE finger-containing phosphoinositide kinase-like fam antibodies and related compounds according to the invention ily of proteins. Thus NOV16 nucleic acids, polypeptides, US 6,989,232 B2 9 10 antibodies and related compounds according to the invention myasthenia gravis, periodic paralysis, mental disorders will be useful in therapeutic and diagnostic applications including mood, anxiety, and Schizophrenic disorders, implicated in, for example, diabetes, obesity, fertility, Sig akathesia, amnesia, catatonia, diabetic neuropathy, tardive naling and/or other pathologies, diseases and disorders. dyskinesia, dystonias, paranoid psychoses, postherpetic The NOVX nucleic acids and polypeptides can also be neuralgia, and Tourette's disorder; and disorders of vesicular used to Screen for molecules, which inhibit or enhance transport Such as cystic fibrosis, glucose-galactose malab NOVX activity or function. Specifically, the nucleic acids Sorption Syndrome, hypercholesterolemia, diabetes mellitus, and polypeptides according to the invention may be used as diabetes insipidus, hyper- and hypoglycemia, Grave's targets for the identification of Small molecules that modu disease, goiter, Cushing's disease, Addison's disease, gas late or inhibit, e.g., neurogenesis, cell differentiation, cell trointestinal disorders including ulcerative colitis, gastric proliferation, hematopoiesis, wound healing and angiogen and duodenal ulcers, other conditions associated with abnor esis. mal vesicle trafficking including acquired immunodefi In one embodiment of the present invention, NOVX or a ciency Syndrome (AIDS), allergic reactions, autoimmune fragment or derivative thereof may be administered to a hemolytic anemia, proliferative glomerulonephritis, inflam Subject to treat or prevent a disorder associated with 15 matory bowel disease, multiple Sclerosis, myasthenia gravis, decreased expression or activity of NOVX. Examples of rheumatoid arthritis, osteoarthritis, Scleroderma, Chediak Such disorders include, but are not limited to, cancerS Such Higashi Syndrome, Sjogren's Syndrome, Systemic lupus as adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, Sarcoma, teratocarcinoma, and, in particular, can erythiematosus, toxic Shock Syndrome, traumatic tissue cers of the adrenal gland, bladder, bone, bone marrow, brain, damage and viral, bacterial, fungal, helminthic, and proto breast, cervix, gall bladder, ganglia, gastrointestinal tract, Zoal infections, as well as additional indications listed for the heart, kidney, liver, lung, muscle, ovary, pancreas, individual NOVX clones. parathyroid, penis, prostate, Salivary glands, skin, Spleen, The NOVX nucleic acids and proteins of the invention are testis, thymus, thyroid, and uterus, neurological disorders useful in potential diagnostic and therapeutic applications Such as epilepsy, ischemic cerebrovascular disease, Stroke, and as a research tool. These include Serving as a specific or cerebral neoplasms, Alzheimer's disease, Pick's disease, 25 Selective nucleic acid or protein diagnostic and/or prognostic Huntington's disease, dementia, Parkinson's disease and marker, wherein the presence or amount of the nucleic acid other extrapyramidal disorders, amyotrophic lateral Sclero or the protein are to be assessed. These also include potential sis and other motor neuron disorders, progressive neural therapeutic applications Such as the following: (i) a protein muscular atrophy, retinitis pigmentosa, hereditary ataxias, therapeutic, (ii) a Small molecule drug target, (iii) an anti multiple Sclerosis and other demyelinating diseases, bacte body target (therapeutic, diagnostic, drug targeting/ rial and viral meningitis, brain abscess, Subdural empyema, cytotoxic antibody), (iv) a nucleic acid useful in gene epidural abscess, Suppurative intracranial thrombophlebitis, therapy (gene delivery/gene ablation), (v) an agent promot myelitis and radiculitis, Viral central nervous System disease, ing tissue regeneration in vitro and in Vivo, and (vi) a prion diseases including kuru, Creutzfeldt-Jakob disease, biological defense weapon. and Gerstmann-Straussler-Scheinker Syndrome, fatal famil 35 ial insomnia, nutritional and metabolic diseases of the ner Additional utilities for the NOVX nucleic acids and Vous System, neurofibromatosis, tuberous Sclerosis, cerebel polypeptides according to the invention are disclosed herein. loretinal he mangioblastomatosis, encephalotrigeminal NOV1 Syndrome, mental retardation and other developmental dis A disclosed NOV1 nucleic acid of 1065 nucleotides (also orders of the central nervous System, cerebral palsy, neuro 40 referred to as 24CS017) encoding a novel kinesin-like skeletal disorders, autonomic nervous System disorders, protein is shown in Table 1A. An open reading frame was cranial nerve disorders, Spinal cord diseases, muscular dyS identified beginning with an ATG initiation codon at nucle trophy and other neuromuscular disorders, peripheral ner otides 1-3 and ending with a TAA codon at nucleotides Vous System disorders, dermatomyositis and polymyositis, 1063-1065. The start and stop codons are shown in bold inherited, metabolic, endocrine, and toxic myopathies, letters in Table 1A.

TABLE 1A NOV1 nucleotide sequence. (SEQ ID NO: 1) AGACGGGGCTGCTCCTCCTCAGCCTCCAGTCAGGCTGTGTGGCAGCGATCACCTCCATGTCGATGGAGTGTCTGTG

CAGTTTGGGAGCGAGGCTCTGCCTCTCTCGGTCTACCCTTGGGAGTGAAATAGTGACCGTCCCTTTGAGCCCGAGAG

CTGGGGAGAAGGCCGTGCCTGTTAACAGCTGCCTGGACCCTCTCTGGAGAGCAGCAGAGAGAGGCGGGGCTGGAGGA

GATGTTGCCAAGAACCTAAGGGTGAAAGTCATGCTTCGCATCTGTTCCACCTTGGCTCGAGATACTTCAGAATCCAG

CTCTTTCTTAAAGGTGGACCCACGGAAGAAGCAGATCACCTTGTACGATCCCCTGACTTGTGGAGGTCAAAATGCCT

TCCAAAAGAGAGGCAACCAGGTTCCTCCAAAGATGTTTGCCTTCGATGCAGTTTTTCCACAAGACGCTTCTCAGGCT

GAAGTGTGTGCAGGCACCGTGGCAGAGGTGATCCAGTCTGTGGTCAACGGGGCAGATGGCTGCGTGTTCTGTTTCGG

CCACGCCAAACTGGGAAAATCCTACACCATGATCGGAAAGGATGATTCCATGCAGAACCTGGGCATCATTCCCTGTG

CCATCTCTTGGCTCTTCAAGCTCATAAACGAACGCAAGGAAAAGACCGGCGCCCGTTTCTCAGTCCGGGTTTCCGCC US 6,989,232 B2 11 12

TABLE 1A-continued NOV1 nucleotide sequence.

GTGGAAGTGTGGGGGAAGGAGGAGAACCTGCGGGACCTGCTGTCGGAGGTGGCCACGGGCAGCCTGCAGGACGGCCA

GTCCCCGGGCGTGTACCTCTGTGAGGACCCCATCTGCGGCACGCAGCTGCAGAACCAGAGCGAGCTGCGGGCCCCCA

CCGCAGAGAAGGCTGCCTTTTTCCTGGATGCCGCCATTGCCTCCCGCAGGAGCCACCAACAGGACTGTGATGAGGAC

GACCACCGCAACTCACACGTGTTCTTCACACTGCACATCTACCAGTACCGGATGGAGAAGAGCGGGAAAGGGGGAAT

TCTGCTTTCGATTTGGAATCTGAAAGTAGGGAGAAATCTTGAAAACAAGGAAACAGTTCATAA

A disclosed NOV1 polypeptide (SEQ ID NO:2) encoded 15 parameter that describes the number of hits one can “expect” by SEQ ID NO:1 has 354 amino acid residues and is to See just by chance when Searching a database of a presented in Table 1B using the one-letter amino acid code. particular size. It decreases exponentially with the Score (S) SignalP, Psort and/or Hydropathy results predict that NOV1 that is assigned to a match between two Sequences. has a signal peptide and is likely to be localized extracel Essentially, the E value describes the random background lularly with a certainty of 0.4562. In an alternative 20 noise that exists for matches between Sequences. embodiment, NOV1 is likely to be localized to the endo The Expect value is used as a convenient way to create a plasmic reticulum membrane with a certainty of 0.1000, or Significance threshold for reporting results. The default to the endoplastic reticulum lumen with a certainty of value used for blasting is typically set to 0.0001. In BLAST 0.1000, or to the microbody (peroxisome) with a certainty of 2.0, the Expect value is also used instead of the P value 0.1000. The most likely cleavage site for a NOV1 peptide is 25 (probability) to report the significance of matches. For between amino acids 16 and 17, i.e., at the dash between example, an E value of one assigned to a hit can be amino acids VAA-IT, NOV1 has a molecular weight of interpreted as meaning that in a database of the current size 38.525.7 Daltons. one might expect to See one match with a similar Score

TABLE 1B Encoded NOV1 protein sequence. (SEQ ID NO: 2) MTGLLLLSLQSGCVAA/ITSMSMECLSCLGARLCLSRSTLGSEIVTVPLSPRAGEKAVPWNSCLDPLWRAAERGGAGGD

WAKNLRVKVMLRICSTLARDTSESSSFLKVDPRKKQITLYDPLTCGGONAFQKRGNOWPPKMFAFDAVFPQDASQAEWC

AGTWAEWIQSWWNGADGCWFCFGHAKLGKSYTMIGKDDSMONLGIIPCAISWLFKLINERKEKTGARFSWRVSAVEVWG

KEENLRDLLSEWATGSLQDGQSPGVYLCEDPICGTOLONQSELRAPTAEKAAFFLDAAIASRRSHQQDCDEDDHRNSHW

FFTLHIYOYRMEKSGKGGILLSIWNLKVGRNLENKETVH

In all BLAST alignments herein, the “E-vaLue” or Simply by chance. An Evalue of Zero means that one would “Expect' value is a numeric indication of the probability that 45 not expect to See any matches with a similar Score Simply by the aligned Sequences could have achieved their similarity to chance. See, e.g., BLAST educational information provided the BLAST query Sequence by chance alone, within the by the National Center for Biotechnology Information database that was searched. The Expect value (E) is a (NCBI), Bethesda, Md.

TABLE 1C

BLASTP results for NOV1 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect Q9ULI4; KIAA1236 PROTEIN 1481 155/222 185/222 4e-87 ABO33062: (FRAGMENT) homo sapiens (70%) (83%) BAA86550.1 6/2001 Q99PU2: KNESIN SUPERFAMILY 130 122/145 126/145, 7e-64 KIF26B; PROTEIN 26B (FRAGMENT). (84%) (87%) BAB32487 KIF26B, nus musculus 6/2001 Q99PT4: KNESIN SUPERFAMILY 147 106/147 130/147, 2e-58 ABO54031; PROTEIN 26A (FRAGMENT). (72%) (88%) BAB32495.1; KIF26A, nus musculus 6/2001 CG14535 PROTEIN. 302 69/165 99/165, 9e-28 US 6,989,232 B2 13 14

TABLE 1C-continued

BLASTP results for NOV1 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect AEO03619; droSophia melanogaster (42%) (60%) AAFS2569.1 6/2001 Q9U541; VAB-8L. caenorhabditis 1066 61/191 98/191, 1e-18 AF108229; elegans 6/2001 (32%) (51%) AAF173OO.1

The homology of these and other Sequences is shown regions that may be required to preserve Structural or graphically in the ClustalW analysis shown in Table 1D. In 15 functional properties), whereas non-highlighted amino acid the ClustalW alignment of the NOV1 protein, as well as all residues are less conserved and can potentially be mutated to other ClustalW analyses herein, the black outlined amino a much broader extent without altering protein Structure or acid residues indicate regions of conserved sequence (i.e., function.

TABLE 1D

ClustalW Analysis of NOV1

1) Novel NOV1 (SEQ ID NO: 2) 2) BAA86550 ... 1 partial sequence used (SEQ IS NO:3) 3) KIF26B ( SEQ ID NO: 4) 4) KIF26A ( SEQ ID NO:5) 5) CG14535 (SEQ ID NO : 6) 6) WAB-8L - partial sequence used (SEQ ID NO:7)

NOW1 MTGLLLLSLQSGCWAAITSMSMECLCSLGARLCLSRSTLGSEIWTVPLSP 50 BAA8655 O1. KIF2.6B KIF26A CG14535 MATTSTSNMS 10 WAB-8L

NOW1 RAGEKAWPWNSCLDPLWRAAERGGAGGDWAKNLRWKWMLRICSTILARDTS 100 BAA8655 O1. KIF2.6B KIF26A CG14535 RNGGFCGALQRAPPPMPPRLIRRLSSRECTGVGKWKVMLRVADRDRNSGG 60 WAB-8L MEACSSKTSLIL 11

NOW1 ESSSFLKVDPRKKQITSYDP-LTCGGONAFQKRGNOVP--PKMF 47 BAA8655 O1. DP-AAGPPGSAGFRRAATAAW-PKMF 35 KIF2.6B KIF26A CG14535 TEPDFMALDKKKRQW DPRTACPPPQAAQERAPMVAA-PKMF WAB-8L LHSPLRTIPKLRLCAS3 SEDVAHGRCSLTDOHLQIEGKNYSKTT

NOW1 BAA8655 O1. KIF2.6B KIF26A CG14535 TGEDEDKQSEWCASAESEV WAB-8L RTEA SE

NOW1 BAA8655 O1. KIF2.6B KIF26A CG14535 WAB-8L

NOW1 BAA8655 O1. KIF2.6B KIF26A CG14535 Hyssists. WAB-8L DPRHRVVKIVDDARTGVFIDNESEIRVE US 6,989,232 B2 15 16

TABLE 1D-continued ClustalW Analysis of NOV1 NOW1 HVFFTLHIYQYRMEKSGKG 332 BAA8655O1 VYQYRMEKCGEG 220 KIF2.6B HSHMLFTILHIYOYRM 130 KIF26A SSHMLFTLHVYQYR2EKCG- 147

CG14535 PPPSVRPFSSTORSPDA- 302 WAB-8L SHVFISLSLYSYKMGDKM&G 235

NOW1 G------ILISIWN-- 340 BAA865501 GMSGGRSRLHLIDLGSCEAAAGRAGEAAGGPLCLSLSALGSWILALWNGA 270 KIF2.6B 130 KIF26A 147 CG14535 3O2 WAB-8L G----RRRLCFLDMGIGERNSTNGG------MTMPALGSILLAMVQRN 273

NOW1 ------T.KWG------RNLENKEWH---- 354 BAA86550.1 KHVPYRDHRLTMLLRESLATAGCRTTMIAHVSDAPAQHAETLSTWOLAAR 320 KIF2.6B 130 KIF26A 147 CG14535 3O2 WAB-8L 319

NOW1 ------354 BAA8655 O1. IHRLRRKKAKYASSSSGGESSCEEGRARRPPHLRPFHPRTWALDPD 370 KIF2.6B 130 KIF26A 147 CG14535 3O2 WAB-8L IARTRAKSMWGHGRKSSGTMSTGTMESNSSSCG------TTTTPG 363

3O Other BLAST results include sequences from the Patp database, which is a proprietary database that contains TABLE 1 E-continued Sequences published in patents and patent publications. Patp Patp BLASTP Analysis for NOV1 results include those listed in Table 1E. 35 Sequences TABLE 1E producing High scoring Segment Protein? Length. Identity Positive Patp BLASTP Analysis for NOV1 Pairs Organism (aa) (%) (%) E Value fragment SEQ Sequences 40 ID NO: 243O3 producing High- Arabidopsis scoring Segment Protein? Length. Identity Positive thaliana Pairs Organism (aa) (%) (%) E Value patp: AAY51328 t KLIMP 1103 29 48 1.6e-11 The presence of identifiable domains in NOV1, as well as Epions 45 all other NOVX proteins, was determined by searches using Patp: AAB36227 Human 1816 29 49 8.2e-11 software algorithms such as PROSITE, DOMAIN, Blocks, kinesin-like Pfam, ProDomain, and Prints, and then determining the protein HKLP Interpro number by crossing the domain match (or numbers) patp: AAB94768 Human protein 664 29 50 6.3e-10 using the Interpro website (maintained by the European SEONO: 15849-ID 50 BioinformaticsD : : Institute, Hinxton, Cambridge, UK). H. Sapiens DOMAIN results for NOV1 as disclosed in Tables 1F, were Patp: AAYO6618 Thermomyces 784 26 46 1.4e-09 collected from the Conserved Domain Database (CDD) with lanuginosus Reverse Position Specific BLAST analyses. This BLAST Kinesin 55 analysis Software samples domains found in the Smart and motor protein Pfam collections. TL-gamma Thermomyces Table 1F lists the domain description from DOMAIN lanuginosus analysis results against NOV1. This indicates that the NOV1 Patp: AAYO1632 Amino acid 2954 38 58 1.8e-08 Sequence has properties Similar to those of other proteins sequence of 60 known to contain these domains. In a Sequence alignment sent- herein, fully conserved single residues are calculated to protein-E - determine percent homology, and conserved and "strong Xenopus sp Semi-conserved residues are calculated to determine percent Patp: AAG21666 Arabidopsis 452 30 s7 27-08 positives. The "strong" group of conserved amino acid thaliana 65 residues may be any one of the following groups of amino protein acids: STA, NEOK, NHOK, NDEO, OHRK, MILV, MILF, HY, FYW. US 6,989,232 B2 17 18

TABLE 1.F Domain Analysis of NOV1 Prodom analysis

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs. Score P(N) pram: 361 p36 (52). KINH (7) KINN (2) KF1 (2) // PROTEIN M . . . 189 3.2e-15 prom:12025 p36 (2) CYT1 (2) 7/ PROBABLE B-TYPE CYTOCHROME . . 55 O6 pram: 29378 p36 (1) RPSW STRCO / 1 RNA POLYMERASE SIGMA FAC . . 57 O 93 pram: 14019 p36 (2) CIK6 (2) // CHANNEL VOLTAGE-GATED POTA . . 49 O998 pram: 44434 p36 (1) ERY1 SACER // ERYTHRONOLIDE SYNTHASE, . . 49 O998

>pram: 361 p36 (52). KINH (7) KINN (2) KF1 (2) // PROTEIN MOTOR ATP-BINDING MICROTUBULES , KINESIN-LIKE CELL, KINESIN MITOSIS, 170 aa. Identities = 43/108 (39%), Positives = 66/108 (61%) for NOV1 : 139 to 246, and Sbjct: 61 to 166 >pram: 12025 p36 (2) CYT1 (2) // PROBABLE B-TYPE CYTOCHROME TRICARBOXYLIC ACID CYCLE ELECTRON TRANSPORT HEME TRANSMEMBRANE, 48 aa. Identities = 13/21 (61%), Positives = 15/21 (31%) >pram: 29378 p36 (1) RPSW STRCO // RNA POLYMERASE SIGMA FACTOR WHIG. TRANSCRIPTION REGULATION; SIGMA FACTOR; DNA-DIRECTED RNA POLYMERASE; DNA-BINDING.. 81 aa. Identities = 14/42 (33%), Positives = 21/42 (50%) >pram: 14019 p36 (2) CIK6 (2) // CHANNEL VOLTAGE-GATED POTASSIUM PROTEIN KV1.6 IONIC TRANSMEMBRANE ION TRANSPORT GLYCOPROTEIN 40 aa. Identities = 9/19 (47%), Positives = 13/19 (68%) >pram: 44434 p36 (1) ERY1 SACER // ERYTHRONOLIDE SYNTHASE MODULES 1 AND 2 (EC 2.3.1.94) (ORF 1) (6-DEOXYERYTHRONOLIDE B SYNTHASE I) (DEBS 1). TRANSFERASE; ACYLTRANSFERASE, ANTIBIOTIC BIOSYNTHESIS; NADP; PHOSPHOPANTETHEINE; MULTIFUNCTIONAL ENZYME, 55 aa. Identities = 14/35 (40%), Positives = 16/35 (45%) BLOCKS analysis AC# Description Strength Score

BLOO 411C Kinesin motor domain proteins. 642 283

BLOO 411B Kinesin motor domain proteins. 185 156

BLOO 411D Kinesin motor domain proteins. 217 107 BLOO853G Beta-eliminating lyases pyridoxal-phosphate a 858 105

BLOO509B Ras GTPase-activating proteins. 280 O73

BLO 1227A Uncharacterized protein family U2E0012 protei O59 O72 BLOOO94F C-s cytosine-specific DNA methylases proteins 186 O45

BLO 124 OE Purine and other phosphorylases family 2 prot 350 O39

BLOO 487G IMP dehydrogenase/GMP reductase proteins. 525 O29

BLOO 411A Kinesin motor domain proteins. 284 O 19 BLOO37 OB PEP-utilizing enzymes phosphorylation site pr 554 O15

BLOO838C Interleukins - 4 and -13 proteins. 661 011

BLOO 486A DNA mismatch repair proteins mutS family prot 290 O 10 US 6,989,232 B2 19

TABLE 1 F-continued Domain Analysis of NOV1 ProSite analysis NOV1 aa position Pattern-ID: ASN GLYCOSYLATION PS00001 (Interpro) 275 Pattern-DE: N-glycosylation site, Pattern: NIPISTIP Pattern-ID: GLYCOSAMINOGLYCAN PS00002 (Interpro) 329 Pattern-DE: Glycosaminoglycan attachment site, Pattern: SG. G Pattern-ID: PKC PHOSPHO SITE PS00005 (Interpro) 49, 226, 297 329 Pattern-DE: Protein kinase C phosphorylation site Pattern: IST.RK Pattern-ID: CK2 PHOSPHO SITE PS00006 (Interpro) 61 116, 152, 160 230 252 Pattern-DE: Casein kinase II phosphorylation site Pattern: IST. {2}IDE Pattern-ID: MYRISTYL PS00008 (Inrerpro) 12 29, 73, 124, 171 Pattern-DE: N-myristoylation site 201, 222 256 333 Pattern: GEDRKGPFYWI. {2}|STAGCNIP) Pattern-ID: ATP GTP A PS00017 (Interpro) 18O Pattern-DE: ATP/GTP-binding site motif A (P-loop) Pattern: AG. {4}GKIST

The disclosed NOV1 nucleic acid encoding a kinesin-like Specifically to any of the proteins of the invention. Also protein includes the nucleic acid whose Sequence is provided 35 encompassed within the invention are peptides and polypep in Table 1A, or a fragment thereof. The invention also tides comprising Sequences having high binding affinity for includes a mutant or variant nucleic acid any of whose bases any of the proteins of the invention, including Such peptides may be changed from the corresponding base shown in and polypeptides that are fused to any carrier particle (or Table 1A while Still encoding a protein that maintains its biologically expressed on the Surface of a carrier) Such as a kinesin-like activities and physiological functions, or a 40 bacteriophage particle. fragment of Such a nucleic acid. The invention further Kinesin family proteins are microtubule-based motor pro includes nucleic acids whose Sequences are complementary teins that drive the transport of molecular component within to those just described, including nucleic acid fragments that the cell. Translocation of components within the cell is critical for maintaining cell Structure and function. are complementary to any of the nucleic acids just described. Kinesin defines a ubiquitous, conserved family of over 50 The invention additionally includes nucleic acids or nucleic 45 proteins that can be classified into at least 8 Subfamilies acid fragments, or complements thereto, whose Structures based on primary amtino acid Sequence, domain Structure, include chemical modifications. Such modifications include, Velocity of movement, and cellular function. See review in: by way of nonlimiting example, modified bases, and nucleic Moore and Endow (1996) Bioessays 18:207-219; and Hoyt acids whose Sugar phosphate backbones are modified or (1994) Curr. Opin. Cell Biol. 6:63–68). The prototypical derivatized. These modifications are carried out at least in 50 kinesin molecule is involved in the transport of membrane part to enhance the chemical Stability of the modified nucleic bound vesicles amd organelles. This function is particularly acid, Such that they may be used, for example, as antisense important for axonal transport in neurons. Protein binding nucleic acids in therapeutic applications in a Subject. containing vesicles are constantly transported from the neu In the mutant or variant nucleic acids, and their ronal cell body along microtubules that span the length of complements, up to about 60% percent of the bases may be 55 the axon leading to the Synaptic terminal. Failure to Supply So changed. the Synaptic terminal with these vesicles blocks the trans The disclosed NOV1 protein of the invention includes the mission of neural signals. In the fruit fly Drosophila kinesin-like protein whose Sequence is provided in Table 1B. melanogaster, for example, mutations in kinesin cause The invention also includes a mutant or variant protein any Severe disruption of axonal transport in larval nerves which of whose residues may be changed from the corresponding 60 leads to progressive paralysis. See Hurd and Saxton (1996) residue shown in Table 1B while still encoding a protein that Genetics 144:1075-1085. This phenotype mimics the maintains its kinesin-like activities and physiological pathology of Some vertebrate motor neuron diseases, Such as functions, or a functional fragment thereof. In the mutant or amyotrophic lateral Sclerosis (ALS). In addition to axonal variant protein, up to about 60% percent of the residues may transport, kinesin is also important in all cell types for the be So changed. 65 transport of vesicles from the Golgi complex to the endo The invention further encompasses antibodies and anti plasmic reticulum. This role is critical for maintaining the body fragments, Such as F, or (F), that bind immuno identity and functionality of these Secretory organelles. US 6,989,232 B2 21 22 Members of the more divergent subfamilies of kinesin are acid or the protein are to be assessed. Additional disease called kinesin-related proteins (KRPs), many of which func indications and tissue expression for NOV1 is presented in tion during mitosis in eukaryotes as divergent as yeast and Example 2. human (Hoyt, Supra). Some KRPs are required for assembly Based on the tissues in which NOV1 is most highly of the mitotic Spindle. In Vivo and in vitro analyses Suggest 5 expressed specific uses include developing products for the that these KRPs exert force on microtubules that comprise diagnosis or treatment of a variety of diseases and disorders. the mitotic Spindle, resulting in the Separation of Spindle NOV1 nucleic acids and polypeptides are further useful in poles. Phosphorylation of KRP is required for this activity. the generation of antibodies that bind immuno-specifically Failure to assemble the mitotic spindle results in abortive to the novel NOV1 substances for use in therapeutic or mitosis and chromosomal aneuploidy, the latter condition 10 diagnostic methods. These antibodies may be generated being characteristic of cancer cells. In addition, a unique according to methods known in the art, using prediction KRP, centromere protein E. localizes to the kinetochore of from hydrophobicity charts, as described in the “Anti human mitotic chromosomes and may play a role in their NOVX Antibodies' Section below. The disclosed NOV1 Segregation to opposite Spindle poles. protein has multiple hydrophilic regions, each of which can AS described earlier, NOV1 Shares extensive Sequence 15 be used as an immunogen. In one embodiment, a contem homologies with kinesin family proteins, including kinesin plated NOV1 epitope is from about amino acids 50 to 80. In Superfamily protein 26A and 26B, and With kinesin-like another embodiment, a NOV1 epitope is from about amino proteins, including human kinesin-like motor protein acids 100 to 150. In additional embodiments, NOV1 (KLIMP), human kinesin-like protein (HKLP) and Thermo- epitopes are from about amino acids 190 to 200, from about myces lanuginosus Kinesin motor protein TLgamma. The 20 amino acids 205 to 275 and from about amino acids 280 to structural similarities indicate that NOV1 may function as a 330. These novel proteins can be used in assay systems for member of kinesin family proteins. Therefore, NOV1, like functional analysis of various human disorders, which will kinesin family proteins and kinesin-related proteins, may be help in understanding of pathology of the disease and asSociated with cancer, neurological disorders and disorders development of new drug targets for various disorders. of vehicular transport. Accordingly, the NOV1 nucleic acids 25 NOV2 and proteins identified here may be useful in potential A disclosed NOV2 nucleic acid of 7560 nucleotides (also therapeutic applications implicated in (but not limited to) referred to as 24CS059, CG56403-01 and 146556340) enco various pathologies and disorders as indicated herein. For dina a novel nuclear protein-like protein is shown in Table example, a cDNA encoding the kinesin-like protein NOV1 2A. An open reading frame was identified beginning, with may be useful in gene therapy, and the kinesin-like protein 30 an ATG initiation codon at nucleotides 7170-7172 and NOV1 may be useful when administered to a subject in need ending with a TGA codon at nucleotides 7476-7478. A thereof. The NOV1 nucleic acid encoding kinesin-like putative untranslated region upstream from the initiation protein, and the kinesin-like protein of the invention, or codon and downstream from the termination codon is under fragments thereof, may further be useful in diagnostic lined in Table 2A, and the start and stop codons are in bold applications, wherein the presence or amount of the nucleic letters.

TABLE 2A NOV2 nucleotide sequence. (SEQ ID NO:8) GTATTCTCAGAGCGCCAGGAGGCATCGAGCCTGTAATTTCCTGTTCTCTGAATCCCCCATCTTTCTGCAGCTCCAAGCTT

TGTGTCCCACAGCCTGTGACTCTGTGCTAACAAATCGCTATTGTCCAGTGGGGCGAATGGTGGCTGGAACTAAAGAATTGCT

GTCTGGTTTCTATTCAAATCCAGGTAGCGAGATATATGAATGGACTTTTCGAATCGTCATGTGAATAACGTCTGCTCGGCAT

GAAGGCTCAGAGCCATGCTAGGAAGGATTAACTCGTAGGCTGACCACTAACATCCTTTGTGGTACGAGGGAGAAACATTCCC

AAGTATCATTTTATTCACACTTAATTTTCTATCCCATACCCCCAAAAAAGGCTAGCTATAATTAGTTGGCGCTTTTCT

CTTAATTTTTAGGTTTCTGTTGATAATGTGTAAGTTTGGGAAAATGCTAAGTAGCTTTTCACTTAGAACACTGTATTTTC

TCTTTAAAGTTTTCTACCTTACATTTATTATAGCATAGTTATCTTTATAGCATAGATGCAGAAAGTAAGAGAGAGCTTGTTT

TTTCAAGAAAACAACCCTTTAAAATACTTTCCAACCCATGAAGGGAAAAATCCTCCTTTTTTCCCCCAAGTGCATTCTACTT

ATTACTTTGCATTTTTCTCCCAAAGTCCAAATTTATGCAAAGAAAATAGAAACAAGTTCAAATGCAATGCATTAACCAAATA

AAACAAGTCTGCTTCAAATTAGGAACCAACCTAAGCATTTGTAAAGTGTAGCAGAATCAGAATTCTTTTAAAAATTAGATTT

GGAACCTGAACTATATAATTCATAATTCTCATTTTTCTGTGGAAAATTATTTTATCTTTCTCCTGTATACCTGAAAAAATGT

CCATAGGCTTAAAGGGTCATGCTTTTACATTCCTTCCATATCACAGGTACTATGAAGTAAGGAGACTTTTAGGTTTCTTTTT

GTCTTAAACTCAGACAGCTTTGTAAGCAGTAGTGTGTAGATTACAAGAGTTAGACAAAAGCAGGCGCGACTGAGAAGAGTTG

GTGGGGGAGAAGCTTGGGGCACTTCCTGTCACTCAACACATTCCAGATCACTAAAAAATTTCCACACCCTCTGCATTCCCCC

TTGCCCACTCCAGTCCCGGTATTTTCTGATCCATATGTTGTGGTATTTACCATACTTCTCTCCCTCACTAGGCTCTGGCA

AGACTGCTTCAGAGGGGATGCATTCCTTTAGATTGCACAAAGCGGAGCTGGGAAAATGGCTGGCAGTTTCAGAATCTAGTCA,

US 6,989,232 B2 25 26

TABLE 2A-continued NOV2 nucleotide sequence.

TCATGGCACAAGGTGAAAGGGAGCTGGTGTGTGCAGAGATCACATGGTAGGAGAGGAGGAGGCAAGAGAGAGAAGAAGGAGG

TGCCAGACTACTTTAAAACCATCAGCTTTTGCAGGGAGTTATAGAGCCAGCACTCACTGACTACTGCAAGAATGGCACCAAG

ACATTCATGAGGGATCTGCCTTCATGACCCAGACACCTCCCACCAGGCCCCACCACCAACATAAGGGGTTAGATTTCAGCAT

GAGACT CAATGAGGGGGGAGCAAACAAATTACATCCAAACTGTAGCAACCACATTTTGTTTATCCATTCATCTGTCAATGGA

CACTTAAG TAGCTCCACTTTTTTGCTATCAAGACAGTTTTTCTTGACTATTCTAAAATCATGTGAGGGCTTCTTTACAGA

GCTGTTCTGACCCATCTCAGAAGCTCTTTTCACTTTATAAGTTGTAAGGGTTTTGATGGGCCTTTTAACTCTAGAGACCAGC

TAGTCCCTAACATCAGGTTTGCTAGAGAAGGGAAGATTCTTTCCAGCCTTCCTGGATGACACCTAATACATACTATATTCCT

AGTAATTCTGTTATACTTAAGATTTATGGGTTCATCTTTCCTGTTACACTGTGAGCCCTTCCTGGGCTGGGACGATGGCCAG

TTTCTCTTGAGTTGTGCCTTGTGCCTCTGTATAGGCACAGGGCCTATTATGAAGTAGATATCAATAAATATTAGTTGGAAAA

AATGTGAATTAGTAAATAATAATTTGTATTGGGTTTTTATGTGCCAGATGTTTTGAATACATTTAGCTAATTTAATCTTCAA

AACAGTCCTTTCAGATACATATTGTTATCTTCATTTAATAGATGAGGGAACTTGTCAAAGGCCTCAGAGATGTAAAATGTAT

AACTGGGATTTGAACCTTTGTTCAAATTGCTTGTTCTCGCTTGACTCAAGAGCCATTATGTTAGAGGCAGACTTCATAGTCA

GTTGATGATCAGTGGGTTTGGAAACATGAAATTTAGCTCAGGCATCGGCTCCAAATTAAATACTCTTTCATTGGGCATTAGG

AACTATACCCTTCTGATATGGCTCATGAATGGATGCTCAGAGGAAAGCTTGGCTCGTTAGTTACTTGGACCTTTTATAGGGA

CTTTAGCTGAACAACTAATTGCTGAACTCAGTTGGCAAAGGCTCTTCTGTGGGTAAATCCTCTTTCACATGTTATTTTGAAA

GTGCAGTTAAATTCTAACATACATGATGTGGCCCTGGAATGGATGCATCAGTTTTCTTTATTCTGTTTGTTTGGCAGGTGTG

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTACAAAAAAAAAAAATGTATGTATAAAAGCAACCAGTATCTAGGTATCAG

GAACAAAACAAAGGTTTTTATGGAGCTTACATTCTAATGGGGAGACAGAAAAATGAATTCTCAAAGTACTATGAAGTGAAAC

ATGAAGCTACACTGTGAAGAAAATAGGGTAGTGTGGTGATGGAGAATGACTGACTGGTGGGATGTGGTGGATTGGGAGACAT

CTTGAATGAGGAAGTATCGGGCTATGCCTCTCTGAGGAACCAAAGTATGCAAGCTGAGAGCCAAGTCATGACATGAAGAACC

TCAGCCTACAAAGAGCCAGAAGAATGAACTGGGTAGTGGCAACAAGAAATGCAAGAGCTCTCATGTGGGATTGAGCTTAGTG

TGCTTGAGGAGCCAAAAGGGTAGTATGGCTAAAATGGAGTGAATGCAAGTAGGGGTGATGTTGGAGAGGTGGGATGGGGCCC

TATCACATAGGACCTTGAAGCTATAGTAAGAAATTTGGGTTTTTTCCAAGTGTATTTTTTCCCAAATTTGTTTTTTTCCCC

CCAAATAGTAGGACATTGGAAGGTTTTAAGCAGAATGGTAACTTGTTCTGCAGGCCGAAGAAGTCCTTGTGTGCAGTTCTTG

TCTATGTTTAGTCCTCTGAGGCCCCCTTGACACTATCTTTAACGGGGTTCCTCCCAAGCTGAGAACTTGCCAAGGTTCTC

ACATGTCAGTGGCCACCTTTGAGTGTCCTAGAAGAATCATATTTCTTTTATAACCATTTTGGGGCTAACATTGGTTTCATTG

CCCTTTCCACAACAGAGAGGGTTTGTTCAACGAGAGCTTCTTCCAGCATTTTCATACATCACTGTTGCCTGGGTAGGGTTTT

GCAGCCTGATTCTCTGTATTAATTTAGGATAAAATTCAGTTATTAATTAGACCTGATCTTTCTTTGTCAATAATTTAGAAGC

ATATGTCCTCGGCACATAATGTTGGCTGACTGTTTGGTTAATAATATGTTCTTGAAGACATACTTCTGGAAATCTGAAATTG

ATAAGTGAAGAGGAACTTTCTTACTATTCATAAATAAGGTTGTATTCAGCTATTCTGACTCTAGTAGGGTTAATTGCTAACA

TTTGACCTACATTATTTTATTTTTTCAATTTCTCAAAAACTCTGAAAAGTATAGGCCAGGGGCCTTGGCTCATGCCTGTAAT

GCCAGTGCTTTGGGACGCCATGGTGGAAGGATTGCTTGAGGCCAGGAGTTCGAGACCAGCCTTAGCAACATAGTAAGACCCC

CATATCTACAAAAAATAAATTTGCCTGGCTTGATGATATGTGCCTGTAGTTCTAGTTACTTGTGAGGGTGAGGAGAGAGGGT

CACTTGAGTGCAGGAGTTCAAGGCTGCAGTGAGCTAGATGATGCCACCATACTCCAGGATGGTGACAGAGACTCTGTCTCT

TAAAAAACAACAACAAAACAAACCTCTGACAAATACAGAAAATAACAGCATACACCTGATAGTCCCATTTTATAGGCAAGTG

ACATCTAGTATTTTCATAGTAAAATATCATGTAGTGTCATCTGATACTTTCTTCTTTTTACTAAAAAAAAAAAAAAGTTACT

TGCAAGCTACTCAGTTGATTTCACAGCTTACTGAAGGGGCAGCCAGAACTTTGGAAAGCACAAAAGGTGAGAAAACTGAGGC US 6,989,232 B2 27 28

TABLE 2A-continued NOV2 nucleotide sequence.

TCTGGTGGTTAAATGACTTGTCCAGTGTCACATAGCAAGGAAGAGGCAGAGCTGAGACTTGAACCAGAGCTTGATTCCAAAG

TTCTTGCTCGTACTAT

The NOV2 nucleic acid was identified on chromosome 9 10 complete the gene sequence. The DNA sequence was then by comparing the sequence to public databases. The NOV2 manually corrected for apparent inconsistencies thereby nucleic acid maps to the 9q33-34 locus, a region associated obtaining the Sequences encoding the full-length protein. with endotoxin hyporesponsiveness (OMIM 603030), adrenocortical insufficiency without ovarian defect (OMIM 184757) and other diseases/disorders. Single nucleotide 15 A disclosed NOV2 polypeptide (SEQID No.9) encoded polymorphisms were identified for NOV2, as described in by SEQ ID NO:8 has 102 amino acid residues and is Example 3. It was found that NOV2 had homology to the presented in Table 2C using the one-letter amino acid code. nucleic acid sequences shown in the BLASTN data listed in SignalP, Psort and/or Hydropathy results predict that NOV2 Table 2B. has no known signal peptide and is likely to be localized in

TABLE 2B

BLASTN results for NOV2 Gene Indexi Protein? Begin- Length. Identity Identifier Organism End (nt) (%) Expect AL158075 Human DNA sequence from 1–756O 102867 7560/7560 O.O clone RP11-348K2 on 3799–4086 (100%) chromosome 9q33.1-34.13, 4584-4654 complete sequence. 6/2001. 5736-5773 Strand = Plus/Minus 6954–7071 7003-7071 AKO21895 Homo sapiens cDNA FLJ11833 1-2237 2237 2234/2237 O.O fis, clone HEMBA1006579. (100%) 9f2OOO.

BLASTN homology of NOV2 to the GenBank Acc. No. the nucleus with a certainty of 0.300. In alternative AL158075 genomic clone in Table 2B depicts a proposed embodiments, a NOV2 polypeptide is located in the mito exon and intron structure for the NOV2 gene, which is most 40 chondrial matrix space with a certainty of 0.100, in a likely encoded on the AL158075 clone minus strand. The lysosome (lumen) with a certainty of 0.100, or in a micro NOV2 nucleic acid is likely to be expressed in 10 week body (peroxisome) with a certainty of 0.0101. NOV2 has a embryo and whole embryo, mainly head, based on its molecular weight of 11700.6 Daltons.

TABLE 2C Encoded NOV2 protein sequence. (SEQ ID NO:9) MMMPPYSRMVTETLSLKKQQQNKPLTNTENNSIHLIVPFYRQWTSSIFIVKYHVVSSDTFFFLLKKKKSYLOATOLISQLT

EGAARTILESTKGEKTEALWWK homology to GenBank Acc. No. AKO2 1895. GenBank 55 No sequences were found in the EMBL, PIR or GenBank AKO21895 disclosed in September 2000, has homology to databases that had homology to the NOV2 polypeptide in an unfiltered BLASTP search (expectation value=1.0 for input the 5' untranslated NOV2 sequence. parameter). Exons were predicted by homology and the intron/exon The presence of identifiable domains in NOV2, as well as boundaries were determined using Standard genetic rules, as 60 all other NOVX proteins, was determined by searches using described in Example 1. Exons were further selected and software algorithms such as PROSITE, DOMAIN, Blocks, refined by means of Similarity determination using multiple Pfam, ProDomain, and Prints, and then determining the Interpro number by crossing the domain match (or numbers) BLAST (for example, tRlastN, BlastX, and BlastN) using the Interpro website (maintained by the European Searches, and, in Some instances, GeneScan and Grail. 65 Bioinformatics Institute, Hinxton, Cambridge, UK). Expressed Sequences from both public and proprietary data DOMAIN results for NOV2 as disclosed in Tables 1E, were bases were also added when available to further define and collected from the Conserved Domain Database (CDD) with US 6,989,232 B2 29 30 Reverse Position Specific BLAST analyses. This BLAST cent homologies of NOV2 to the domains found in the analysis Software samples domains found in the Smart and BLASP analyses. Homology to one or more domains indi Pfam collections. cates that the NOV2 Sequence has properties Similar to those Table 2D lists the domain description from DOMAIN of other proteins known to contain these domains, and is a analysis results against NOV2. Table 2E provides the per- likely phosphoprotein.

TABLE 2D Domain Analysis of NOV2 PRODOM Analysis

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) pram: 38396 p36 (1) DRTS PLAFK-DIHYDROFOLATE REDUCTASE . . . 51 O37 pram: 48689 p36 (1) Y360 MYCGE-HYPOTHETICAL PROTEIN MG3 . . . 51 O37 pram: 55080 p36 (1) DPOM PODAN-PROBABLE DNA POLYMERASE . . . 61 O 69 pram: 16122 p36 (2) PHAC (1) PHBC (1)-POLYMERASE SYNTHAS . . . 46 0.84 pram: 24351 p36 (1) RS6 HAEIN-30S RIBOSOMAL PROTEIN S6 . . . . 46 0.84 BLOCKS Protein Domain Analysis AC# Description Strength Score

BLOO243G Integrins beta chain cysteine-rich domain pro 1511 1011 BLOO951C ER lumen protein retaining receptor proteins. 1661 10O2

BLO 1081 Bacterial regulatory proteins, tetR family pr 1354 1002

BLOO 126A 3'5'-cyclic nucleotide phosphodiesterases pro 1312 1 OOO

BLOO764A Endonuclease III iron-sulfur binding region p 1181 1 OOO ProSite Protein Domain Analysis AA of NOV2 (SEQ ID NO: 4)

Pattern-ID: ASN GLYCOSYLATION PS00001 (Interpro) 30 Pattern-DE: N-glycosylation site Pattern: NIPISTIP Pattern-ID: CAMP PHOSPHO SITE PS00004 (Interpro) 66 Pattern-DE: cAMP- and cqMP-dependent protein kinase phosphorylation site Pattern: RK{2}.IST Pattern-ID: PKC PHOSPHO STTE PSOOOO5 (Interpro) 15, 90 Pattern-DE: Protein kinase C phosphorylation site Pattern: IST.RK Pattern-ID: CK2 PHOSPHO SITE PS00006 (Interpro) 26, 91 Pattern-DE: Casein kinase II phosphorylation site Pattern: IST. {2}IDE Pattern-ID: MYRISTYL PS00008 (Interpro) 83 Pattern-DE: N-myristoylation site Pattern: GEDRKHPFYWI. {2}|STAGCNIP) US 6,989,232 B2 31 32

TABLE 2E

ProDom results for NOW2

ProDom Length Identity Positive Identifier Protein/Organism (nt) (%) (%) Expect pram: 38396 p36 (1) DRTS PLAFK- 52 11/41 24/41 O. 46 DIHYDROFOLATE REDUCTASE (EC (26%) (58%) 1.5.1.3) / THYMIDYLATE SYNTHASE (EC 2.1.1. 45) (DHFR-TS). MULTIFUNCTIONAL ENZYME; OXIDOREDUCTASE; TRANSFERASE; NADP; METHYLTRANSFERASE; NUCLEOTIDE BIOSYNTHESIS; ONE CARBON METABOLISM pram: 48689 p36 Y360 MYCGE-HYPOTHETICAL 38 14/34 19/34 O. 46 PROTEIN MG360 (41%) (55%) pram:55080 p36 (1) DPOM PODAN-PROBABLE 135 14/60 28/60 1 - 2 DNA POLYMERASE (EC 2.7.7.7) DNA (23%) (46%) DIRECTED DNA POLYMERASE pram: 16122 p36 (2) PHAC (1) PHHC (1)- 55 14/37 20/37 1.8 POLYMERASE SYNTRASE PHA POLY 3 (37%) (54%) HYDROXYALKAKOATE PHA-POLYMERASE POLYHYDROXYALKANOIC ACID BIOSYNTHESIS TRANSFERASE pram 24351 36 (1) RS6 HAEIN // 30S 35 10/23 14/23 1.8 RIBOSOMAL PROTEIN S 6. RIBOSOMAL (43%) (60%) PROTEIN; RRNA-BINDING

Other BLAST results include sequences from the Patp Such modifications include, by way of nonlimiting example, database, which is a proprietary database that contains modified bases, and nucleic acids whose Sugar phosphate Sequences published in patents and patent publications. Patp backbones are modified or derivatized. These modifications results include those listed in Table 2F. are carried out at least in part to enhance the chemical

TABLE 2F Patp alignments of NOV2

PatP Length Identity Positive Identifier Protein/Organism (nt) (%) (%) Expect AAB 43292 Human ORFX OORF3056 polypeptide 110 69/101 77/101 3.4 e-29 sequence SEQ ID NO: 6112, (68%) (76%) PN = WO2OOO58473-A2 AAGO2872 Human secreted protein, SEQ ID 144 60/101 73/10 1 1.1e-25 NO: 6953, PN = EP1033401-A2 (59%) (72%) AAR9079 Respiratory Syncytial Virus 61 15/30 17/30 2.1 antigenic fragment 30 (50%) (56%) AAR97084 Respirar ory Syncytial Virus 51 15/30 17/30 2.1 antigenic fragment 35 (50%) (56%) AAR9708O Respiratory Syncytial Virus 59 15/30 17/30 2.1 antigenic fragment 31 (50%) (56%) AAR9081 Respiratory Syncytial Virus 57 15/30 17/30 2.1 antigenic fragment 32 (50%) (56%) AAR97082 Respiratory Syncytial Virus 55 15/30 17/30 2.1 antigenic fragment 33 (50%) (56%) AAR97083 Respiratory Syncytial Virus 55 15/30 17/30 2.1 antigenic fragment 34 (50%) (56%)

The disclosed NOV2 nucleic acid encoding a nuclear 55 stability of the modified nucleic acid, such that they may be protein-like protein includes the nucleic acid whose used, for example, as antisense binding nucleic acids in Sequence is provided in Table 2A, or a fragment thereof. The therapeutic applications in a Subject. In the mutant or variant invention also includes a mutant or variant nucleic acid any nucleic acids, and their complements, up to about 67% of whose bases may be changed from the corresponding percent of the bases may be So changed. base shown in Table 2A while still encoding a protein that The disclosed NOV2 protein of the invention includes the maintains its nuclear protein-like activities and physiologi 60 nuclear protein-like protein whose Sequence is provided in cal functions, or a fragment of Such a nucleic acid. The Table 2B. The invention also includes a mutant or variant invention further includes nucleic acids whose Sequences are protein any of whose residues may be changed from the complementary to those just described, including nucleic corresponding residue shown in Table 2B while still encod acid fragments that are complementary to any of the nucleic ing a protein that maintains its nuclear protein-like activities acids just described. The invention additionally includes 65 and physiological functions, or a functional fragment nucleic acids or nucleic acid fragments, or complements thereof. In the mutant or variant protein, up to about 66% thereto, whose structures include chemical modifications. percent of the residues may be So changed. US 6,989,232 B2 33 34 The invention further encompasses antibodies and anti protein-like protein of the invention, or fragments thereof, body fragments, Such as F, or (F), that bind immuno may further be useful in diagnostic applications, wherein the Specifically to any of the proteins of the invention. presence or amount of the nucleic acid or the protein are to The above defined information for this invention Suggests be assessed. that this nuclear protein-like protein (NOV2) may function as a member of a nuclear protein family. Therefore, the NOV2 nucleic acids and polypeptides are further useful in NOV2 nucleic acids and proteins identified here may be the generation of antibodies that bind immuno-specifically useful in potential therapeutic applications implicated in (but to the novel NOV2 substances for use in therapeutic or not limited to) various pathologies and disorders as indicated diagnostic methods. These antibodies may be generated herein. The potential therapeutic applications for this inven according to methods known in the art, using prediction tion include, but are not limited to: cancer research tools, for from hydrophobicity charts, as described in the “Anti all tissues and cell types composing (but not limited to) those NOVX Antibodies' Section below. The disclosed NOV2 defined here, including cancerous and normal tissue, endot protein has multiple hydrophilic regions, each of which can oxin hyporesponsiveness (OMIM 603030), adrenocortical be used as an immunogen. In one embodiment, a contem insufficiency without ovarian defect (OMIM 184757) and 15 plated NOV2 epitope is from about amino acids 10 to 38. In other diseases/disorders. another embodiment, a NOV2 epitope is from about amino The NOV2 nucleic acids and proteins of the invention are acids 55 to 102. These novel proteins can be used in assay useful in potential therapeutic applications implicated in Systems for functional analysis of various human disorders, cancer including but not limited to and/or other pathologies which will help in understanding of pathology of the disease and disorders. For example, a cDNA encoding the nuclear and development of new drug targets for various disorders. protein-like protein (NOV2) may be useful in cancer NOV3 therapy, and the nuclear protein-like protein (NOV2) may be A disclosed NOV3 nucleic acid of 7380 nucleotides (also useful when administered to a subject in need thereof. By referred to as 24SC113) encoding a novel LIM-domain way of nonlimiting example, the compositions of the present containing Prickle-like protein is shown in Table 3A. An invention will have efficacy for treatment of patients suffer 25 open reading frame was identified beginning with an ATG ing from diseases including but not limited to endotoxin initiation codon at nucleotides 1991 to 1993 and ending with hyporesponsiveness and cancer. The NOV2 nucleic acid a TGA codon at nucleotides 2951 to 2953. The start and stop encoding nuclear protein-like protein, and the nuclear codons are in bold letters in Table 3A.

TABLE 3A NOV3 nucleotide sequence. (SEQ ID NO: 10) GTGAGTCAGGGAGGAGAAAGGTAGGCTGCTTGGGCCGGTGGCCTTTTGTTCTTGCAATTCTCTTCTTCTC

CCTAATTTCTGGTTCATTGCCTCTTTAGACAAGTCTCCAGAAGTTCTTCCTTGAAAGTCCAGGCTCAGGA

ACTCTCAGCCACTGAAGATAAAGGCCACATTAGTCCCTTTTTCTGGGAAGCCGTGTATCATTACGCATCA

GGAGAATGCAGGGGTCCTGGTCCACCCTACAGTCATAGCTTGAGGCTATATTCCCAGCAGGCTCTCCCCA

CGGGAAGGGGCCCCAGCAGCTCCCAGTTTCCATTCTGCCAGTTTTACTGCTGCTATAAAAAGAGCCTGCT

GTGTGACTGCCTTAGCAAAAGTCCTGCCTTAGAAAAAGCAATGAGAGGTGTTGGCTTAGTGCAGGTCACT

TGCCCACCCCTGAATCAGTCCCTGGGTGCCAGGAGAGCAGATTTTTTTTGCTGGCCTATGTTGGGCCCCA

GATCAGCTTTTGCCCCACCCAAAGCTCACGGCCTGAAGATGGCAGGGAAATGGTGTCCCACAGGGAGAGG

AAGTCCTATAACCAGAAGAGGGCAGAGATGATGAGAAGGCAGAACCCCTGGGGCTGTGGGAGGCTCCCTT

AGTACGCAGTGTGGCCAGGCTATATAAACCTGGCGCAGGCCTGTCACAGGGAGGAATCGTACCTCTTCCT

TCCCTGATGAAATTAAGCAAAGGGTACTTACGCTCCCAGAGGGGCAGTAGCTTTGGCAATACCGTGTCTA

GGTTTTTCTTTACCGAAAGCAGATTTTTCCTTAACAAGAGTTGAAATCCACATTTTTATTTCCCACTAAG

TCTGTTGAGACTGCTTTAACGGAATAGCACAGACTGGGTGGCCTCTGAGTAACAGAAATGTATTGCTGAC

AGTTCTGAAAGCTGGGAAGTTCAAACTCAAGGCACCAGCAAATGCAGTGTCTGCTGAGGGCCTGTTTTTT

GTTTCCTGGATGATACTTTCTGGCAGAGTCATCATATAGTGGAAGGAGCAAACAGGCTCCCTTGGGCCTC.

TGTTATAAGGGCACTAATCTCATTCATGAGGTATCCACTCTCATGACCTAGTCACCTCCCAAAAAGCTCC

ATCTCCTAATGCCATCACTTTAGGATTTAGGTGTTAAACTTAGGAGTTCTGAAGAAAACATTCACCATAG

CATCCACTGAGTTGCTGCTGTGACTTACCCATTGGAATAGCATATGCTAGTAATGGGATTCACTCGATCT

ATCTACACACAAAGAGCCCTGTCATACACCAGGCCATGTTCCAGGTCCTGGAGATGCTGTAGAAACTCAA

TGAGTCTGTCCTCATAGAGCTTCACTTTTAGCGGGGGAGAGAAATAATAAACAGATGCATGTATATACTG

US 6,989,232 B2 37 38

TABLE 3A-continued NOV3 nucleotide sequence.

GACAGGCAGGCTGAAAAGAACAGAAGTAGAGAGAGAGAGATAATGGCATGCTTCTCTCTCCAGTGAAGTT

GTCCAGCTGGTTTTGTGTGCGTGGGAAGACTGATGTTGGCCAGGCATGGTGGCTCATGCCTGTAATTTCA

GCACTTTGGGGAGGCCAAGGCAGGAGGATCACTTGAGGCCAGGAGTTGGAGACCAGCCTGGGCAACCATA

GTGAGACTCTGTCTCTACAAACATATATGTGTGTGTATATATATAAAATATATAGCGTGTGTATATATAT

ATCATATATAATATATATTGTGTGTATATATAATATATAAATATATATGATATAATATATACAAATGTGT

TATATATATATATATAAATTAGCTGGACTTGGTGGCACATGCTCATAGTCCCAGCTACTTAGGAGACTAA

AGCAGGAGGATCACTTGAGCCCAGGAAGTTGAGGCTGAACTAAGCAATGATCCCACCTCTGCACTCCAGC

CTGGGCAGCAGAGTGACAACCTGTCTCTAGAAAAAAAAAAAAAAAAATTTAATATTATTGATTTAATATT

TTAAACATTATTTAAAAAATATTTTTAAATGTGGGAAAAAATAGAGTAACGTAGATTTTCTCTGTGATAG

TGCTACTTAAAGCAGAATCTGAGGATAACACTGGCTGAGAACTATCACCCATCAGCAGTGAGATTAGTAC

TTAACACCTATCAGCAGCGAGATTAGTACTGAAACTGGAAGTGTTAGAAACTTATAGCAGTTCGATGTTG

CGGTGCCATCCAAGTGCGTTTTCAGCAGGCTTGTCTTATTGATCAGGTTATAGACCCATCAGGGTGTTAT

AGAACT CACATACTGAGCTCTTTGTGCTTTGTGCTGTGTCTCAGACATGCTCAGCAGGGCCATATGTCGG

TCCACAAGGGATTGAAAATGAAAACAAACTGGTCCTTCACCACTGATAGCTTGAGAAGAGTAGCGCTCTA

AGATGTGCTAAGTATATCTGCCCCTTTGTGGGCAAGGTACCAGAGGAGGGAGATATACGTCTGCCCCTTA

CAGCAAGGATTCCATAGCCGATGGTGTCTGGATAGAGACTGTGATAATGTTAGCCCCATTTGAAGGGGAC

GGCCACTGCTCAGCTCCAGCTGCTTGTTGCCATGGCTGGGAATTAGACCACCTAACCTTTATAT

AGCTCTTGCAATGTGTCAAACATTGTTCTGAGCACGTCATAAATATTAGCTTGCTTAATTACATTGTCAT

AACACTGTGAGGGAGGAATATTGTTATGATTCTCATTTCAGAGTTGAAGAAACAGAAATGGAGAGGTTGA

GGGACT CACCCAAAGTCACTCAGCTTTCAGAGTGGTAGAGCAGGGATTTGAACCTGTGCATATGATTTCA

GAACCTTGCTCTTAATCACACCAGGCTGCCAGTCTAATACAAGCCCCATCCTGTCAGATCTTCCAGTTTT

TCCAGAGAAGTTAAAAATGTGGATTTTTAAAAATATGAAATCTATTTCA ACACTGCTAGACAAACAAAAT

GAGGCTCTGAGTTGTAGCTTGTCCATGCAGGGGTTTTACTTTCTATCCTCCCAAATACATCCACACT

GTGTTCCCATTTGTCCAAGAACAAAGAGTAGATATCCTCATCCCCATGTTTCAGATGGAAAAAAAAAAAA

AAAATGAGGCCTTGGTGACTAAGCGCCTTGCCTGATGTCTTAGAAGGGAGCAATTAGTGCAGAGTGATGA

CTGCCTGCTTCCAGCCCAGGTTATGTTATTCTCGAAAGATTTATGTGCTATAATTATTTAAGAGGACAGC

AGATAAATATATACTTCAGCCTCTGAAGAAGAGTTTCTCAAAGCTAGACCACCTGCATTAGAATCATGGG

TGTGCTTGATTCAAACATAGGCTCCTGGGCCTCCCCCTAACCCCTTGCATCAGAACTCTACAGAGGTGGG

GCCCAGGAATCTGCATGTTAAGCAGATCTCTGCTGAGGCTGATGTGCACCATTGTCTGAGGGGAGATGTG

CCTGGGTTTGTCTGCTCTGACTGTATCATCCTCACGTTGTGGCTCATGAGGAAATCAGAAGGGCTAGAGG

TTGAGGAATGCTGGAAAGGGCAAGTGAGGAAGACACTCAATTTCCATTCCTAAGGAGGGAGTGGACGCGG

TTTCCATTCCTAAAGAAGACATCATGGGAGATTTACTCTCATGATTTTCTAGGATCCTTGGGCAAAGCAA

CTAATGCCCCTTTGCCTCAGATTTTTGGGAAGCAACCCTGGCCATGCCTGATAAAACTGAGGGAAAAAAA

CTCCTGAGATCAGCACTGTCTAATATGGCAGCCATATGGGGCTGTGGAAATTTAAACGAATTAAAATTAA

ATGAAATTAAAATTTCAGGCCATTAGTTGCACTAGACACATTTTAAGTACT CAACAGCAATGGCCTGAAG

TTTAAATTTTATTTAATTTTAATTCTTTTAAATTCAATAGCCTCCTGGGCTAGAGGTGACCCTGCTAG

AAGGTGCAGATGACAGAGTGAACTGATAAGATGGGCACGATATTAAGCCATCATTAGTCTCTGAAGTTCT

TACAGAGCCCTAATTTTTTGTCTTTCTAATAATAATAGTTAGGATTACTGGTCTGGAGTCACACT US 6,989,232 B2 39 40

TABLE 3A-continued NOV3 nucleotide sequence.

GCTGGGATGAGATCAAGCCTTCATCATTTAGGAGTTGTGTGGCCTTGAACAAGTCACTTAAACTCTGCAA

AACT CAATTTCCTCATCCATGGAATTTTGTGAATAAGTGGATAAAGGTGTTCCTGTAGTACTTCCTTTGT

ATAGCTTTGGTGAGGGTTAAATGATAATTGCGTTTAAAATCATTAATATAGTCTTTGACACATATGACCT

TCTATAATGGTTACCTGCGACTTTTTATTATATTAATTCTTTCTCCTCCCAAACACACTGATTCAAGT

TTGACCTGTTGTGGCTACTAACTTCTCCCACCACCACCAGCTGGCAGGTTTGCATTTTAGATTTGAAA

ATACTCCTGCATGGGCCAGGCGTGGTGGCTCACACCTGTAATCTCAACACTTTGGGAGGCCAAGGCAGGT

GGATCACTTGAGGCCAGAAGTTCAAGACCAGCCTTGCCAACGTGGCAAAACCCCGTCTCTACTAAAAATA

CAGAAATTAGCCAGGCATGGTGGTGCATGACTGTAGTTCCAGCTTTTTGGGAGGCTGAGGCACAAGAATC

ACTTGAACCCAGGAGGCGGAGGTTTCAGTG

The NOV3 nucleic acid was identified on chromosome 3. presented in Table 3B using the one-letter amino acid code. This information was assigned using OMIM, the electronic SignalP results predict that NOV3 contains no known signal northern bioinformatic tool implemented by CuraGen peptide. Psort and/or Hydropathy results predict that NOV3 Corporation, public ESTs, public literature references and/or is likely to be localized extracellularly with a certainty of genomic clone homologies. This was executed to derive the 25 0.3700. In an alternative embodiment, NOV3 is likely to be chromosomal mapping of the SeqCalling assemblies, localized to the lysosome lumen with a certainty of 0.1900, Genomic clones, literature references and/or EST Sequences or to the endoplastic reticulum membrane with a certainty of that were included in the invention. 0.1000, or to the endoplastic reticulum lumen with a cer A disclosed NOV3 polypeptide (SEQ ID NO:11) encoded tainty of 0.1000. NOV3 has a molecular weight of 35510.0 by SEQ ID NO:10 has 320 amino acid residues and is Daltons.

TABLE 3B Encoded NOW3 protein sequence. (SEQ ID NO: 11) MCFTSQCRALSSSSASPMSLIFLGIDQGQMTYDGOHWHATETCFCCAHCKKSLLGRPFLPKQGQIFCSRACSAGEDPNGSD

SSDSAFONARAKESRRSAKIGKNKGKTEEPMLNQHSQLOWSSNRLSADVDPLSLQMDMLSLSSOTPSLNRDPIWRSREEPY

HYGNKMEONOTQSPLOLLSQCNIRTSYSPGGOGAGAQPEMWGKHFSNPKRSSSLAMTGHAGSFIKECREDYYPGRLRSQES

YSDMSSQSFSETRGSIQWPKYEEEEEEEGGLSTQQCRTRHPISSLKYTEDMTPTEQTPRGSMESLALSNATGRFCSP

The reverese complement for NOV3 is presented in Table 3C.

TABLE 3C Reverse complement of the NOV3 sense strand. (SEQ ID NO: 12) CACTGAAACCTCCGCCTCCTGGGTTCAAGTGATTCTTGTGCCTCAGCCTCCCAAAAAGCTGGAACTACAGTCATGCACCAC

CATGCCTGGCTAATTTCTGTATTTTTAGTAGAGACGGGGTTTTGCCACGTTGGCAAGGCTGGTCTTGAACTTCTGGCCTCA

AGTGATCCACCTGCCTTGGCCTCCCAAAGTGTTGAGATTACAGGTGTGAGCCACCACGCCTGGCCCATGCAGGAGTATTTT

CAAATCTAAAATGCAAACCTGCACAGCTGGTGGATGGTGGGAGAAGTTAGTAGCCACAACAGGTCAAAACTTGAATCAGTG

TGTTTGGGAGGAGAAAGAATTAATAATAATAAAAAGTCGCAGGTAACCATTATAGAAGGTCATATGTGTCAAAGACTATAT

TAATGATTTTAAACGCAATTATCATTTAACCCTCACCAAAGCTATACAAAGGAAGTACTACAGGAACACCTTTATCCACTT

ATTCACAAAATTCCATGGATGAGGAAATTGAGTTTTGCAGAGTTTAAGTGACTTGTTCAAGGCCACACAACTCCTAAATGA

TGAAGGCTTGATCTCATCCCAGCAAGTGTGACTCCAGAACCAGTAATCCTAACTATTAATTAATTAGAAAGACAAAAAATT

AGGGCTCATGTAAGAACTTCAGAGACTAATGATGGCTTAATATCGTGCCCATCTTATCAGTTCACTCTGTCATCTGCACCT US 6,989,232 B2 41 42

TABLE 3C-continued Reverse complement of the NOV3 sense strand.

TCTAGCAGGGTCACCTCTAGCCACAGGAGGCTATTGAAATTTAAAAGAATTAAAATTAAATAAAATTTAAACTTCAGGCCA

TTGCTGTTGAGTACTTAAAAGTGTCTAGGCAACTAATGGCCTGAAATTTTAATTCATTTAATTTTAATTCGTTTAAAT

TTCCACAGCCCCATATGGCTGCCATATTAGACAGTGCTGATCTCAGGAGTTTTTTTCCCTCAGTTTTATCAGGCATGGCCA

GGGTTGCTTCCCAAAAATCTGAGGCAAAGGGGCATTAGTTGCTTTGCCCAAGGATCCTAGAAAATCATGAGAGTAAATCTC

CCATGATGTCTTCTTTAGGAATGGAAACCGCGTCCACTCCCTCCTTAGGAATGGAAATTGAGTGTCTTCCTCACTTGCCCT

TTCCAGCATTCCTCAACCTCTAGCCCTTCTGATTTCCTCATGAGCCACAACGTGAGGATGATACAGTCAGAGCAGACAAAC

CCAGGCACATCTCCCCTCAGACAATGGTGCACATCAGCCTCAGCAGAGATCTGCTTAACATGCAGATTCCTGGGCCCCACC

TCTGTAGAGTTCTGATGCAAGGGGTTAGGGGGAGGCCCAGGAGCCTATGTTTGAATCAAGCACACCCATGATTCTAATGCA

GGTGGTCTAGCTTTGAGAAACTCTTCTTCAGAGGCTGAAGTATATATTTATCTGCTGTCCTCTTAAATAATTATAGCACAT

AAATCTTTCGAGAATAACATAACCTGGGCTGGAAGCAGGCAGTCATCACTCTGCACTAATTGCTCCCTTCTAAGACATCAG

GCAAGGCGCTTAGTCACCAAGGCCTCATTTTTTTTTTTTTTTTCCATCTGAAACATGGGGATGAGGATATCTACTCTTTGT

TCTTGGACAAATGGGAACACAGATGTGGATGTATTTGAGGAGGATAGAAAGTAAAACCCACTGCATGGACAAGCTACAACT

CAGAGCCTCATTTTGTTTGTCTAGCAGTGTTGAAATAGATTTCATATTTTTAAAAATCCACATTTTTAACTTCTCTGGAAA

AACTGGAAGATCTGACAGGATGGGGCTTGTATTAGACTGGCAGCCTGGTGTGATTAAGAGCAAGGTTCTGAAATCATATGC

ACAGGTTCAAATCCCTGCTCTACCACTCTGAAAGCTGAGTGACTTTGGGTGAGTCCCCAACCTCTCCATTTCTGTTTCTT

CAACTCTGAAATGAGAATCATAACAATATTCCTCCCTCACAGTGTTATGACAATGTAATTAAGCAAGCTAATATTTATGAC

GTGCTCAGAACAATGTTTGACACATTGCAAGAGCTATATAAAGGTTAGGTGGATACATAAATATCCCAGCACATGGCAACA

AGCAGCTGGAGCTGAGCAGTGGCCGTCCCCTTCAAATGGGGCTAACATTATCACAGTCTCTATCCAGACACCATCGGCTAT

GGAATCCTTGCTGTAAGGGGCAGACGTATATCTCCCTCCTCTGGTACCTTGCCCACAAAGGGGCAGATATACTTAGCACAT

CTTAGAGCGCTACTCTTCTCAAGCTATCAGTGGTGAAGGACCAGTTTGTTTTCATTTTCAATCCCTTGTGGACCGACATAT

GGCCCTGCTGAGCATGTCTGAGACACAGCACAAAGCACAAAGAGCTCAGTATGTGAGTTCTATAACACCCTGATGGGTCTA

TAACCTGATCAATAAGACAAGCCTGCTGAAAACGCACTTGGATGGCACCGCAACATCGAACTGCTATAAGTTTCTAACACT

TCCAGTTTCAGTACTAATCTCGCTGCTGATAGGTGTTAAGTACTAATCTCACTGCTGATGGGTGATAGTTCTCAGCCAGTG

TTATCCTCAGATTCTGCTTTAAGTAGCACTATCACAGAGAAAATCTACGTTACTCTATTTTTTCCCACATTTAAAAATATT

TTTTAAATAATGTTTAAAAATAAACAATAAATAAATTTTTTTTTTTTTTTTCTAGAGACAGGTTGTCACTCTGCT

GCCCAGGCTGGAGTGCAGAGGTGGGATCATTGCTTAGTTCAGCCTCAACTTCCTGGGCTCAAGTGATCCTCCTGCTTTAGT

CTCCTAAGTAGCTGGGACTATGAGCATGTGCCACCAAGTCCAGCTAATTTATATATATATATATAACACATTTGTATATAT

TATATCATATATATTTATATATTATATATACACACAATATATATTATATATGATATATATATACACACGCTATATATTTTA

TATATATACACACACATATATGTTTGTAGAGAGAGAGTCTCACTATGGTTGCCCAGGCTGGTCTCCAACTCCTGGCCTCAA

GTGATCCTCCTGCCTTGGCCTCCCCAAAGTGCTGAAATTACAGGCATGAGCCACCATGCCTGGCCAACATCAGTCTTCCCA

CGCACACAAAACCAGCTGGACAACTTCACTGGAGAGAGAAGCATGCCATTATCTCTCTCTCTCTACTTCTGTTCTTTTCAG

CCTGCCTGTCTCTCCTTGTCTAGCCAAACATTTCTTAGTCTTACTGCACTTGAAGCCAGTTCCCTGGTGAGAAACA

GGTTCAGAACAGTCTCAGAAGCTGTCATGGATGCTTTGGTGCCCCAGGCAGCACCACGCTGGGGCAACTGGGCTCCAGGTG

CTGTGAGTACTGCCATCAGCTCACAGCTACACCTTCTCCACCTTCATATGGTACAAGGGGCAAACAAGTTCCCTTGGACCT

CTTTTATAAGGGCATTAATCCCATTCATGAGGTCTCCACTCTCATGACTTAATCACCTCCCAAAAGGCCCCTTCTCCTAAT

GCCATCACCTTGGGATTTAGGATCTTAATTTAAGAATTTTGGGGAAAACATTCAGACCATAGTATTTTCTTTCTTTAATGC

TGCCTTTGGCCAACAGGAGCTGCCTTGCCCCAAGCCCACATCCACTATCTCATCCCTTTGTCACTCCTCAGCAGCCCAAAG

CTGATGACCAACTAACACGATGGTACAAAACATCAGCCCCTTCGTCACAAGAAGCAATTGATTCTATAGTGCAACTCATCC

ATCAGAGTCCCCTGTGGGATCAGGCAGAAGCCAGACCCCAGTGAAGACACCATTCCTCACTTGGCTCCTCCCCATGCTCTG

US 6,989,232 B2 45 46

TABLE 3C-continued Reverse complement of the NOV3 sense strand.

CTGCACTAAGCCAACACCTCTCATTGCTTTTTCTAAGGCAGGACTTTTGCTAAGGCAGTCACACAGCAGGCTCTTTTTATA

GCAGCAGTAAAACTGGCAGAATGGAAACTGGGAGCTGCTGGGGCCCCTTCCCGTGGGGAGAGCCTGCTGGGAATATAGCCT

CAAGCTATGACTGTAGGGTGGACCAGGACCCCTGCATTCTCCTGATGCGTAATGATACACGGCTTCCCAGAAAAAGGGACT

AATGTGGCCTTTATCTTCAGTGGCTGAGAGTTCCTGAGCCTGGACTTTCAAGGAAGAACTTCTGGAGACTTGTCTAAAGAG

GCAATGAACCAGAAATTAGGGAGAAGAAGAGAATTGCAAGAACAAAAGGCCACCGGCCCAAGCAGCCTACCTTTCTCCTCC

CTGACT CAC

15 The full NOV3 amino acid sequence of the protein of the invention was found to have 59 to 120 amino acid residues (49%) identical to, and 80 to 120 amino acid residues (66%) similar to, the 1011 amino acid residue SPTREMBL ACC:O9NDO8 PRICKLE 2 from Ciona intestinalis. In additional searches of the public databases, NOV3 has homology to the amino acid Sequences shown in the BLASTP data listed in Table 3D.

TABLE 3D

BLAST results for NOV3 Matching Entry (in SwissProt + Length Identity Positives SpTrEMBL) Description (aa) (%) (%) Expect Q9NDQ8: ABO36841; PRICKLE 2. Ciong 1011 59/122 78/122 1e-23 BABOO618.1 intestinalis 6/2001 (48%) (64%) Q9NDQ9; ABO36840; PRICKLE 1. Ciong 1066 58/122 77/122 BABOO617.1 intestinalis. (48%) (63%) 1e-22 prickle 16/2001 LIM-DOMAIN PROTEIN 785 47/69 60/89 2e-20 (ESN PROTEIN). (53%) (67%) drosophila nelanogaster 6/2001 O76007; AJO11654 TRIPLE LIM DOMAIN 615 38/61 49/61 4e-20 CAAO9726.1 PROTEIN. On (62%) (80%) Sapiens 6/2001 CG11084 PROTEIN 1268 47/105 62/105 8e-20 drosophila (45%) (59%) melanogaster 6/2001

The homology of these and other Sequences is shown regions that may be required to preserve Structural or graphically in the ClustalW analysis shown in Table 3E. In functional properties), whereas non-highlighted amino acid the ClustalW alignment of the NOV3 protein, as well as all so residues are less conserved and can potentially be mutated to other ClustalW analyses herein, the black outlined amino a much broader extent without altering protein Structure or acid residues indicate regions of conserved sequence (i.e., function.

TABLE 3E

ClustalW Analysis of NOV3

1) Novel NOV3 (SEQ ID NO: 11) 2) Q9DQ8 (SEQ ID NO : 13) 3) Q9DQ9 (SEQ ID NO : 14) 4) Q9U1I1 (SEQ ID NO: 15) 5) O76 007 (SEQ ID NO:16) 6) Q9V4I9 c-ter fragment (SEQ ID NO: 17) US 6,989,232 B2 47 48

TABLE 3E-continued ClustalW Analysis of NOV3

1 1 1 1 1 ------SEFARGSRRRR 10 241 EEESPEQEAPKPALPPKQKQQRPVPPLPPPPANRVTQDQGTQPAAPQVPLOPETAGDLQF 30

11 SERGEMPSNIDPKS------AGLDQDIVIRGP 11 MPSNIDPKS------AGLDQDIVIRGP 11 GEMPSNIDPKS- ---AGLDQDIVIRGP 37 SHSQR--PES------AISQVASTAHLDVP 11 SGRAPPEAEDPDR------GQPCNSCREQCP 301 LNLSLRQRSLPRSMKPFKDAHDISFTFNELDTSAEPEVATGAAQQESNECRTPLTQISYL 360

35 35 35 66 ------GSGGSAWSGGSGG----- APESAGRFWS---PLQR-- 93 41 ------41 361 QKIPTLPRHFSPSGQGLATPPALGSGGMGLPSSSSASALYAAQAAAGILPTSPLPLQRHQ 420

35 ----- TENEVR------RRQSRRQAS------WRHNR-- 55 35 .. --RRQSRRQAS- NR-- 55 35 RVR------RRQSRRQAS------WRHNR-- 55 93 - - - - - RHCPP- - -SHLPLNSVASPLRTASY AVSE HQ-- 133 41 WRKICQHCKCP------REEHAVHAVPVDLERIMCRLIS-- 74 421 QYLPPHHQHPGAGMGPGPGSGAAAGPPLGPQYSPGCSAS 480

55 82 55 82 55 82 133 162 74 101 481 540

83 ESIGEKYRWRQLLHQLPPH KE 142 83 SIGEKYRVR KR 142 83 ...'. KR 142 163 R 222 102 KK 161 541 RK 600

279 221 657

339 222 281 658 717

263 322 263 322 263 322 34 O 399 282 341 718 777

323 382 323 ESDSQHSSSQYENPQLPTSHNVRRSL 382 323 ESDSQHSSSQYENPQLPTSHNVRRSL 382 400 PSTSSG------4.38 342 -TAPGPSRRSWS - 383 778 PSESSGTGMYT-- - 820

383 NLDNLSIHDKPWEDKGELSPASNNWFIDAADMY SAAW TRYSKGHTRPSHPYLDGM 442 383 NLDNLSIHDKPWEDKGELSPASNNWFIDAADMY SAAW TRYSKGHTRPSHPYLDGM 442 383 NLDNLSIHDKPWEDKGELSPASNNWFIDAADMY SAAW TRYSKGHTRPSHPYLDGM 442 400 ------PQLRP HRASTSSQIAKSPRRGGER------463 342 ------AG------TAPL T------395 778 ------TPTP QRVRTPHQAPLPARIPSSH------845

US 6,989,232 B2

Other BLAST results include sequences from the Patp database, which is a proprietary database that contains TABLE 3F-continued Sequences published in patents and patent publications. Patp Patp Ali f NOV3 results include those listed in Table 3F. at p AlignmentAISC O 5 Sequences TABLE 3F producing High-scoring Protein? Length. Identity Positive Patp Alignment of NOV3 Segment Pairs Organism (as) (%) (%) Expect

Sequences 1O Patp: AAGO1529 Human 126 3O 44 5.8e-05 secreted producing protein High-scoring Protein? Length. Identity Positive H. Sapiens Segment Pairs Organism (as) (%) (%) Expect Patp AAY84378 Amino acid 28O 32 5O O.OOO77 patp:tp: AAW83952 PolwpeptidOlypeptide 159 44 67 1.4e-O7a-e- sequence of a encoded by 15 domainhuman LIMprotein gene 2 clone homologue HDTAY29- H. Sapiens H. Sapiens Patp: AAY57563 Human testin 421 44 67 3.4e-05 (HTES)- H. 2O Sapiens - patp. AAB93751 Human protein 464 44 67 3.4e-05 The results of a domain search indicate that the NOV3 SEO ID NO: protein contains the protein domain (as defined by Interpro) 13416 H. named IPRO01781 at amino acid positions 43 to 76. Table

Pat:at: AAB42119 RéSal ORFX 464 44 Ue is 3G lists the domain description from further DOMAIN p ORF1883 analysis results against NOV3. This indicates that NOV3 has polypeptide- properties similar to those of other proteins known to H. Sapiens contain these domains and Similar to the properties of these domains.

TABLE 3G Domain Analysis of NOV3 PRODOMANALYSIS

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) Nprdm: 21599 p36 (1) TES2 MOUSE / TESTIN 2 (TES2) (CONTAIN ... 127 1.8e-08 prdm: 39635 p36 (1) ZYX MOUSE // ZYXIN. REPEAT LIM MOTIF; . . . 68 O.048 prdm: 67 p36 (155) LIM1 (10) LIM3 (8) PAXI(8) || PROTEIN ... 67 O.O61 prdm:55854 p36 (1) HMW1 MYCGE / CYTADHERENCE HIGH MOLECU . . . 72 O.15 prdm: 7588 p36 (3) SLI3 (2) LRG1 (1) || PROTEIN LIM MOTIF ... 61 O.25 prdm: 21599 p36 (1) TES2 MOUSE / TESTIN 2 (TES2) (CONTAINSTESTIN 1 (TES1)). LIM MOTIF: METAL-BINDING: ZINC: ALTERNATIVE SPLICING, 66 aa. Expect = 1.8e-08, Identities = 19743 (44%), Positives = 29/43 (67%) for NOV3 aa residues 29 to 71; and LIM Domain residues 19 to 61 >prdm: 39635 p36 (1) ZYX MOUSE / ZYXIN. REPEAT LIM MOTIF: METAL-BINDING: ZINC; CELL ADHESION, 44 aa. Identities = 13/34 (38%), Positives = 19/34 (55%) >prdm: 67 p36 (155) LIM1 (10) LIM3 (8) PAXI (8) || PROTEIN LIM MOTIF METAL-BINDING ZINCREPEAT HOMEOBOX NUCLEAR DNA-BINDING DEVELOPMENTAL, 68 aa. Identities = 14/37 (37%), Positives = 20/37 (54%) >prdm:55854 p36 (1) HMW1 MYCGE / CYTADHERENCE HIGH MOLECULAR WEIGHT PROTEIN 1 (CYTADHERENCE ACCESSORY PROTEIN 1). STRUCTURAL PROTEIN, 107 aa. Identities = 18/67 (26%), Positives = 37/67 (55%) >prdm: 7588 p36 (3) SLI3 (2) LRG1 (1) If PROTEIN LIM MOTIF METAL-BINDING ZINC REPEAT SKELETAL MUSCLE LIM-PROTEIN SLIM, 67 aa. Identities = 20/55 (36%), Positives = 30/55 (54%)

BLOCKS ANALYSIS

AC# Description Strength Score AA# BLOO11SR Eukaryotic RNA polymerase II heptapeptide rep 2074 1110 124 BLOO911C Dihydroorotate dehydrogenase proteins. 1314 1OSO 2O1 BLO1137D Uncharacterized protein family UPF0006 protei 1297 104.8 126 BLOO576D General diffusion Gram-negative porins protei 1391 1047 172 BLO1182C Glycosyl hydrolases family 35 proteins. 1577 1046 73 US 6,989,232 B2 S3 S4

TABLE 3G-continued Domain Analysis of NOV3 ProSite Analysis NOV3 aa position Pattern-ID: ASN GLYCOSYLATION PS00001 (Interpro) 78, 171, 312 Pattern-DE: N-glycosylation site, Pattern: NI PST PI Patcern-ID: CAMP PHOSPHO SITE PS00004 (Interpro) 211 Pattern-DE: cAMP- and coMP-dependent protein kinase phosphorylation site Pattern. RK {2} . STI Pattern-ID: PKC PHOSPHO SITE PS00005 (Interpro) 95, 98, 123, 287, 300, 314 Pattern-DE: Protein kinase C phosphorylation site Pattern: ST. RK Pattern-ID: CK2 PHOSPHO SITE PS00006 (Interpro) 72, 157, 243, 251, 295 Pattern-DE: Casein kinase II phosphorylation site Pattern: ST. {2} DE Pattern-ID: TYR PHOSPHO SITE PS00007 (Interpro) 156, 227 Pattern-DE: Tyrosine kinase phosphorylation site Pattern: RK. {2,3} IDE. {2,3}Y Pattern-ID: MYRISTYL PS00008 (Interpro) 24, 63, 79, 192, 272,303 Pattern-DE: N-myristoylation site Pattern: GEDRKHPFYW. {2} STAGCNIP Pattern-ID: LEUCINE ZIPPER PS00029 (Interpro) 119 Pattern-DE: Leucine zipper pattern

The LIM domain is a Zinc finger Structure that is present 25 grossly resembles leukoplakia, is not precancerous. The eye in Several types of proteins, including homeodomain tran lesions resemble pterygia (see OMIM 178000). The only Scription factors, kinases and proteins that consist of Several Symptoms are produced by involvement of the cornea, LIM domains. Proteins containing LIM domains have been resulting in impairment of vision. discovered to play important roles in a variety of fundamen The human homolog of Drosophila discs large-3 (DLG3) tal biological processes including cytoskeleton organization, is a protein related to Prickle and LIM. See, OMIM Entry cell lineage specification and organ development, but also 300189. Mutations of the “discs large (dlg) tumor suppres for pathological functions such as oncogenesis, leading to sor locus in Drosophila lead to imaginal disc neoplasia and human disease. The LIM domain has been demonstrated to a prolonged larval period followed by death. Drosophila dlg. be a protein-protein interaction motif that is critically and related proteins form a Subfamily of the membrane involved in these processes. The recent isolation and analy 35 associated guanylate kinase (MAGUK) protein family and sis of more LIM domain-containing proteins from Several are important components of Specialized cell junctions. See Species have confirmed and broadened our knowledge about DLGI (OMIM 601014). A partial cDNA encoding NEDLG LIM protein function. Furthermore, the identification and (neuroendocrine DLG) was isolated by searching an EST characterization of factors that interact with LIM domains database for Sequences related to dig and DLGI. See, illuminates mechanisms of combinatorial developmental 40 Makino et al. (1997). Northern blot analysis revealed that regulation. NEDLG is highly expressed in neuronal and endocrine LIM domain containing proteins generally have two tan tissues. Immunolocalization Studies indicated that the pro dem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3) tein was expressed mainly in nonproliferating cells, Such as in their N-terminal Section. ZyXin and paxillin are excep neurons, cells in Langerhans islets of the pancreas, myocytes tions in that they contains respectively three and four LIM 45 of heart muscles, and the prickle and functional layer cells domains at their C-terminal eXtremity. In apterous, isl-1, of the esophageal epithelium. In a yeast 2-hybrid assay, LH-2, lin-11, lim-1 to lim-3, limx-1 and ceh-14 and mec-3 NEDLG interacted with the C-terminal region of the APC there is a homeobox domain some 50 to 95 amino acids after (OMIM 175100) tumor suppressor protein. Therefore, the LIM domains. In the LIM domain, there are seven NEDLG may negatively regulate cell proliferation through conserved cysteine residues and a histidine. The arrange 50 its interaction with the APC protein. By fluorescence in situ ment followed by these conserved residues is C-x(2)-C-X hybridization, Makino et al. (1997) mapped the NEDLG (16.23)-H-x(2)-CH-x(2)-C-X(2)-C-X(16.21)-C-x(2,3)- gene to Xq13. Using radiation hybrid panels, Stathakis et al. CHD). The LIM domain binds two zinc ions. LIM does not (1998) refined the map position to Xq13.1. DLG3 is located bind DNA, rather it seems to act as interface for protein within the dystonia-parkinsonism syndrome (DYT3; OMIM protein interaction. 55 314250) locus. The Prickle gene in Drosophila belongs to a family of The disclosed NOV3 nucleic acid encoding a LIM “tissLue polarity” genes that control the orientation of domain-containing Prickle-like Secreted protein includes the bristles and hairs in the adult cuticle. (See Gubb and nucleic acid whose Sequence is provided in Table 3A, or a Garcia-Bellido, J. Embryol. Exp. Morphol. 68:37-57 fragment thereof. The invention also includes a mutant or (1982)) These “tissue polarity genes play important roles in 60 variant nucleic acid any of whose bases may be changed the organization of the cytoskeleton. Prickle has been shown from the corresponding base shown in Table 3A while still to be involved in hereditary benign intraepithelial dyskera encoding a protein that maintains its LIM-domain tosis (OMIM Entry: 127600). Characteristic histologic containing Prickle-like activities and physiological changes of the prickle cell layer of the mucosa include functions, or a fragment of Such a nucleic acid. The inven numerous round, waxy-looking, eosinophilic cells that 65 tion further includes nucleic acids whose Sequences are appear to be engulfed by normal cells. The conjunctiva and complementary to those just described, including nucleic oral mucous membranes are affected. The oral lesion, which acid fragments that are complementary to any of the nucleic US 6,989,232 B2 SS acids just described. The invention additionally includes and (v) a composition promoting tissue regeneration in vitro nucleic acids or nucleic acid fragments, or complements and in Vivo (vi) biological defense weapon. thereto, whose structures include chemical modifications. Based on the tissues in which NOV3 is most highly Such modifications include, by way of nonlimiting example, expressed, including kidney and ovary, Specific uses include modified bases, and nucleic acids whose Sugar phosphate developing products for the diagnosis or treatment of a backbones are modified or derivatized. These modifications variety of diseases and disorders. Additional disease indi are carried out at least in part to enhance the chemical cations and tissue expression for NOV3 is presented in stability of the modified nucleic acid, such that they may be Example 2. used, for example, as antisense binding nucleic acids in The nucleic acids and proteins of the invention are useful therapeutic applications in a Subject. In the mutant or variant in potential diagnostic and therapeutic applications impli nucleic acids, and their complements, up to about 17% cated in, but not limited to, Various diseases and disorders percent of the bases may be So changed. described below and/or other pathologies. For example, the The disclosed NOV3 protein of the invention includes the compositions of the present invention will have efficacy for LIM-domain-containing Prickle-like protein whose treatment of patients Suffering from: dystonia-parkinsonism sequence is provided in Table 3B. The invention also 15 Syndrome, dyskeratosis, hereditary benigh intraepithelial; includes a mutant or variant protein any of whose residues developmental disorders and other diseases, disorders and may be changed from the corresponding residue shown in conditions of the like. A cDNA encoding the LIM-domain Table 3B while still encoding a protein that maintains its containing Prickle-like protein NOV3 may be useful in gene LIM-domain-containing Prickle-like activities and physi therapy, and the Prickle-like protein NOV3 may be useful ological functions, or a functional fragment thereof. In the when administered to a Subject in need thereof. mutant or variant protein, up to about 16% percent of the These materials are further useful in the generation of residues may be So changed. antibodies that bind immunospecifically to the novel Sub The invention further encompasses antibodies and anti stances of the invention for use in therapeutic or diagnostic body fragments, such as Fab, (Fab)2 or single chain FV methods. constructs, that bind immunospecifically to any of the pro 25 teins of the invention. Also encompassed within the inven NOV3 nucleic acids and polypeptides are further useful in tion are peptides and polypeptides comprising Sequences the generation of antibodies that bind immuno-specifically having high binding affinity for any of the proteins of the to the novel NOV3 Substances for use in therapeutic or invention, including Such peptides and polypeptides that are diagnostic methods. These antibodies may be generated fused to any carrier particle (or biologically expressed on the according to methods known in the art, using prediction Surface of a carrier) Such as a bacteriophage particle. from hydrophobicity charts, as described in the “Anti The protein similarity information, expression pattern, NOVX Antibodies' Section below. The disclosed NOV3 and map location for the novel LIM-domain-containing protein has multiple hydrophilic regions, each of which can Prickle-like NOV3 protein and nucleic acid disclosed herein be used as an immunogen. In one embodiment, a contem Suggest that this novel LIM-domain-containing Prickle-like 35 plated NOV3 epitope is from about amino acids 25 to 50. In protein may have important Structural and/or physiological another embodiment, a NOV3 epitope is from about amino functions characteristic of the LIM-domain-containing acids 55 to 140. In additional embodiments, NOV3 epitopes Prickle-like protein family. For example, NOV3 may be are from about amino acids 145 to 180, from about amino important for the proper organization of cytoskeleton, or in acids 180 to 225, and from about amino acids 250 to 280. the treatment of dystonia-parkinsonism Syndrome; heredi 40 These novel proteins can be used in assay Systems for tary benign intraepithelial dyskeratosis, developmental dis functional analysis of various human disorders, which will orders and other diseases, disorders and conditions of the help in understanding of pathology of the disease and like. Accordingly, NOV3 nucleic acids and proteins may development of new drug targets for various disorders. have potential diagnostic and therapeutic applications in NOV4 treating disorders that involve cytoskeleton malfunctions. 45 A disclosed NOV4 nucleic acid of 1278 nucleotides (also These include Serving as a specific or Selective nucleic acid referred to as CG56824-01) encoding a novel lipid or protein diagnostic and/or prognostic marker, wherein the metabolism-like protein is shown in Table 4A. An open presence or amount of the nucleic acid or the protein are to reading frame was identified beginning with an ATG initia be assessed, as well as potential therapeutic applications tion codon at nucleotides 184 to 186 and ending with a TGA Such as the following: (i) a protein therapeutic, (ii) a small 50 codon at nucleotides 1195 to 1197. Putative untranslated molecule drug target, (iii) an antibody target (therapeutic, regions upstream from the initiation codon and downstream diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic from the termination codon are underlined in Table 4A, and acid useful in gene therapy (gene delivery/gene ablation), the Start and Stop codons are in bold letters.

TABLE 4A NOV4 nucleotide sequence. (SEQ ID NO: 18) CTCTCGGGCCCAACGCCCCAATCCTTGCGTGTCCTTGCAGTCCCACCCCACACTCAGCCTTGTGTCCCTCGATCCAGT

CTCCGACTCCATTTCCCACCCTAAACCGCCTACCCGGGTCTGTTCCCCGCCCGGTTGTCCTCGCCCTGCTGCGCTGAG

TGTCCCCTGTTAGCCTCGACCCCAGGCGCTGCAGACGCTGCAGAGCTCGTGGGTGACCTTCCGCAAGATCCTGTCTCAC

TTCCCCGAGGAGCTGAGTCTGGCTTTCGTCTACGGCTCCGGGGTGTACCGCCAGGCAGGGCCCAGTTCAGACCAGAAGAA

TGCTATGCTGGACTTTGTGTTCACAGTAGATGACCCTGTCGCATGGCATTCAAAGAACCTGAAGAAAAATTGGAGTCACT US 6,989,232 B2 57 58

TABLE 4A-continued NOV 4 nucleotide sequence.

ACTCTTTCCTAAAAGTTTTAGGGCCCAAGATTATCACGTCCATCCAGAATAACTATGGCGCTGGAGTTTACTACAATTCA

TTGATCATGTGTAATGGTAGGCTTATCAAATATGGAGTTATTAGCACTAACGTTCTGATTGAAGATCTCCTCAACTGGAA

TAACTTATACATTGCTGGACGACTCCAAAAACCGGTGAAAATTATCTCAGTGAACGAGGATGTCACTCTTAGATCAGCCC

TCGATAGAAATCTGAAGAGTGCTGTGACCGCTGCTTTCCTCATGCTCCCCGAAAGCTTTTCTGAAGAAGACCTCTTCATA

GAGATTGCCGGTCTCTCCTATTCAGGTGACTTTCGGATGGTGGTTGGAGAAGATAAAACAAAAGTGTTGAATATTGTGAA

GCCCAATATAGCCCACTTTCGAGAGCTCTATGGCAGCATACTACAGGAAAATCCTCAAGTGGTGTATAAAAGCCAGCAAG

GCTGGCTGGAGATAGATAAAAGCCCAGAAGGACAGTTCACTCAGCTGATGACATTGCCCAAAACCTTACAGCAACAGATA

AATCATATTATGGACCCTCCTGGAAAAAACAGAGATGTGGAAGAAACTTTATTCCAAGTGGCTCATGATCCCGACTGTGG

AGATGTGGTGCGACTAGGGCTTTCAGCAATCGTGAGACCGTCTAGTATAAGACAGAGCACGAAAGGCATTTTTACTGCTG

GCCTGAAGAAGTCAGTGATTTATAGTTCACTAAAACTGCACAAAATGTGGAAAGGGTGGCTGAGGAAAACATCCGATTT

TGCTTGCTTTTATATATGTTATGTGTAGATGAATAAAGTGTTTGATCCTTTTTGACAAAAAAAAAAAAAAAAAAAAAA

In a search of public sequence databases, the NOV4 25 Sequence was cloned by the polymerase chain reaction nucleic acid sequence has 96 of 101 bases (95%) identical to (PCR) using the primer set NOV4-2, shown in Table 17A. a human cDNA clone NT2RP3003346. Public nucleotide The PCR product derived by exon linking, covering the databases include all GenBank databases and the GeneSeq entire NOV4 open reading frame, was cloned into the patent database. pCR2.1 vector from Invitrogen to provide clone A disclosed NOV4 polypeptide (SEQ ID NO:19) encoded by SEQ ID NO:18 has 337 amino acid residues and is 11 O189::COR24SC128.69823O.M23. presented in Table 4B using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV4 has a signal peptide. The most likely cleavage Site is between amino acid positions 14 and 15, i.e., at the dash between 35 TFR-KI. NOV4 is likely to be localized to the mitochondrial matrix space with a certainty of 0.6567. In alternative embodiments, NOV4 is localized to the mitochondrial inner membrane with a certainty of 0.3497, to the mitochondrial intermembrane space with a certainty of 0.3497, or the 40 mitochondrial outer membrane with a certainty of 0.3497. NOV4 has a molecular weight of 38,078.6 Daltons.

TABLE 4B Encoded NOV4 protein sequence. (SEQ ID NO:19) MALQTLOSSWWTFRKILSHIFPEELSLAFWYGSGWYROAGPSSDQKNAMLDFWFTWDDPVAWHSKNLKK

NWSHYSFLKVLGPKIITSIONNYGAGWYYNSLIMCNGRLIKYGVISTNWLIEDLLNWNNLYIAGRLOK

PWKIISWNEDWTLRSALDRNLKSAWTAAFILMLPESFSEEDLFIEIAGLSYSGDFRMWWGEDKTKWLNI

WKPNIAHFRELYGSILOENPOVWYKSQQGWLEIDKSPEGQFTQLMTLPKTLQQQINHIMDPPGKNRDV

EETLFOVAHDPDCGDVWRLGLSAIVRPSSIRQSTKGIFTAGLKKSWIYSSLKLHKMWKGWLRKTS

The NOV4 nucleic acid was tentatively localized to The reverse complement for NOV4 is presented in Table human chromosome 3. The cDNA coding for the NOV4 4C.

TABLE 4C NOV4 reverse complement (SEQ ID NO: 20) TTTTTTTTTTTTTTTTTTTTTTGTCAAAAAGGATCAAACACTTTATTCATCTACACATAACAATAAAAAGCAAGCA US 6,989,232 B2 59 60

TABLE 4 C-continued NOV4 reverse complement

AAATCAGGATGTTTTCCTCAGCCACCCTTTCCACATTTTGGCAGTTTTAGTGAACTATAAATCACGACTTCTTCAG

GCCAGCAGTAAAAATGCCTTTCGTGCTCTGTCTTATACTAGACGGTCTCACGATTGCTGAAAGCCCTAGTCGCACCAC

ATCTCCACAGTCGGGATCATGAGCCACTTGGAATAAAGTTTCTTCCACATCTCTGTTTTTTCCAGGAGGGTCCATAAT

AGATTACTGTTGCTGTAAGGTTTTGGGCAATGTCATCAGCTGAGTGAACTGCCTTCGGGCTTTTATCTATCC

CAGCCAGCCTTGCTGGCTTTTATACACCACTTGAGGATTTTCCTGTAGTATGCTGCCATAGAGCTCTCGAAAGTGGGC

TATATTGGGCTTCACAATATTCAACACTTTTGTTTTATCTTCTCCAACCACCATCCGAAAGTCACCTGAATAGGAGAG

ACCGGCAATCTCTATGAAGAGGTCTTCTTCAGAAAAGCTTTCGGGGAGCATGAGGAAAGCAGCGGTCACAGCACTCTT

CAGATTTCTATCGAGGGCTGATCTAAGAGTGACATCCTCGTTCACTGAGATAATTTTCACCGGTTTTTGGAGTCGTCC

AGCAATGTATAAGTTATTCCAGTTGAGGAGATCTTCAATCAGAACGTTAGTGCTAATAACTCCATATTTGATAAGCCT

ACCATTACACATGATCAATGAATTGTAGTAAACTCCAGCGCCATAGTTATTCTGGATGGACGTGATAATCTTGGGCCC

TAAAACTTTTAGGAAAGAGTAGTGACTCCAATTTTTCTTCAGGTTCTTTGAATGCCATGCGACAGGGTCATCTACTGT

GAACACAAAGTCCAGCATAGCATTCTTCTGGTCTGAACTGGGCCCTGCCTGGCGGTACACCCCGGAGCCGTAGACGAA

AGCCAGACTCAGCTCCTCGGGGAAGTGAGACAGGATCTTGCGGAAGGTCACCCACGAGCTCTGCAGCGTCTGCAGCGC

CATGGGGTCGAGGCTAACAGGGGACACTCAGCGCAGCAGGGCGAGGACAACCGGGCGGGGAACAGACACCGGGTAGGC

GGTTTAGGGTGGGAAATGGAAGTCGGAGACTGGATCGAGGGACACAAGGCTGAGTGTGGGGTGGGACTGCAAGGACAC

GCAAGGATTGGGGCGTTGGGCCACGAAGAG

In a search of public sequence databases, the NOV4 amino acid Sequence has 90 of 214 amino acid residues TABLE 4D-continued (42%) identical to, and 137 residues (21.4%) positive with, the 274 amino acid residue C. elegans Y71F9B.2 protein. BLAST results for NOV4 Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR. Gene Indexf Length. Identity Positives It was also found that NOV4 had homology to the O 4 Identifier Protein?Organism (aa) (%) (%) Expect acid sequences shown in the BLASTP data listed in Table 4D. AEOO3706; PROTEIN. (41%) (57%) AAF55108.1 drosophila TABLE 4D nelanogaster. BLAST results for NOV4 45 5/2OOO Q9SN75; HYPO- 332 102,314 170/314, 7e-41 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect AL132955; THETICAL (32%) (54%) CAB61989.1 37.4 KDA Q9CW36; 1500001M2OR1K 367 271/332 304/332, 1e-160 AKO05100; PROTEIN (82%) (92%) 50 PROTEIN. BAB23818.1 (FRAGMENT). arabidopsis nus musculus. thaliana. 6/2001 O74339; HYPO- 383 119/325 174/325, 2e-47 5/2OOO AL031174: THETICAL (37%) (54%) CAA2O110.1 44.3 KDA PROTEIN 55 C1A406C IN HROMOSOME The homology of these and other Sequences is shown schizosac- graphically in the ClustalW analysis shown in Table 4E. In charomyces the ClustalW alignment of the NOV4 protein, as well as all g other ClustalW analyses herein, the black outlined amino Q9N4G7; Y71F9B.2 274 111/320 169/320, 5e-47 acid residues indicate regions of conserved sequence (i.e., ACO242O1; PROTEIN (35%) (53%) regions that may be required to preserve Structural or AAF36018.1 eleganscaenorhabditis functional properties), whereas non-highlighted amino acid 10/2OOO 65 residues are less conserved and can potentially be mutated to Q9VFF2; CG3641 647 109/269 152/269, 2e-44 a much broader extent without altering protein Structure or function.

US 6,989,232 B2 63 64

TABLE 4E-continued ClustalW Analysis of NOV4

NOW4 346 Q9CW36 346 Of 4339 361 Q9N4GT 258 ------IY------SMAK------266 Q9 VFF2 543 AASGPKSLSSKELEQMAQWKYEQQRDDEQESMAKRHKKKHKREESLVELHQKKLRKEQRE 6 O2 Q9SN75 315 ------ATMY------ES 323

NOW4 354 ------LNKMWKGWMSKAS---- 367

Q9CW36 354 -LNKMWKGWMSKAS---- 367 Of 4339 371 --CHSFRWYMSMRS.---- 383 Q9N4G7 266 ------MSKFLKSK------274 Q9WFF2 603 KPERRPFSRDVDLKLNKIDKNOTKQIVDKAKILNTKFSRGOAKYL 647 Q9SN75 323 ------MRKAWNSRA------332

Table 4F lists the domain description from DOMAIN Sequence has properties Similar to those of other proteins analysis results against NOV4. This indicates that the NOV4 known to contain this domain.

TABLE 4F Domain Analysis of NOV4 ProDom Protein Domain Analysis prdm:5074.9 p36 (1) YG1W YEAST //HYPOTHETICAL 44.2 KD PROTEIN IN RME1-TFC 4 INTERGENIC REGION. HYPOTHETICAL PROTEIN, 385 aa. Expect = 2. 1e-41, Identities = 85/209 (40%), Positives = 117/209 (55%) for NOV4: 16 to 222; Sbjct: 116 to 324 Expect = 2. 1e-41, Identities = 19/39 (48%), Positives = 28/39 (71%) for NOV4: 290 to 328; Sbict: 344 to 382 prdm: 29671 p36 (1) PMFF PROMIA/PUTATIVE MINOR FIMBRIAL SUBUNIT PMFF PRECURSOR. FIMBRIA; SIGNAL 53 a.a. Expect = 0. 64, Identities = 15/48 (31%), Positives = 27/48 (56%) for NOV4: 157 to 202; Sbict: 6 to 53 prdM: 16833 p36 (2) VL96 (2)//L96 PROTEIN REPEAT DNA PACKAGING DNA-BINDING, 61 aa. Expect = 2.2, Identities = 11/32 (34%), Positives = 18/32 (56%) for NOV4: 21 to 52; Sbict: 9 to 40 prdm: 2442 p36 (10) INVO (10)//INVOLUCRIN KERATINOCYTE REPEAT 65 aa. Expect = 4.7, Identities = 14/40 (35%), Positives = 20/40 (50%) for NOV4: 242 to 276; Sbict: 8 to 47 prdm: 15830 p.36 (2) GLG1 (1) GLG2 (1) // GLYCOGEN SYNTHESIS INITIATOR PROTEIN BIOSYNTHESIS GLG1 GLG2, 51 aa. Expect = 6. O. Identities = 10/23 (43%), Positives = 14/23 (60%) for NOV4: 254 to 276; Sbict: 22 to 44

BLOCKS Protein Domain Analysis AC# Description Strength Score BLOO 115R 0 Eukaryotic RNA polymerase II heptapeptide rep 2074 1110 BLOO 911C 0 Dihydroorotate dehydrogenase proteins. 1314 105 O BLO 1137D 0 Uncharacterized protein family UPF0006 protei 1297 104.8 BLOO576B O General diffusion Gram-negative porins protei 1391 104.7 BLO 1182C O Glycosyl hydrolases family 35 proteins. 1577 1046

ProSite Protein Domain Analysis NOV4 aa position Pattern-ID : ASN GLYCOSYLATION PS00001 (Interpro) 69 Pattern-DE: N-glycosylation site Pattern: N. PST P.

Pattern-ID: CAMP PHOSPHO SITE PS00004 (Interpro) 334 Pattern-DE: cAMP- and ccMP-dependent protein kinase phosphorylation site Pattern: RK {2} . STI Pattern-ID : PKC PHOSPHO SITE PSOOOO5 (Interpro) 12, 148, 301, 305, 322 Pattern-DE: Protein kinase C phosphorylation site US 6,989,232 B2 65 66

TABLE 4F-continued Domain Analysis of NOV4 Pattern: ST.RK

Pattern-ID : CK2 PHOSPHO SITE PSOOOO6 (Interpro) 54, 142, 151, 171 Pattern-DE: Casein kinase II phosphorylation site Pattern: ST. {2} DE Pattern-ID : MYRISTYL PSOOOO8 (Interpro) 94, 111, 183, 308 Pattern-DE: N-myristoylation site M Pattern: G EDRKHPFYW. {2} STAGCN P

Other BLAST results include sequences from the Patp 15 The above defined information for this invention Suggests database, which is a proprietary database that contains that this lipid metabolism-like protein (NOV4) may function Sequences published in patents and patent publications. A as a member of a v family”. Therefore, the NOV4 nucleic BLASTP analysis of the patp database showed that NOV4 acids and proteins identified here may be useful in potential has 85 of 209 aa residues (40%) identical to, and 117 of 209 therapeutic applications implicated in (but not limited to) aa residues (55%) positive with, the 385 aa Saccharomyces various pathologies and disorders as indicated below. The cerevisiae Lipid metabolism protein encoded by the open potential therapeutic applications for this invention include, reading frame YGR046w (patp: AAB19189, Expect=1.6e but are not limited to: cardiovascular disease research tools, 40). Patp results include those listed in Table 4G. for all tissues and cell types composing (but not limited to) those defined here TABLE 4G Based on the tissues in which NOV4 is most highly 25 expressed; including duodenum, Small intestine, uterus, Patp alignments of NOVA thymus, CAEC, liver, breast, lung, kidney; Specific uses Smallest include developing products for the diagnosis or treatment Sum of a variety of diseases and disorders. Additional disease High Prob. indications and tissue expression for NOV4 is presented in Sequences producing High-scoring Segment Pairs: Score P (N) Example 2. patp:AAB19189 Lipid metabolism protein encoded by 374 1.6e-40 The NOV4 nucleic acids and proteins of the invention are the open reading frame YGR046w-Saccharomyces useful in potential therapeutic applications implicated in cerevisiae, 385 aa. cancer including but not limited to heart disease, Stroke and/or other pathologies and disorders. For example, a The disclosed NOV4 nucleic acid encoding a lipid 35 cDNA encoding the lipid metabolism-like protein (NOV4) metabolism associated protein-like protein includes the may be useful in cardiovascular disease therapy, and the nucleic acid whose Sequence is provided in Table 4A, or a lipid metabolism-like protein (NOV4) may be useful when fragment thereof. The invention also includes a mutant or administered to a subject in need thereof. By way of variant nucleic acid any of whose bases may be changed nonlimiting example, the compositions of the present inven from the corresponding base shown in Table 4A while still tion will have efficacy for treatment of patients Suffering encoding a protein that maintains its lipid metabolism-like 40 from cardiovascular disease including but not limited to activities and physiological functions, or a fragment of Such heart disease, hypertension, diabetes, Stroke and renal fail a nucleic acid. The invention further includes nucleic acids ure. The NOV4 nucleic acid encoding lipid metabolism-like whose Sequences are complementary to those just described, protein, and the lipid metabolism-like protein of the including nucleic acid fragments that are complementary to invention, or fragments thereof, may further be useful in any of the nucleic acids just described. The invention 45 diagnostic applications, wherein the presence or amount of additionally includes nucleic acids or nucleic acid the nucleic acid or the protein are to be assessed. fragments, or complements thereto, whose Structures NOV4 nucleic acids and polypeptides are further useful in include chemical modifications. Such modifications include, the generation of antibodies that bind immuno-specifically by way of nonlimiting example, modified bases, and nucleic to the novel NOV4 substances for use in therapeutic or acids whose Sugar phosphate backbones are modified or 50 diagnostic methods. These antibodies may be generated derivatized. These modifications are carried out at least in according to methods known in the art, using prediction part to enhance the chemical Stability of the modified nucleic acid, Such that they may be used, for example, as antisense from hydrophobicity charts, as described in the “Anti binding nucleic acids in therapeutic applications in a Subject. NOVX Antibodies' Section below. The disclosed NOV4 In the mutant or variant nucleic acids, and their protein has multiple hydrophilic regions, each of which can complements, up to about 45% percent of the bases may be 55 be used as an immunogen. In one embodiment, a contem So changed. plated NOV4 epitope is from about amino acids 1 to 20 In The disclosed NOV4 protein of the invention includes the another embodiment, a NOV4 epitope is from about amino lipid metabolism-like protein whose Sequence is provided in acids 30 to 55. In additional embodiments, NOV4 epitopes Table 4B. The invention also includes a mutant or variant are from about amino acids 60 to 75, from about amino acids protein any of whose residues may be changed from the 80–95, from about amino acids 120 to 160, from about corresponding residue shown in Table 4B while still encod 60 amino acids 185-290 and from about amino acids 300-337. ing a protein that maintains its lipid metabolism-like activi These novel proteins can be used in assay Systems for ties and physiological functions, or a functional fragment functional analysis of various human disorders, which will thereof. In the mutant or variant protein, up to about 58% help in understanding of pathology of the disease and percent of the residues may be So changed. development of new drug targets for various disorders. The invention further encompasses antibodies and anti 65 NOV5 body fragments, Such as F, or (F) that bind immuno In another embodiment, the novel sequence is NOV5 Specifically to any of the proteins of the invention. (alternatively referred to herein as 24SC239), which US 6,989,232 B2 67 68 includes the 983 nucleotide sequence (SEQ ID NO:26) shown in Table 5A. A NOV5 ORF begins with a Kozak consensus ATG initiation codon at nucleotides 66-68 and ends with a TGA codon at nucleotides 551-553. Putative untranslated regions upstream from the initiation codon and downstream from the termination codon are underlined in Table 5A, and the start and stop codons are in bold letters.

TABLE 5A NOV5 Nucleotide Sequence (SEQ ID NO: 26) CCGCGGCTGTGTCGTCATACTTGCGCGCCGACGCCGCCGCTCGCTTGTGAAACTGGAAGGCTGCCAGGCTAGCCCAGC

CGCCTCCTCGGTGCGACCACCGAGGCCCAAGAAAGAGCCGCAGACGCTCGTCATCCCCAAGAATGCGGCGGAGGAGCAG

AAGCTCAAGCTGGAGCGGCTCATGAAGAACCCGGACAAAGCAGTTCCAATTCCAGAGAAAATGAGTGAATGGGCACCTC.

GACCTCCCCCAGAATTTGTCCGAGATGTCATGGGTTCAAGTGCTGGGGCCGGCAGTGGAGAGTTCCACGTGTACAGACA

TCTGCGCCGGAGAGAATATCAGCGACAGGACTACATGGATGCCATGGCTGAGAAGCAAAAATTGGATGCAGAGTTTCAG

AAAAGACTGGAAAAGAATAAAATTGCTGCAGAGGAGCAGACCGCAAAGCGCCGGAAGAAGCGCCAGAAGTTAAAAGAGA

AGAAATTACTGGCAAAGAAGATGAAACTTGAACAGAAGAAACAAGAAGGACCCGGTCAGCCCAAGGAGCAGGGGTCCAG

CAGCTCTGCGGAGGCATCTGGAACAGAGGAGGAGGAGGAAGTGCCCAGTTTCACCATGGGGCGATGACAATGTTTGCCA

CAGCCTCTGCCTGGAACCTGGCTCGTGCTGTGACCAGAAGGGAAAGGCGGCTGTTTGGCTCTTTCTCCCCCGCAAGGAC

CCGCTGACCCGCTGGATGGAGAGCAAAGGAGACCCCTCCCGAGCCGCTCACAGTCCTGTATTTGGCAGGTTTGGGAGCC

TGAGGGGCCATCTCCCTGACACTCAGAGGCACTGCCTTGCAGACACCATCCGTGCTCCTGGTAAAGGGGGACAGAGAGC

CTCACCTTGCCACATATTTGAACAGTGATGAGTTTGGGGCTGGTTTCTGGGAAGGGAACGTTTATTTAGTAAAGAGCAG

AACACCCTTAAAAAAAAAAAAAAAAAAAAAAAAAA

35 The NOV5 protein (SEQ ID NO:27) encoded by SEQ ID NO:26 is 184 amino acids in length and is presented using the one-letter code in Table 5B. The Psort profile for NOV5 predicts that this Sequence has no known signal peptide and is likely to be localized at the nucleus with a certainty of 40 0.9883. In alternative embodiments, a NOV5 polypeptide is located to the mitochondrial matrix Space with a certainty of 0.1000, or the lysosome (lumen) with a certainty of 0.1000. The NOV5 protein has a molecular weight of 20996.9 Daltons.

TABLE 5B NOV5 protein sequence (SEQ ID NO: 27) MASPAASSWRPPRPKKEPQTLVIPKNAAEEOKLKLERLMKNPDKAWPIPEKMSEWAPRPPPEFVRDVMGSSAGAGSGEF

EQGSSSSAEASGTEEEEEWPSFTMGR

The reverse complement for NOV5 is presented in Table 5C.

TABLE 5C NOV5 reverse complement (SEQ ID NO: 28) TTTTTTTTTTTTTTTTTTTTTTTTTTAAGGGTGTTCTGCTCTTTACTAAATAAACGTTCCCTTCCCAGAAACCAGCCCC US 6,989,232 B2 69 70

TABLE 5C-continued NOV5 reverse complement

AAACTCATCACTGTTCAAATATGTGGCAAGGTGAGGCTCTCTGTCCCCCTTTACCAGGAGCACGGATGGTGTCTGCAAG

GCAGTGCCTCTGAGTGTCAGGGAGATGGCCCCTCAGGCTCCCAAACCTGCCAAATACAGGACTGTGAGCGGCTCGGGAG

GGGTCTCCTTTGCTCTCCATCCAGCGGGTCAGCGGGTCCTTGCGGGGGAGAAAGAGCCAAACAGCCGCCTTTCCCTTCT

GGTCACAGCACGAGCCAGGTTCCAGGCAGAGGCTGTGGCAAACATTGTCATCGCCCCATGGTGAAACTGGGCACTTCCT

CCTCCTCCTCTGTTCCAGAGCCTCCGCAGAGCGCTGGACCCCTGCTCCTTGGGCTGACCGGGTCCTTCTTGTTTCTT

CTGTTCAAGTTTCATCTTCTTTGCCAGTAATTTCTTCTCTTTTAACTTCTGGCGCTTCTTCCGGCGCTTTGCGGTCTGC

TCCTCTGCAGCAATTTTATTCTTTTCCAGTCTTTTCTGAAACTCTGCATCCAATTTTGCTTCTCAGCCATGGCATCCA

TGTAGTCCTGTCGCTGATATTCTCTCCGGCGCAGATGTCTGTACACGTGGAACTCTCCACTGCCGGCCCCAGCACTTGA

ACCCATGACATCTCGGACAAATTCTGGGGGAGGTCGAGGTGCCCATTCACTCATTTTCTCTGGAATTGGAACTGCTTTG

TCCGGGTTCTTCATGAGCCGCTCCAGCTTGAGCTTCTGCTCCTCCGCCGCATTCTGGGGATGACGAGCGTCTGCGGCT

CTTTCTTGGGCCTCGGTGGTCGCACCGAGGAGGCGGCTGGGCTAGCCATGGCAGCCTTCCAGTTTCACAAGCGAGCGGC

GGCGTCGGCGCGCAAGTATGACGACACAGCCGCGG

BLASTP results for NOV5 are shown in Table 5D. TABLE 5D-continued TABLE 5D 3O BLAST results for NOVS BLAST results for NOV5 Matching Matching Entry (in Entry (in SwissProt + ala % % E SwissProt + ala % % E SpTrEMBL) Description Length. Identity Positive Value SpTrEMBL)p lir ) DescriptionDescripti Length.ng Identitventity PositiPositive Valalue is AKO18438, PROTEIN. (90%) (92%) Q9H875; CDNA FLJ13902 184 184/184 184/184, 1e-102 BAB312121 nts AKO23964; FIS, CLONE (100%) (100%) nusculus. BAB14742.1 THY RO1OO1793. 6/2001 hono Sapiens. Q9V7K1; CG8441 253 75/158 99/158, 3e-30 3/2001 AEOO3808: PROTEIN. (47%) (63%) Q9CWv6, S430424D23RIK 186 170/106 174,186, 4-89 40 '' tellste AKO10359; PROTEIN. mus (91%) (94%) 5ig BAB26879.1 musculus. 6/2001 Q9CY32; 843O424D23RIK 186 170/186 174/186, 4e–89 AKO10359; PROTEIN. mus (91%) (94%) 45. A multiple Sequence alignment is given in Table 5E, with BAB26879.1 musculus.6/2001 the NOV5 protein of the invention being shown on lines 1 Q9CXA5; 843.0424D23RIK 148 133/148 136/148, 2e-67 in a ClustalW analysis comparing NOV5 with related pro tein sequences of Table 5D.

TABLE 5E

Information for the ClustalW proteins: 1. SEQ ID NO:27, NOV5 2. SEQ ID NO : 29, Q9H875 CDNA FLJ13902 FIS, CLONE THYRO1001793. homo sapiens. 3/2001 3. SEQ ID NO:30, Q9CWV6 8430 424D23RIK PROTEIN mus musculus 6/2001 4. SEQ ID NO:31, Q9CY32 8430 424D23RIK PROTEIN mus musculus 6/2001 5. SEQ ID NO:32, Q9CXA5 8430 424D23RIK PROTEIN mus musculus 6/2001 6. SEQ ID NO:33, Q9V7K CG8441 PROTEIN, drosophila melanogaster. 5/2000

US 6,989,232 B2 71 72

TABLE 5E-continued Information for the ClustalW proteins: NOW5 45 AWPIPEKMSEWAPRP-PPEFWRDWMGSSAGAGSGEFHWYRHILRRREYQRODYMDAMAEKQ 103 Q9H875 45 WAPRP-PPEFWRDWMGSSAGAGSGEFHWYRHLRRREYQRQDYMDAMAEKQ 103 Q9CWV6 45 A-PPEFWRDWMGSSAGAGSGEFHWYRHLRRREYQRQDYMDAMAEKQ 103 Q9CY32 45 A-PPEFWRDWMGSSAGAGSGEFHWYRHLRRREYQRQDYMDAMAEKQ 103 Q9CXA5 7 A-PPEFWRDWMGSSAGAGSGEFHWYRHLRRREYQRQDYMDAMAEKQ 65 Q9v7K1 61 PVVIPEORRERDFMSSVPTFVRSVMGSSAGAGSGEFHWYRHLRRKEYAROKNINQSARE 120

NOW5 O4 G 153 Q9H875 104 G 153 Q9CWW6 104 KLDAEFQKRLEKNKIAAEEQTAKRRKKRQKL E 154 Q9CY 32 104 KLDAEFQKRLEKNKIAAEEQTAKRRKKRQKL E 154 Q9CXA5 66 KLDAEFQKRLEKNKIAAEEQTAKRRKKRQKL E 116 Q9V7K1 121 AADEAYOKLEDNRRAAEEKTAKRAKRLKRKORAKKPREEKKPLAKEASEDSNTDSEEE 180

NOW5 54 84 Q9H875 154 84 Q9CWV6 155 86 Q9CY 32 155 86 Q9CXA5 117 48 Q9W7K1 181 AVSNTEAKSAEDTNAVELDSTEATKE 240

NOW5 84 ------84 Q9H875 184 84 Q9CWV6 186 86 Q9CY 32 1.86 86 Q9CXA5 148 48 Q9W7K1 241 SQNVDQEQDKPWP 253

ProDom results for NOV5 were collected from using a proprietary database. The results are listed in Table 5F with the Statistics and domain description.

TABLE 5F

ProDom results for NOW5 ProDom Analysis

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs : Score P(N) pram: 38062 p36 (1) INCE CHICK//INNER CENTROMERE PROTEIN . . 119 1.1e-O 6 pram: 26.211 p36 (1) D7 DICDI//CAMP-INDUCIBLE PRESPORE PR . . 82 OOOO51 pram: 4957 p36 (5) CALD (5) //CALDESMON CDM MUSCLE PROTE . . . 74 O. OO 41 pram: 2 2005 p36 (1) INCE CHICK//INNER CENTROMERE PROTEIN . . 72 OOOTO

>pram: 38062 p36 (1) INCH CHICK//INNER CENTROMERE PROTEIN (INCENP). CELL DIVISION; MICROTUBULES; COILED COIL; CENTROMERE; MITOSIS; CELL CYCLE; NUCLEAR PROTEIN; ALTERNATIVE SPLICING 218 aa. Identities = 31/94 (32%), Positives = 57/94 (60%) for NOV5: 86-179, Sbjct: 9-98 Identities = 29/97 (29%), Positives = 55/97 (56%) for NOV5: 86-182, Sbjct: 9-104 Identities = 24/79 (30%), Positives 46/79 (58%) for NOV5: 98-176, Sbict: 2-73 >pram. 26.211 p.36 (1) D7 DICDI//CAMP-INDUCIBLE PRESPORE PROTEIN D7 PRECURSOR. SPORULATION; SIGNAL, 112 aa. Identities = 24/90 (26%), Positives 47/90 (52%) for NOV5: 88-177, Sbict. 16-96 Identities = 21/76 (27%), Positives 38/76 (50%) for NOV5: 8-152, Sbict: 16-91 >pram: 4957 p36 (5) CALD (5) //CALDESMON CDM MUSCLE PROTEIN ACTIN-BINDING CALMODULIN-BINDING PHOSPHORYLATION ALTERNATIVE SPLICING REPEAT 89 aa. Identities = 24/73 (32%), Positives = 40/73 (54%) for NOV5: 11-184, Sbjct: 8-80 >pram: 2 2005 p36 (1) INCE CHICK//INNER CENTROMERE PROTEIN (INCENP). CELL DIVISION; MICROTUBULES; COILED COIL; CENTROMERE; MITOSIS; CELL CYCLE; NUCLEAR PROTEIN; ALTERNATIVE SPLICING 71 aa. Identities = 18/67 (26%), Positives 40/67 (59%) for NOV5: 96-160, Sbict: 2-68 Identities = 16/56 (28%), Positives 29/56 (51%) for NOV5: 86-71, Sbict: 16-71 US 6,989,232 B2 73 74

TABLE 5F-continued

ProDom results for NOW5 PROSITE-Protein Domain Matches for Gene ID: NOW05 Pattern-ID: PKC PHOSPHO SITE PSOOOO5 (Interpro) PDOC00005 Pattern-DE: Protein kinase C phosphorylation site Pattern: IST.RK Pattern-ID: CK2 PHOSPHO SITE PSOOOO6 (Interpro) PDOCOOOO6 Pattern-DE: Casein kinese II phosphorylation site Pattern: IST. {2}IDE Pattern-ID: MYRISTYL PSOOOO8 (Interpro) PDOCOOOO8 Pattern-DE: N-myristoylation site

The INCE CHICK//INNER CENTROMERE PROTEIN (INCENP) is involved in cell division, microtubules, and centromeres. It is also involved with cell cycle through involvement with nuclear proteins and alternative splicing. The D7 DICDI/CAMP-INDUCIBLE PRESPORE PRO TEIN D7 PRECURSOR is involved with cell signaling and Sporulation. BLOCKS analysis was also performed on NOV5. Protein families that NOV5 was similar to are shown in Table 5G.

TABLE 5G BLOCKS Analysis of NOV5 AC# Description Strength Score BLOO500 O Thymosin beta- 4 family proteins. 993 O89 BLO1103E O Aspartate-semialdehyde dehydrogenase proteins 1372 O57 BL00936A O Ribosomal protein L35 proteins. 518 O39 BL010O2C O Translationally controlled tumor protein. 430 O26 BLO 1179A. O. Phosphotyrosine interaction domain proteins ( 1196 O25 BLO 1104C O Ribosomal protein L13e proteins. 458 O22 BLOO 412B O Neuromodulin (GAP-43) proteins. 927 OO6 BLO 1252D O Endogenous opioids neuropeptides precursors p 1763 OO5 BLO 1118B O Translation initiation factor SUI1 proteins. 517 OO3 BLOO892B O HIT family proteins. 5 OO OO2

Other BLAST results include sequences from the Patp database, which is a proprietary database that contains Sequences published in patents and patent publications. Patp results include those listed in Table 5H.

TABLE 5H Patp alignments of NOV5

; ; Sequences producing High-scoring Segment Pairs : Identity Positive patp:AAB50322 Human cytoskeleton-associated protein #2 - . 100; 100; patp: AAB94798 Human protein sequence SEQ ID NO: 15925-Ho . . . 100; 100; US 6,989,232 B2 75 76

TABLE 5H-continued Patp alignments of NOV5

Sequences producing High-scoring Segment Pairs : Identity Positive patp: AAG 429 O2 Arabidopsis thaliana protein fragment SEQ I 453 57. patp: AAG 42903 Arabidopsis thaliana protein fragment SEQ I 453 57. patp: AAG 42904 Arabidopsis thaliana protein fragment SEQ I 453 57. patp: AAG51246 Arabidopsis thaliana protein fragment SEQ I 473 583 patp: AAG51247 Arabidopsis thaliana protein fragment SEQ I 473 583 patp AAG51248 Arabidopsis thaliana protein fragment SEQ I 473 583

NOV5 is expressed in at least the following tissues: lung, cancer, and/or other pathologies and disorders. For example, ovary, prostate, tonsil, breast cancer, and ovarian cancer. a cDNA encoding the novel protein (NOV5) may be useful This information was derived by determining the tissue 25 in cancer therapy, and the novel protein (NOV5) may be Sources of the Sequences that were included in the invention including but not limited to SeqCalling sources, Public EST useful when administered to a subject in need thereof. By Sources, Literature Sources, and/or RACE Sources. way of nonlimiting example, the compositions of the present The disclosed NOV5 nucleic acid encoding a novel invention will have efficacy for treatment of patients suffer protein includes the nucleic acid whose Sequence is provided ing from cancer including but not limited to breast and in Table 5A, or a fragment thereof. The invention also ovarian cancer. The NOV5 nucleic acid encoding novel includes a mutant or variant nucleic acid any of whose bases protein, of the invention, or fragments thereof, may further may be changed from the corresponding base shown in be useful in diagnostic applications, wherein the presence or Table 5A while still encoding a protein that maintains its amount of the nucleic acid or the protein are to be assessed. activities and physiological functions, or a fragment of Such 35 NOV5 nucleic acids and polypeptides are further useful in a nucleic acid. The invention further includes nucleic acids the generation of antibodies that bind immuno-specifically whose Sequences are complementary to those just described, to the novel NOV5 substances for use in therapeutic or including nucleic acid fragments that are complementary to diagnostic methods. These antibodies may be generated any of the nucleic acids just described. The invention 40 according to methods known in the art, using prediction additionally includes nucleic acids or nucleic acid from hydrophobicity charts, as described in the “Anti fragments, or complements thereto, whose Structures NOVX Antibodies' Section below. The disclosed NOV5 include chemical modifications. Such modifications include, protein has multiple hydrophilic regions, each of which can by way of nonlimiting example, modified bases, and nucleic be used as an immunogen. In one embodiment, a contem acids whose Sugar phosphate backbones are modified or 45 plated NOV5 epitope is from about amino acids 1 to 20. In derivatized. These modifications are carried out at least in another embodiment, a NOV5 epitope is from about amino part to enhance the chemical Stability of the modified nucleic acids 25 to 45. In additional embodiments, NOV5 epitopes acid, Such that they may be used, for example, as antisense are from about amino acids 50 to 55, from about amino acids binding nucleic acids in therapeutic applications in a Subject. 50 60 to 70, from about amino acids 85 to 100, and from about In the mutant or variant nucleic acids, and their amino acids 105 to 175. These novel proteins can be used in complements, up to about 37% percent of the bases may be assay Systems for functional analysis of various human So changed. disorders, which will help in understanding of pathology of The disclosed NOV5 protein of the invention includes the the disease and development of new drug targets for various novel protein whose sequence is provided in Table 5B. The 55 disorders. invention also includes a mutant or variant protein any of NOV6 whose residues may be changed from the corresponding residue shown in Table 5B while still encoding a protein that In another embodiment, the EIF-2B epsilon subunit-like maintains its activities and physiological functions, or a protein is NOV6 (alternatively referred to herein as functional fragment thereof. In the mutant or variant protein, 60 24SC300), which includes the 2456 nucleotide sequence up to about 37% percent of the residues may be So changed. (SEQ ID NO:34) shown in Table 6A. A NOV6 ORF begins The invention further encompasses antibodies and anti with a Kozak consensus ATG initiation codon at nucleotides body fragments, Such as F, or (F) that bind immuno 836-838 and ends with a TGA codon at nucleotides Specifically to any of the proteins of the invention. 1934-1936. Putative untranslated regions upstream from the The NOV5 nucleic acids and proteins of the invention are 65 initiation codon and downstream from the termination codon useful in potential therapeutic applications implicated in are underlined in Table 6A, and the Start and Stop codons are cancer including but not limited to breast cancer, ovarian in bold letters. US 6,989,232 B2 77 78

TABLE 6A NOV6 Nucleotide Sequence (SEQ ID NO:34) GAATTCCTGACTGCCACAGGTGTACAGGAAACATTTGTCTTTTGTTGCTGGAAAGCTGCTCAAATCAAAGAACA

TTTACTGAAGTCAAAGTGGTGCCGCCCTACATCTCTCAATGTGGTTCGAATAATTACATCAGAGCTCTATCGAT

CACTGGGAGATGTCCTCCGTGATGTTGATGCCAAGGCTTTGGTGCGCTCTGACTTTCTTCTGGTGTATGGGGAT

GTCATCTCAAACATCAATATCACCAGAGCCCTTGAGGAACACAGGTTGAGACGGAAGCTAGAAAAAAATGTTTC

TGTGATGACGATGATCTTCAAGGAGTCATCCCCCAGCCACCCAACTCGTTGCCACGAAGACAATGTGGTAGTGG

CTGTGGATAGTACCACAAACAGGGTTCTCCATTTTCAGAAGACCCAGGGTCTCCGGCGTTTTGCATTTCCTCTG

AGCCTGTTTCAGGGCAGTAGTGATGGAGTGGAGGTTCGATATGATTTACTGGATTGTCATATCAGCATCTGTTC

TCCTCAGGTGGCACAACTCTTTACAGACAACTTTGACTACCAAACTCGAGATGACTTTGTGCGAGGTCTCTTAG

TGAATGAGGAGATCCTAGGGAACCAGATCCACATGCACGTAACAGCTAAGGAATATGGTGCCCGTGTCTCCAAC

CTACACATGTACTCAGCTGTCTGTGCTGACGTCATCCGCCGATGGGTCTACCCTCTCACCCCAGAGGCGAACTT

CACTGACAGCACCACCCAGAGCTGCACTCATTCCCGGCACAACATCTACCGAGGGCCTGAGGTCAGCCTGGGCC

ATGGCAGCATCCTAGAGGAAAAGTGCTCCTGGGCTCTGGCACTGTCATTGGCAGCAATTGCTTTATCACCAAC

AGTGTCATTGGCCCCGGCTGCCACATTGGTGAGCACAGGTGATAACGTGGTGCTGGACCAGACCTACCTGTGGC

AGGGTGTTCGAGTGGCGGCTGGAGCACAGATCCATCAGTCTCTGCTTTGTGACAATGCTGAGGTCAAGGAACGA

GTGACACTGAAACCACGCTCTGTCCTCACTTCCCAGGTGGTCGTGGGCCCAAATATCACGCTGCCTGAGGGCTC

GGTGATCTCTTTGCACCCTCCAGATGCAGAGGAAGATGAAGATGATGGCGAGTTCAGTGATGATTCTGGGGCTG

ACCAAGAAAAGGACAAAGTGAAGATGAAAGGTTACAATCCAGCAGAAGTAGGAGCTGCTGGCAAGGGCTACCTC.

TGGAAAGCTGCAGGCATGAACATGGAGGAAGAGGAGGAACTGCAGCAGAATCTGTGGGGACTCAAGATCAACAT

GGAAGAAGAGAGTGAAAGTGAAAGTGAGCAAAGTATGGATTCTGAGGAGCCGGACAGCCGGGGAGGCTCCCCTC.

AGATGGATGACATCAAAGTGTTCCAGAATGAAGTTTTAGGAACACTACAGCGGGGCAAAGAGGAGAACATTTCT

TGTGACAATCTCGTCCTGGAAATCAACTCTCTCAAGTATGCCTATAACATAAGTCTAAAGGAGGTGATGCAGGT

ACTGAGCCACGTGGTCCTGGAGTTCCCCCTGCAACAGATGGATTCCCCGCTTGACTCAAGCCGCTACTGTGCCC

TGCTGCTTCCTCTGCTAAAGGCCTGGAGCCCTGTTTTTAGGAACTACATAAAGCGCGCAGCCGACCATTTGGAA

GCGTTAGCAGCCATTGAGGACTTCTTCCTAGAGCATGAAGCTCTTGGTATTTCCATGGCCAAGGTACTGATGGC

TTTCTACCAGCTGGAGATCCTGGCTGAGGAAACAATTCTGAGCTGGTTCAGCCAAAGAGATACAACTGACAAGG

GCCAGCAGTTGCGCAAGAATCAACAGCTGCAGAGGTTCATCCAGTGGCTAAAAGAGGCAGAAGAGGAGTCATCT

GAAGATGACGAAGTCACACTGCCTGCTCCTTTGGGTGTGATTGAGTGCCCTCCTGGCTCCTGGGCTGGGACAA

GTGAGGAACTAGCTGCAGAGGGATGAGTGACCACCATCCAGGCTGAGACTGAAAGGAGCAGAGGCTGGAACTAC

AGTATTCTTTCCCCTGCTAGCAACCATGTGCCTCCCATCCTGACTGTGGAGTTGGGATGTGGAAGTGGGGCTGG

AACAAAGCTTCTGCCTAGGGAGGAGCTAAGCAGGCCCGGCAGTTGGAGGAAGGCCAGAGGAACAGCTTTGTGCT

CCGGCTTTCCCTCAGGGAACAGCAGAGAGCAGTTGGCTCTTTCTGCTGCTTGTATATGTTAATATTAAAAGAGA

GAGTGGTGTATTTGGTTTGTCTCCATCCCCGACTAATCAGCCAGTGAAGTATGTGACCAGAATCACATGATAGC

CTTTCCTTAACACCTGGGGGAGAGGGAGGACGGGTGTGCCAGCCACTAGGTGGTACTGTGGTACCTTGCTAATT

AACCTTTCCCATGG

The NOV6 40789.4 Dalton protein (SEQ ID NO:35) amino acids VSL-AP. NOV6 is likely to be localized outside encoded by SEQ ID NO:34 is 366 amino acids in length and the cell with a certainty of 0.6138. In alternative is presented using the one-letter code in Table 6B. The Psort embodiments, a NOV6 polypeptide is located to the lyso profile for NOV6 predicts that this sequence has a signal 65 some (lumen) with a certainty of 0.01900, the endoplasmic peptide. The most likely cleavage site for a NOV6 peptide reticulum (membrane) with a certainty of 0.1000, or the is between amino acids 21-22, i.e. at the dash between endoplasmic reticulum (lumen) with a certainty of 0.1000.

US 6,989,232 B2 81 82

TABLE 6 C-continued NOV6 reverse complement

TGGCAACGAGTTGGGTGGCTGGGGGATGACTCCTTGAAGATCATCGTCATCACAGAAACATTTTTTTCTAGCTTCCGT

CTCAACCTGTGTTCCTCAAGGGCTCTGGTGATATTGATGTTTGAGATGACATCCCCATACACCAGAAGAAAGTCAGAG

CGCACCAAAGCCTTGGCATCAACATCACGGAGGACATCTCCCAGTGATCGATAGAGCTCTGATGTAATTATTCGAACC

ACATTGAGAGATGTAGGGCGGCACCACTTTGACTTCAGTAAATGTTCTTTGATTTGAGCAGCTTTCCAGCAACAAAAG

ACAAATGTTTCCTGTACACCTGTGGCAGTCAGGAATTC

BLASTP results for NOV6 are shown in Table 6D.

TABLE 6D

BLAST results for NOV6 Matching Entry (in SwissProt + ala % % E SpTrEMBL) Description Length. Identity Positive Value E2BE HUMAN: TRANSLATION INITIATION 641 335/336 336/336, O.O U23028; FACTOREIF-2B. EPSILON (100%) (100%) AACSO 646.1 SUBUNIT (EIF-2B GDP GTPEXCHANGE FACTOR) (FRAGMENT). homo Sapiens. 7/1999 E2BE RABIT; TRANSLATION INITIATION 721 294/336 318,336, 1e-171 U23037; FACTOREIF-2B. EPSILON (88%) (95%) AAC486.18.1 SUBUNIT (EIF-2B GDP GTPEXCHANGE FACTOR). Oryctolagus cuniculus. 7/1999 E2BE RAT: TRANSLATION INITIATION 716. 292/.336 314/336, 1e-168 U19516; FACTOREIF-2B. EPSILON (87%) (93%) AAB17690.1 SUBUNIT (EIF-2B GDP GTPEXCHANGE FACTOR). rattus norvegicus. 7/1999 O64760; PUTATIVE TRANSLATION 730 100/362. 170/362, 1e-34 ACOO4238; INITIATION FACTOR (28%) (47%) AAC12836.1 EIF-2B-EPSILON SUBUNIT arabidopsis thaliana. 6/2001 Q9SRU3; PUTATIVE TRANSLATION 676 96/341 166/341, 8e-29 ACOO9755, INITIATION FACTOR (28%) (49%) AAFO2111.1 EIF-2B. EPSILON SUBUNIT arabidopsis thaliana. 6/2001

A multiple Sequence alignment is given in Table 6E, with in a ClustalW analysis comparing NOV6 with related pro the NOV6 protein of the invention being shown on lines 1 tein sequences of Table 6D.

TABLE 6E Information for the ClustalW protein: 1. SEQ ID NO:35, NOV6 2. SEQ ID NO:37, E2BE HUMAN EIF-2B GDP-GTPEXCHANGE FACTOR 7/1999 3. SEQ ID NO:38, E2BE RABIT EIF-2B GDP-GTPEXCHANGE FACTOR 7/1999 4. SEQ ID NO:39, E2BE RAT EIF-2B GDP-GTPEXCHANGE FACTOR 7/1999 5. SEQ ID NO: 40, O6476O PUTATIVE EIF-2B-EPSILON SUBUNIT 6/2001 6. SEQ ID NO: 41, Q9SRU3 PUTATIVE EIF-2B EPSILON SUBUNTI 6/2001

NOW6 1 ------1 E2BE HUMAN 1 ------1 E2BE RABIT 1 MATTWWAPPGAVSDRANKRGGGPGGGGGGGGARGAEEESPPPLQAVLVADSFNRRFFPIS 60 E2BE RAT 1 -MAATAAWPSAWGGRANKRGGGSGGGG----TOGAEEEPPPPLQAVLVADSFDRRFFPIS 55 O64760 1 ------MGAQKKGGAAARVSEDAEWOS------RHRLQAILLADSFATKFRPVT 42 Q9SRU3 1 ------MASRKK--RAAKISEDSEEEQS------TTQTLQAILLADSFATKLLPLT 42

US 6,989,232 B2 85 86

TABLE 6E-continued Information for the ClustalW protein:

NOW6 613 64 1. E2BE HUMAN 613 64 1. E2BE RABIT 693 721 E2BE RAT 688 71.6 O64760 694 730 Q9SRU3 651 676

ProDom results for NOV6 were collected from a public the PFAM HMM database. The results are listed in Table 6F database. DOMAIN results for NOV6 were collected using with the Statistics and domain description.

TABLE 6F

Domain results for NOV6 ProDom Analysis pram: 15525 p36 (2) E2BE (2)//TRANSLATION FACTOR EIF-2B INITIATION EPSILON SUBUNIT GDP-GTP EXCHANGE AMINO-ACID BIOSYNTHESIS 311 a.a. Identities = 270/311 (86%), Positives = 290/.311 (93%) for Query: 56-366 and Sbjct: 1-311 >pram: 14746 p36 (2) // FACTOR TRANSLATION EIF-2B SUBUNIT EXCHANGE INITIATION EPSILON GDP-GTP AMINO-ACID BIOSYNTHESIS 261 aa. Identities = 61/245 (24%), Positives = 109/245 (44%) for Query: 129-358 and Sbjct: 17-261 >pram: 3752 p36 (7) IF5 (7)//INITIATION FACTOR PROTEIN EUKARYOTIC TRANSLATION EIF-5 BIOSYNTHESIS GTP-BINDING PROBABLE ALTERNATIVE, 260 aa. Identities = 37/94 (39%), Positives = 51/94 (54%) for Query: 278-363 and Sbjct: 126-219 >pram: 48803 p36 (1) SSRP DROME//SINGLE-STRAND RECOGNITION PROTEIN (SSRP) (CHORION-FACTOR 5). DNA-BINDING; RNA-BINDING; NUCLEAR PROTEIN, 58 aa. Identities = 9/20 (45%), Positives = 15/20 (75%) for Query: 100-119 and Sbjct: 2-20 Identities = 10/29 (34%), Positives = 15/29 (51%) for Query: 165-193 and Sbjct: 29-56 >pram: 25633 p36 (1) FKB1 DROME//39 KD FK506-BINDING NUCLEAR PROTEIN (PEPTIDYL-PROLYL CIS TRANSISOMERASE) (PPIASE) (EC 5. 2. 1.8). ISOMERASE, ROTAMASE; NUCLEAR PROTEIN, 85 aa Identities = 27/85 (31%), Positives = 42/85 (49%), for Query: 102-186, Sbjct: 3-78 PFAM HMM Domain Analysis of NOV06 Model Description Score E-value

W2 (InterPro) e-IF 4-gamma/eIF5/eIF2-epsilon 121.5 1. 6e-32 hormone2 (InterPro) Peptide hormone 10. 4 O6

Parsed for domains:

Model Domain seq-f seq-t hinm-f hinm-t SCOe E-value

hormone2 1/1 342 357 . . 13 28 . 10. 4 O6 W2 1/1 284 366 . I 1 87 121.5 1. 6e-32

PROSITE-Protein Domain Matches for Gene ID: NOWO6 Pattern-ID: ASN GLYCOSYLATION P50 0001 (Interpro) PDOC00001 Pattern-DE: N-glycosylation sites Pattern: NIPIST IP NOV6 Position: 85- NITL; 213-NISC; 231-NISL Pattern-ID: PKC PHOSPHO SITE PSOOOO5 (Interpro) PDOC00005 Pattern-DE: Protein kinase C phosphorylation sites Pattern: IST.RK NOV6 Position: 69 -TLK; 225 - SLK; 233-SLK; 259-SSR; 331-SQR; 336-TDK Pattern-ID: CK2 PHOSPHO SITE PSOOOO6 (Interpro) PDOCOOOO6 US 6,989,232 B2 87 88

TABLE 6F-continued

Domain results for NOV6

Pattern-DE: Casein kinase II phosphorylation sites Pattern: IST. {2} DE NOV6 Position: 29-STGD; 87-TLPE; 114-SGAD; 170-SESE; 233-SLKE; 255-SPLD 331-SQRD; 362-SSED Pattern-ID: MYRISTYL PSOOOO8 (Interpro) PDOCOOOO8 Pattern-DE: N-myristoylation sites Pattern: GEDRKHPFYW. {2} (STAGCN Pl NOV6 Position: 44-GWRVAA; 91-GSWISL; 161-GLKINM; 305-GISMAK BLOCKS Analysis AC# Description Strength Score BLOO260 O Glucagon / GIP / secretin / VIP family protei 1460 1100 BLOO5O1B O Signal peptidases I serine proteins. 1234 1061 BLOO558A 0 Eukaryotic mitochondrial porin proteins 1284 1056 BLOO 486C 0 DNA mismatch repair proteins mutS family prot 1682 1037 BLOO 808J 0 ADP-glucose pyrophosphorylase proteins. 1397 1036 BLOO 992B 0 Serum amyloid. A proteins. 1851 1024 BLO 1271B O Sodium: sulfate symporter family proteins. 1480 1022 BLOO 132E O Zinc carboxypeptidases, zinc-binding region 1 1608 1020

35 The translation factor eif-2B initiation epsilon subunit is NOV6 is expressed in at least the following tissues: involved with GDP-GTP exchange, and amino acid biosyn- placenta, Small intestine, larynx, kidney, muscle, colon, thesis. The initiation factor protein eukaryotic translation tonsil, stomach, uterus, bone marrow, brain and others This EIF-5 is thought to be involved with biosynthesis and information was derived by determining the tissue Sources GTP-binding. The single-strand recognition protein (SSRP) 40 of the sequences that Were included in the invention includ (chorion-factor 5) is involved with DNA-binding; and RNA. ing but E. limited tO Sally,NE Public EST binding. The FK506-binding nuclear protein (peptidyl- Sources, Literature sources, and/or KA Sources. prolyl cis-trans isomerase) (PPIASE) (EC 5.2.1.8) is a The disclosed NOV6 nucleic acid encoding a novel rotamase, and is involved with nuclear proteins. protein includes the nucleic acid whose Sequence is provided in Table 6A, or a fragment thereof. The invention also Other BLAST results include Sequences from the Palp 45 includes a mutant or variant nucleic acid any of whose bases database, which is a proprietary database that contains may be changed from the corresponding base shown in Sequences published in patents and patent publications. Patp Table 6A while still encoding a protein that maintains its results include those listed in Table 6G. activities and physiological functions, or a fragment of Such

TABLE 6G Patp alignments of NOV6

Sequences producing High-scoring Segment Pairs : Identity Positive patp:AAB 43883 Human cancer associated protein sequence 29/96 55/96 SEQ ID NO: 1328 - Homo sapiens, 424 aa. PN = WO200055350-A1. (30%) (57%) Expect = 7.6e-06

The eIF4-gamma/eIF5/eIF2-epsilon proteins are involved a nucleic acid. The invention further includes nucleic acids with regulation of genes at the translational level, and are whose Sequences are complementary to those just described, involved with GTP-GDP exchange. Peptide hormones are including nucleic acid fragments that are complementary to involved in many physiological processes including glucose 65 any of the nucleic acids just described. The invention and fat metabolism, immune System regulation, and neu- additionally includes nucleic acids or nucleic acid ronal regulation. fragments, or complements thereto, whose Structures US 6,989,232 B2 89 90 include chemical modifications. Such modifications include, be useful in diagnostic applications, wherein the presence or by way of nonlimiting example, modified bases, and nucleic amount of the nucleic acid or the protein are to be assessed. acids whose Sugar phosphate backbones are modified or NOV6 nucleic acids and polypeptides are further useful in derivatized. These modifications are carried out at least in the generation of antibodies that bind immuno-specifically part to enhance the chemical Stability of the modified nucleic to the novel NOV6 substances for use in therapeutic or acid, Such that they may be used, for example, as antisense diagnostic methods. These antibodies may be generated binding nucleic acids in therapeutic applications in a Subject. according to methods known in the art, using prediction In the mutant or variant nucleic acids, and their from hydrophobicity charts, as described in the “Anti complements, up to about 13% percent of the bases may be NOVX Antibodies' Section below. The disclosed NOV6 So changed. protein has multiple hydrophilic regions, each of which can The disclosed NOV6 protein of the invention includes the be used as an immunogen. In one embodiment, a contem novel protein whose sequence is provided in Table 6B. The plated NOV6 epitope is from about amino acids 60 to 75. In invention also includes a mutant or variant protein any of another embodiment, a NOV6 epitope is from about amino whose residues may be changed from the corresponding acids 100 to 135. In additional embodiments, NOV6 residue shown in Table 6B while still encoding a protein that 15 epitopes are from about amino acids 145 to 155, from about maintains its activities and physiological functions, or a amino acids 160 to 190, from about amino acids 200 to 220, functional fragment thereof. In the mutant or variant protein, from about amino acids 230 to 235, from about amino acids up to about 13% percent of the residues may be So changed. 250 to 270, from about amino acids 280 to 290, and from The invention further encompasses antibodies and anti about amino acids 320 to 360. These novel proteins can be body fragments, Such as F, or (F), that bind immuno used in assay Systems for functional analysis of various Specifically to any of the proteins of the invention. human disorders, which will help in understanding of The NOV6 nucleic acids and proteins of the invention are pathology of the disease and development of new drug useful in potential therapeutic applications implicated in targets for various disorders. cancer including but not limited to breast cancer, ovarian NOV7 cancer, and/or other pathologies and disorders. For example, 25 In another embodiment, the novel sequence is NOV7 a cDNA encoding the novel protein (NOV6) may be useful (alternatively referred to herein as 24SC526), which in cancer therapy, and the novel protein (NOV6) may be includes the 2004 nucleotide sequence (SEQ ID NO:42) useful when administered to a subject in need thereof. By shown in Table 7A. A NOV7 ORF begins with a Kozak way of nonlimiting example, the compositions of the present consensus ATG initiation codon at nucleotides 176-178 and invention will have efficacy for treatment of patients suffer ends with a TGA codon at nucleotides 404–406. Putative ing from cancer including but not limited to breast and untranslated regions upstream from the initiation codon and ovarian cancer. The NOV6 nucleic acid encoding novel downstream from the termination codon are underlined in protein, of the invention, or fragments thereof, may further Table 7A, and the start and stop codons are in bold letters.

TABLE 7A NOV7 Nucleotide Sequence (SEQ ID NO: 42) GGGCGCGCGGCCTCGAGGCCTTCCGGTGCGGGAGAAACTACTACTCCCATAATGCCCCGCGGTCCCGCGAGCTG

CCAGTCTCGTCGCGAGAAGCAGCGGCCCGGGGCGACTGAGCGGACAAACGGAAGTGTAGGTTACGGTCTGAGAC

ATCACCGCCAAGCTGGGCATCGGGGAGATGGCCGAGACTGACCCCAAGACCGTGCAGGACCTCACCTCGGTGGT

GCAGACACTCCTGCAGCAGATGCAAGATAAATTTCAGACCATGTCTGACCAGATCATTGGGAGAATTGATGATA

TGAGTAGTCGCATTGATGATCTGGAAAAGAATATCGCGGACCTCATGACACAGGCTGGGGTGGAAGAACTGGAA

AGTGAAAACAAGATACCTGCCACGCAAAAGAGTGAAGGTTGCTAATAATTTATACTGGAATCTGGCATTTTTC

CAAGCCAAGAGAAGATCGAATGGCTTTTTGCAGCTAACTACTATGTGTAGACAGGTTTTATATTATAAAGTATG

CATTCTTACACCTAGTATATAGTTAGTTTGTAGAGTGATTTCCCCCCAGTTTCTTGAACATGGTATCTTCACA

TCTTGGACCTTGGTCAGTTGTGCTATTCATTATTAAACACTAAAACTTTGGCGGTTCTTGCATAACATTGTCAG

ATTTTTTAGTGTATTTCTGTGAAGTCATTTTTTTTCTTGTCATTCCTTTTGTAGTAGTTGCTGTTTGGATAAAA

GTTGATGTGTGATTTTTTATTAAACAAATAGTAAACCCTTCAATTATAGTTAGTCTTGGTGAAGTAAGATGTTT

GTAGACTTTAGAGTTCTTTAATTCTTGGCACAACGTGACTTTTGAGCTAACACCAAATAGTGTGTTGGCAATAC

TTTTCAAATGGCTGAAAACACCTAAAAATTGTTCATTCAGAAATATCTGTCACTGCTCTGTTGCCAAAACTCAG

AATAGAACTTAGACGTATGTCTGAGTCCCTGAGATCACATGCTAAAGTCGATGAAAAGTAACCACTGCCACTGT

CTTGTGTCAGAACTTTTACAGTACAGAAAATAACAGAATAGCCTTCTGTAATGAGGCGTTTGTTAGAGTTTTGC

ATGAGATTCTAATACTTCAGTAGGACCCTACCTACGTGGTTCATCTACAATGGTTACCATAAAAAATCTGGCAG

GATTTTAAAACT CAATCAGTCTTTCCTTTGAGCTAGTGACTTGAAAAGAAAGAGAGAAGAAAAAGAGACCATAT US 6,989,232 B2 91 92

TABLE 7A-continued NOV7 Nucleotide Sequence

TAAGTCCATGCCAGTTGCTTGGCTAGAATATGATCAACGACTTGTAGTAGACTCAAGTTTTTAAAAAACACTAT

TTTACTTAAACTGTTTCTTATCTAAATTCTTGCAGAGTGTCAATGTTATCATTGATTATAGAAGACAGGGATAA

TACCTTTATCTCTGGCCACTCAAAAATGCAGTGCCAGGAGTGCTAAACCTAGAGGCCAATACTGATGACCTGGA

AGGTGATCCATATGATTGTCACCACAAAGTGCTTTTACACAAAAACTTGAAAATTTGAAAAACATGATTTTTTT

AAGTTTCTCATCTCACCAGTCTTGGTGTTTATATTGCAAATCTATCAAAGTAAGAAATAATTTGTGCTGTATAC

AAATTACATGGGGAACATAAAGGAGTGAGATCCTTCTGTGATAAAATGAATTCACCACTCTGGTTACCCAACTA.

CAGAACCTCCTTTGATCAGGCCAGTAGGTTGTGATGCAGGCTGGAGCCCCCGAATGCCCCACACACACTGCAGC

ATTGACCAGACCATCCGAAACCTGCGTCCCTGGTGATGTTCTCAAGCCTCGGAAGTGGCAAATGGAAATGATAT

GGCCGGTTGCGGTTGTAGGAGAGTTGTGACTTAGGCAGGAGTCGACCTCCTCAAGTAATGGAACGATTTCAAAG

GCAGGCTGCCCTGACCAAAAATATCTGCCATGAATAAAGGTGCCTGAAATCCTGCTAAAAAAAAAAAAAAAAAA

AAAAAA

The NOV7 8543.5 Dalton protein (SEQ ID NO:43) 25 encoded by SEQ ID NO:42 is 76 amino acids in length and is presented using the one-letter code in Table 7B. The Psort profile for NOV7 predicts that this sequence has no known Signal peptide and is likely to be localized in the cytoplasm with a certainty of 0.6500. In alternative embodiments, a NOV7 polypeptide is located to the mitochondrial matrix space with a certainty of 0.1000, or the lysosome (lumen) with a certainty of 0.1000.

TABLE 7B NOV7 protein sequence (SEQ ID NO: 43) MAETDPKTVQDLTSWWQTLLQQMQDKFQTMSDQIIGRIDDMSSRIDDLEKNIADLMTQAGWEELESENKIPATQKS

The reverse complement for NOV7 is presented in Table 7C.

TABLE 7C NOV7 reverse complement (SEQ ID NO:44) TTTTTTTTTTTTTTTTTTTTTTTTAGCAGGATTTCAGGCACCTTTATTCATGGCAGATATTTTTGGTCAGGGCAGCCTG

CCTTTGAAATCGTTCCATTACTTGAGGAGGTCGACTCCTGCCTAAGTCACAACTCTCCTACAACCGCAACCGGCCATAT

CATTTCCATTTGCCACTTCCGAGGCTTGAGAACATCACCAGGGACGCAGGTTTCGGATGGTCTGGTCAATGCTGCAGTG

TGTGTGGGGCATTCGGGGGCTCCAGCCTGCATCACAACCTACTGGCCTGATCAAAGGAGGTTCTGTAGTTGGGTAACCA

GAGTGGTGAATTCATTTTATCACAGAAGGATCTCACTCCTTTATGTTCCCCATGTAATTTGTATACAGCACAAATTATT

TCTTACTTTGATAGATTTGCAATATAAACACCAAGACTGGTGAGATGAGAAACTTAAAAAAATCATGTTTTTCAAATTT

TCAAGTTTTTGTGTAAAAGCACTTTGTGGTGACAATCATATGGATCACCTTCCAGGTCATCAGTATTGGCCTCTAGGTT

TAGCACTCCTGGCACTGCATTTTTGAGTGGCCAGAGATAAAGGTATTATCCCTGTCTTCTATAATCAATGATAACATTG

ACACTCTGCAAGAATTTAGATAAGAAACAGTTTAAGTAAAATAGTGTTTTTTAAAAACTTGAGTCTACTACAAGTCGTT

GATCATATTCTAGCCAAGCAACTGGCATGGACTTAATATGGTCTCTTTTCCTTCCTCTTTCTTTTCAAGTCACTAGCT

CAAAGGAAAGACTGATTGAGTTTTAAAATCCTGCCAGATTTTTTATGGTAACCATTGTAGATGAACCACGTAGGTAGGG US 6,989,232 B2 93 94

TABLE 7C-continued NOV7 reverse complement

TCCTACTGAAGTATTAGAATCTCATGCAAAACTCTAACAAACGCCTCATTACAGAAGGCTATTCTGTTATTTTCTGTAC

TGTAAAAGTTCTGACACAAGACAGTGGCAGTGGTTACTTTTCATCGACTTTAGCATGTGATCTCAGGGACTCAGACATA

CGTCTAAGTTCTATTCTGAGTTTTGGCAACAGAGCAGTGACAGATATTTCTGAATGAACAATTTTTAGGTGTTTTCAGC

CATTTGAAAAGTATTGCCAACACACTATTTGGTGTTAGCTCAAAAGTCACGTTGTGCCAAGAATTAAAGAACTCTAAAG

TCTACAAACATCTTACTTCACCAAGACTAACTATAATTGAAGGGTTTACTATTTGTTTAATAAAAAATCACACATCAAC

TTTTATCCAAACAGCAACTACTACAAAAGGAATGACAAGAAAAAAAATGACTTCACAGAAATACACTAAAAAATCTGAC

AATGTTATGCAAGAACCGCCAAAGTTTTAGTGTTTAATAATGAATAGCACAACTGACCAAGGTCCAAGATGTGAAGATA

CCATGTTCAAGAAACTGGGGGGAAATCACTCTACAAACTAACTATATACTAGGTGATAAGAATGCATACTTTATAATAT

AAAACCTGTCTACACATAGTAGTTAGCTGCAAAAAGCCATTCGATCTTCTCTTGGCTTGGAAAAATGCCAGATTCCAGT

ATAAATTATAGCAACCTTCAACCTTTTGCGTGGCAGGTATCTGTTTTCACTCCAGTTCTCCACCCCAGCCCTGT

GTCATGAGGTCCGCGATATTCTTTTCCAGATCATCAATGCGACTACTCATATCATCAATTCTCCCAATGATCTGGTCAG

ACATGGTCTGAAATTTATCTTGCATCTGCTGCAGGAGTGTCTGCACCACCGAGGTGAGGTCCTGCACGGTCTTGGGGTC

AGTCTCGGCCATCTCCCCGATGCCCAGCTTGGCGGTGATGTCTCAGACCGTAACCTACACTTCCGTTTGTCCGCTCAGT

CGCCCCGGGCCGCTGCTTCTCGCGACGAGACTGGCAGCTCGCGGGACCGCGGGGCATTATGGGAGTAGTAGTTTCTCCC

GCACCGGAAGGCCTCGAGGCCGCGCGCCC

BLASTP results for NOV7 are shown in Table 7D.

TABLE 7D

BLAST results for NOV7 Matching Entry (in Swissprot + ala % % E SpTrEMBL) Description Length. Identity Positive Value HBP1 HUMAN: HEAT SHOCK 76 76/76 76/76, 4e-36 AFO68754; FACTORBINDING (100%) (100%) AAC25186.1 PROTEIN 1. hono Sapiens. 5/2OOO Q9COZ1: AKO18708; 0610007A03RIK 76 67/76 71/76, 8e-32 BAB31359.1 PROTEIN (88%) (93%) (SIMILAR TO HEAT SHOCK FACTOR BINDING PROTEIN1). nus musculus. 6/2001 Q9VK90; AEO03636; CG5446 PROTEIN. 86 44f61 51/61, 1e-18 AAF53188.1 drosophila (72%) (84%) nelanogaster. 5/2OOO Q9U3B7; Z77666: KO8E7.2 PROTEIN. 8O 36/54 44/54, 3e-13 CABO1233.2 caenorhabditis (67%) (81%) elegan.S. 3/2001 Q9FP22; APOO3O44; POO36C05.1 99 28/56 42/56, 5e-10 BAB19328.1 PROTEIN. oryza (50%) (75%) Satiya. 3/2001

A multiple Sequence alignment is given in Table 7E, with the NOV7 protein of the invention being shown on lines 1 in a ClustalW analysis comparing NOV7 with related pro tein sequences of Table 7D. US 6,989,232 B2 95 96

TABLE 7E Information for the ClustalW protein: 1. SEQ ID NO:43, NOV7 2. SEQ ID NO: 45, HBP1 HUMAN HEAT SHOCK FACTOR BINDING PROTEIN 1. 5/2000 3. SEQ ID NO:46, Q9CQZ1 O 610007A03RIK PROTEIN mus musculus. 6/2001 4. SEQ ID NO: 47, Q9VK90 CG5446 PROTEIN. drosophila melanogaster. 5/2000 5. SEQ ID NO : 48, Q9U3B7 KO8E7.2 PROTEIN. caenorhabditis elegans. 3/2001 6. SEQ ID NO: 49, Q9FP22 POO38C05. 1 PROTEIN. oryza sativa. 3/2001

NOWF 1 QQDKFQTMSDQIIGR HBP1 HUMAN 1 O

NOWF 43 SEELESENKIPATQKS------76 HBP1 HUMAN 43 ELESENKIPATQKS-- 76 Q9CQ21 43 ELDPENKIPTAQKS------76 Q9WK90 61 GQGPEK------86 Q9U3B7 55 SEHPPSAQ------8O Q9FP22 60 3EINDLKVEMGTEGITPTKPKDEESKPAGSSAE 99

BLASTP domain results for NOV7 were collected from a proprietary database. The results are listed in Table 7F with the Statistics and domain description.

TABLE 7F

Domain results for NOW7 ProDom Analysis

Smallest Sum High Probability Sequence producing High-scoring Segment Pairs : Score P(N) pram: 42125 p36 (1) STE4 SCHPO // SEXUAL DIFFERENTIATION P . . . 78 OOO3O pram:56790 p36 (1) BUD6 YEAST // BUD SITE SELECTION PROTE . . . 73 OOO59 pram:53072 p36 (1) GAGY DROME // RETROVIRUS-RELATED GAG P . . . 57 OOOT 4 pram: 35747 p36 (1) RLX2 SALTY // 22 KD RELAXATION PROTEIN . . . 69 OO 17 pram: 89.37 p36 (3) YOPE (3) // OUTER MEMBRANE VIRULENCE P . . . 64 O OF3 pram: 42125 p36 (1) STE4 SCHPO// SEXUAL DIFFERENTIATION PROTEIN STE4. MEIOSIS, 264 aa dentities = 20/70 (28%), Positives = 42/70 (60%) for NOV7: 11-76, Sbjct: 62-131 >pram.56790 p36 (1) BUD6 YEAST // BUD SITE SELECTION PROTEIN BUD6 (ACTIN INTERACTING PROTEIN 3), 788 aa. dentities = 12/50 (24%), Positives = 32/50 (64%) for NOV7: 20-69, Sbjct: 559-608 dentities = 7/24 (29%), Positives = 14/24 (58%) for NOV7: 3-26, Sbjct: 106-129 >pram.53072 p36 (1) GAGY DROME // RETROVIRUS-RELATED GAG POLYPROTEIN (TRANSPOSON GYPSY). CORE PROTEIN; POLYPROTEIN; TRANSPOSABLE ELEMENT, 451 aa. dentities = 12/38 (31%), Positives = 20/38 (52%) for NOV7: 5-41, Sbjct : 43-80 dentities = 8/19 (42%), Positives = 13/19 (68%) for NOV7: 58-76, Sbjct: 412-430 >pram:35747 p36 (1) RLX2 SALTY // 22 KD RELAXATION PROTEIN PLASMID, 194 aa. dentities = 20/70 (28%), Positives = 37/70 (52%) for NOV7: 7-24, Sbjct: 20-89 >pram: 89.37 p36 (3) YOPE (3) // OUTER MEMBRANE VIRULENCE PROTEIN YOPE PLASMID, 219 aa. dentities = 16/37 (43%), Positives = 22/37 (59%) for NOV7: 2-38, Sbjct: 111-147 PFAM HMM Domain Analysis Scores for sequence family classification (score includes all domains):

Model Description Score E-value N

Leptin (InterPro) Leptin 2 - 2 10 1 US 6,989,232 B2

TABLE 7F-continued

Domain results for NOW7 Parsed for domains:

Model Domain seq-f seq-t himm-f himm-t SCOe E-value

Leptin 1/1 20 42 . . 1 25 . 2.2 10

PROSITE-Protein Domain Matches for Gene ID: NOW7 Pattern-ID: PKC PHOSPHO SITE PSOOOO5 (Interpro) PDOC00005 Pattern-DE: Protein kinase C phosphorylation site Pattern: STRK NOV7 Position 42-ssr; 73-tok Pattern-ID: CK2 PHOSPHO SITE PSOOOO6 (Interpro) PDOCOOOO6 Pattern-DE: Casein kinase II phosphorylation site Pattern: IST. {2} DE NOV7 Position: 8-TVQD; 29-TMSD; 43-SRID BLOCKS Analysis AC# Description Strength Score BLO 1291A. 0 NAD: arginine ADP-ribosyltransferases proteins 1609 102.7 BLOOO58A O DNA mismatch repair proteins mutL / hexB / PM 1767 1001 BLOO 902A O Glutamate 5-kinase proteins. 1549 994 BLO 1213C 0 Protozoan/cyanobacterial globins proteins. 1420 994 BLOO579B O Ribosomal protein L29 proteins. 1361 991 BLOO 487G O IMP dehydrogenase / GMP reductase proteins. 1525 989 BLOO564F 0 Argininosuccinate synthase proteins. 1759 987 BLOO15 4A 0 E1-E2 ATPases phosphorylation site proteins. 1268 983

The STE4 SCHPO//sexual differentiation protein STE4 NOV7 is expressed in at least the following tissues: Small is involved with meiosis. The bud6 yeast/bud site selec intestine, Skin, Spleen, thyroid, placenta, colonl, cervix, tion protein BUD6 (actin interacting protein 3) interacts with heart, uterus, tonsil, lung, parathyroid and others, This the cytoskeleton. The gagy drome/retrovirus-related GAG 40 information was derived by determining the tissue Sources polyperotein (transposon gypsy) is involved with viral core of the Sequences that were included in the invention includ proteins, plyproteins, and transposable elements. Leptin is ing but not limited to SeqCalling sources, Public EST involved in fatty acid metabolism and body weight regula Sources, Literature Sources, and/or RACE Sources. Based on tion. the tissues in which NOV7 is most highly expressed, specific Other BLAST results include sequences from the Patp 45 uses include developing products for the diagnosis or treat database, which is a proprietary database that contains ment of a variety of diseases and disorders. Additional Sequences published in patents and patent publications. Patp disease indications and tissue expression for NOV7 is pre results include those listed in Table 7G. Sented in Example 2.

TABLE 7G Patp alignments of NOV7

; ; Sequences producing High-scoring Segment Pairs : Identity Positive patp: AAG19756 Arabidopsis thaliana protein fragment SEQ I - - - 54; 73; patp: AAG19757 Arabidopsis thaliana protein fragment SEQ I - - - 60; 78; patp: AAG19758 Arabidopsis thaliana protein fragment SEQ I - - - 60; 773 patp:AAM60940 Streptococcus pneumoniae encoded polypeptid 323 513 patp: AAY 43986 Mouse alcohol dehydrogenase #1-Mus sp., 37 353 57. patp: AAY 43987 Rat alcohol dehydrogenase #1-Rattus sp., 3 353 57. US 6,989,232 B2 99 100 The disclosed NOV7 nucleic acid encoding a novel a cDNA encoding the novel protein (NOV7) may be useful protein includes the nucleic acid whose Sequence is provided in cancer therapy, and the novel protein (NOV7) may be in Table 7A, or a fragment thereof. The invention also useful when administered to a subject in need thereof. By includes a mutant or variant nucleic acid any of whose bases way of nonlimiting example, the compositions of the present may be changed from the corresponding base shown in invention will have efficacy for treatment of patients suffer Table 7A while still encoding a protein that maintains its ing from cancer including but not limited to breast and activities and physiological functions, or a fragment of Such ovarian cancer. The NOV7 nucleic acid encoding novel a nucleic acid. The invention further includes nucleic acids protein, of the invention, or fragments thereof, may further whose Sequences are complementary to those just described, including nucleic acid fragments that are complementary to be useful in diagnostic applications, wherein the presence or any of the nucleic acids just described. The invention amount of the nucleic acid or the protein are to be assessed. additionally includes nucleic acids or nucleic acid NOV7 nucleic acids and polypeptides are further useful in fragments, or complements thereto, whose Structures the generation of antibodies that bind immuno-specifically include chemical modifications. Such modifications include, to the novel NOV7 Substances for use in therapeutic or by way of nonlimiting example, modified bases, and nucleic 15 diagnostic methods. These antibodies may be generated acids whose Sugar phosphate backbones are modified or according to methods known in the art, using prediction derivatized. These modifications are carried out at least in from hydrophobicity charts, as described in the “Anti part to enhance the chemical Stability of the modified nucleic NOVX Antibodies' Section below. The disclosed NOV7 acid, Such that they may be used, for example, as antisense protein has multiple hydrophilic regions, each of which can binding nucleic acids in therapeutic applications in a Subject. be used as an immunogen. In one embodiment, a contem In the mutant or variant nucleic acids, and their plated NOV7 epitope is from about amino acids 1 to 10. In complements, up to about 18% percent of the bases may be another embodiment, a NOV7 epitope is from about amino So changed. acids 20 to 25. In additional embodiments, NOV7 epitopes The disclosed NOV7 protein of the invention includes the are from about amino acids 35 to 55, and from about amino novel protein whose sequence is provided in Table 7B. The 25 acids 60 to 75. These novel proteins can be used in assay invention also includes a mutant or variant protein any of Systems for functional analysis of various human disorders, whose residues may be changed from the corresponding which will help in understanding of pathology of the disease residue shown in Table 7B while still encoding a protein that and development of new drug targets for various disorders. maintains its activities and physiological functions, or a NOV8 functional fragment thereof. In the mutant or variant protein, A disclosed NOV8 nucleic acid of 4204 nucleotides (also up to about 18% percent of the residues may be So changed. referred to as 24SC714) encoding a novel secreted protein is The invention further encompasses antibodies and anti shown in Table 8A. An open reading frame was identified body fragments, Such as F, or (F), that bind immuno beginning with an ATG initiation codon at nucleotides Specifically to any of the proteins of the invention. 1911-1913 and ending with a TGA codon at nucleotides The NOV7 nucleic acids and proteins of the invention are 35 2181-2183. A putative untranslated region upstream from useful in potential therapeutic applications implicated in the initiation codon and downstream from the termination cancer including but not limited to breast cancer, ovarian codon is underlined in Table 8A, and the Start and Stop cancer, and/or other pathologies and disorders. For example, codons are in bold letters.

TABLE 8A NOV8 nucleotide sequence. (SEQ ID NO: 50) TTTTTGGAATATAAGTAGGGGGTTTATTTGGGCCAGTCTTGAGGATTGAAACTTCAAAGCACAGATTAAAGTTATCCTGAAT

ATGTAGTCCGGTCCCACCAGCAACAGTTACAAATGGATTTTTAAAGGAAATAAAAGAAAAGGCAGTTCCTAAGTTGTTTAGC

AATAATTAACATATGAAAATAACATAAGCTATTGATCTGGCTATATGTTGTTCTTTGTTTCCTAAATTACAAGAAACGAAAG

ATAATGGGTGAGGCAGCTAGTTAGGAACTAAATGCTTTTAAACAATTCCCCCCACCCCCCACCCGTGTGGGTCCTGTGAGGG

AGTGGGAGCATGACTGAAGTCCCATACTCACGCTGGCCCTGATCAAGTTTTCATACCTCACATAGCTCAGCCTGCTCTGAGT

TGATTCTTTTTTATTGCTTGATTCATGTGGAGTTGACACTGCATTCTGAAGCCAAGTGGAGTTTCTCATTACTTTTGCCCA

ACAAAGCAGGAGAGACTTCAAATAAGGGTCCAGAATTCTTACACTGAAGAAGAAAATTTTTCCACTGTCTCTAACCTTCCTC.

TCTCCACTCATAACTTACCCTCACTCGCTTCTCTCTGCTAAATATGAACTGCCACACCCACCTAAGCTTTGCCTTCTC

CTTCATGCTATAAATGTTCCTTGTCACTCCAATGCTTTGACAGAAGGCCAGAGGACATTGGGTTCAGGACCAGAGTCTTCAC

CCTGCAGGTTTTGATGGAATTTGAGCAGAATCCAGCATGGTTCATCCCTGTCAGGTCTGGATGGCACTGAGTTATCACTACA

AGCAAATGCAAATCCAGCCATTCAGATGTCAGAAAGGCCTTCGCAAATTTGCCTTTCTATTTCAGATTCCCGGGAAGGTGAC

TGTTCTCTTCTCAAGTTAGAAGATTTCAGGTCAGAGGCCAGAATATGGGAGGAATGCCTGTCTCTGCAAACCCACATGGCTC

TGGATTAGTTGGGACGGGACCCCAAGGTCATGGTGAGGAACAAACTGTACTCTTCAGCCAAAGTGTGGCGCTCACTCTGCAG

AGGTCCCTATAAAATAATAAGCTTCCTTTTGGCATCTGGATATTTTCTGCCCCTGCTTGAGCCCATGGATTTCAGAAAGACC US 6,989,232 B2 101 102

TABLE 8A-continued NOV8 nucleotide sequence.

TAACTGTTGGCTTACAACAGTCCAGCATCTGGGTCAAAAAAGGGGAACTCTAGGCTAGCGGTCCTCAATGTATGGTCTGCAG

GACAAGTTGCATCAGCATCATATGGGAACTGGTTAGAAACTCAAATTAATGAGCTCTGCCTTAGAACTACAGAACCAAAAAC

TATCAGGGTAGAGTTCAGCAATCAGTGTTTTAACATGATGCCTTAGGTGAGTCTGATGCAAGCTCAAGTTTCAGAAATACCA

CTCTTAAGTCTAAGAAGATGAAGGTTCTAGGACTTCAAAGTACTCTAATGCTTCTCCTATGGTAGAGCTAGCAGGAGTTCAT

TTATTATTCGTCCAGATGCTGATTATGCAGTTCCAGGAATTTGAGTCAATGCCAGAGCAGTTGAGGTAGAGCAAGGAGGAAT

AACAAAAATGCTAGGATATCGTGGTGTTCTGAGACAGGTGAGCTTTTCGGAGCCTCCCAACTTGTCCCCTAGTGCTTAAAAT

TTGGCACAGATGCTACCATCAGCCATGACATGGATAGAGGAGACTCTCCCCTTTATGCTGATGTATACACCAAAACGAGTCA

CAGAAAAAGCAGGCTTCCAAGATTTTTCAGCTCCCGTTGTTCCAATCATCTTCTATGATTCTGTCTCCTAGACCTGTAGCCT

TAAAGCAAGCTTATTTAAAATAAATCTGCCAGTCTGTTTCAAAGAGATTTGTTCTCCTAAATTTGTCCCAGACTGAAAACTG

CACACGTCCAAAGTTTAAGAGGTTAGTTAGGAGAAATTGAACATTATGTTTTCCTACTGCTACTTAAATTTCCAGAGGCAT

TTACAAAAATTAAACATCAATGGGAAGCCAAGTCCTTTATGAAGCTAGCAATAGACATTGATCCTGTGATAATGTTATTATT

TTTCTTATTGCTCTTGTCAGTATGCATTTCATCATCGCTGGGTTGGATGAGTATAGGGCAGCATGGGAAAACAATGTTTATT

GACTTGCAGTTTCTAGGTGCTTTAAAAAAAGTTATGCACAGGTACATAGAGCATATTAAAGCTCTTAATTTGTGTTTCTAA

TAATTTCTTCTTGAATCTCTAAAATTATGACACTACGATTAGCATTTTATTACCACATGTACAATCTATCCAGTCACCTTGA

AGTTAGATTAGATGGCATTCAAGTCACTCAGCACAGGTGAGTCAGACGGACTTTTGACCTCTCTGTAAAATAGGAAAATAAA

GACAGTGACTTTATTTATAAGAAAAATGAACTTGGCCAACAACATTAGAGAATGCTTACTCATTCTGTACCTAGACACAGAG

GAGCTTGGAACAGACCAGGAGAAATGAGACCATTATATACCCTATAATTACAACTTGTCTAATTGATCCAAGGGGAAGCAGA

GAAAGTTAACTGTAGGGCAGCAAGATGTAAACTTGGGAAGTCAGATAAGAATGGACCTTGAAAGGGACCTTGAAAGGTATGC

AGGGGGCCTGGGCACAACTGCCAAGCATAATCAGACACTGTGTGAGAAGAGGAAGTAAGTCTAGTCCCAATCACTTAATAAG

TACAGATCTCTTAGGAAGAGGCTCTGGTACAGTATCCTTCCCCCGTCTTAAAGGGACATGGAGTCTCAGCCTCCCAGCAGGA

ATGTCTAGAGAAAAAGTATCTAGCTAATTTTGTGGGCAGGGGTGAGGGAAGGAGAAATATTGTCTGGCTTAGTAAGAGTGTG

GTCTCCACAGTAACACAGATCCCTGATGTGACATTTGAGGCAGCATCCTTTCTGTGTCAAGACTGGTTCCTCCTCCTGCATT

CTGGATCCCTTCCCTGGTGTCTTTTCAGGGCACAATTACCCCACTCTCTCTTACTAGTCAACCCTTTCCTCGCAATCTT

CCCCAAAACACTTAAACAGGCTCAAGCTTTCCCCACCTTAAAAATATCTTCCCTCTACCCCACACTTCCTGCAGCTACAGCA

CTCTCTCCTCCTCCTCACACCCAAAGTTTTCCAGAAAATTACCACCTTGCCACTCCATATGCTCCCCTCCCACTCCTCA

ATTCACCTCGCTCTGTCTTCCACTCCTGTCACAGGCTTTAAAAAGCCACTGCAATCATTAGGTGACCTGTCTATTGCCAAAG

TCTCAGGACATTTTCAATTCACCTTACTTGAAACCTCCGCAGTGTGAAGGTCACTCCTTCCATCTATGCTCCTTCCTGGGT

TCTTGGGGCTCCACAATCTCCTGGGCTTCCTCCACCCACCTGCCTGCTTATTCATTTATTCTGCAGGCTCCTTCTCCCTAC

CCGACATGCCAGAGTTCCTACAAGCTTCAGGAGTCGTCCTTGACTTCTCCCTCTTCCTCACCACTCTCCAATCCAAAACATC

ACCAAATCTTGTTAATTTGGGTCCTTTGGTATTTGTTTATTCTGTCGGTTTTTTTCTGTCTTCACTCCTCTCATTCTCTAAG

AGCTGCTATAGCCTCCTTCACAACAAAGAGAGAGAGCTGCCTAAAGTCACCCAGCTAATGAATGATGACTAGGAGTGGTTCC

CAGATATTTTATCCCTTACTGCTGTGGAGGTTCCTCATCACCCTAATAGAATCACTCTTTATTCACAAAAGTAGAAAATTAA

TTTTGGATACATCATTTATTATCAAGATGTTGTTGAGGAAAAATAGGGTCATGTAAGGTGCCTCTCAGCATCTTCCTTCAAG

TTGCAAGAATTAGAAAAACAGAGACAAGATTCTATGTGTGTCCTCAGAAGACCTTCCTGAGGACCATTCCCCTAGGAACTTA.

AAAAAATTAAGCCTCCAACTCTTTCCATCTTAACTGTGTAACAGAGGAAGGTGATGACAAGAGGAAGGAGACAAGCAAGAGT

CAGACTTCGAAGGCTTGGCAGCCACTGTCAGCAAGAGGTGAGAACAGCAGACAAGACAGCAACACTCCTGAAATAATCAATC

CATACGGACTGCCATGTGAAATGTGGAGCAGACTAGTTCTAAATGGCTCCAGGAGGCAAAATAAGACTCAAGAGAAGTTACT

GGTAGATTTCAACCCAATGTGA US 6,989,232 B2 103 104 The NOV8 nucleic acid was identified on chromosome 3 transport system permease protein SAPB (prdm:35160, by comparing a NOV8 nucleic acid to the human genome. Expect=1.1). Table 8C lists the domain description from Exons were predicted by homology and the intron/exon DOMAIN analysis results against NOV8. This indicates that boundaries were determined using Standard genetic rules. the NOV8 sequence has properties similar to those of other Exons were further selected and refined by means of simi larity determination using multiple BLAST (for example, proteins known to contain this domain. tBlastN, BlastX, and BlastN) searches, and, in some instances, GeneScan and Grail. Expressed Sequences from TABLE 8C both public and proprietary databases were also added when Domain Analysis of NOV8 available to further define and complete the gene Sequence. The DNA sequence was then manually corrected for appar ProDom Protein Domain Analysis ent inconsistencies thereby obtaining the Sequences encod Smallest ing the full-length protein. The NOV8 nucleic acid was Sum further localized to the 3p22 region, a locus associated with High Probability cancer, e.g. esophageal (OMIM 604050), hepatoblastoma 15 Sequences producing High-scoring Segment Pairs: Score P(N) (OMIM 116806), lung (OMIM 604050), and ovarian carci noma (OMIM 116806), and pSuedo-Zellweger syndrome prdm: 2196 p36 (12) ATCD (5) ATCE (4) 52 O.30 (OMIM 604054). NOV8 is useful as a marker for these ATCB (2) - CALCIUM R. . . diseases. prdm:57835 p36 (1) YJK9 YEAST - 68 O.30 HYPOTHETICAL 20O.OKD PR . . . A disclosed NOV8 polypeptide (SEQ ID NO:51) encoded prdm: 15250 p36 (2) G49 (1) G49B (1) - 50 0.44 by SEQ ID NO:50 has 90 amino acid residues and is GLYCOPROTEIN MAST . . . presented in Table 8B using the one-letter amino acid code. prdm: 47898 p36 (1) WNT1 CAEEL - WNT-1 50 0.44 SignalP, Psort and/or Hydropathy results predict that NOV8 PROTEIN PRECURSOR . . . has a Signal peptide and is likely to be Secreted with a prdm: 35160 p36 (1) SAP3 HAEIN - PEPTIDE 55 O.66 certainty of 0.8200. The most likely cleavage site for a 25 TRANSPORT SYSTEM . . . NOV8 peptide is between amino acids 61 and 62, at SLG WM. NOV8 has a molecular weight of 10,474.6 Daltons.

TABLE 8B Encoded NOV8 protein sequence. (SEQ ID NO:51) MLGEIEHYWFLLLLKFPEAFTKIKHQWEAKSFMKLAIDIDPWIMLLFFLLLLSVCISSSLGWMSIGOHGKTMFIDLQFLGAL

The presence of identifiable domains in NOV8, as well as all other NOVX proteins, was determined by searches using TABLE 8C-continued software algorithms such as PROSITE, DOMAIN, Blocks, 40 Pfam, ProDomain, and Prints, and then determining the Domain Analysis of NOV8 Interpro number by crossing the domain match (or numbers) using the Interpro website (maintained by the European BLOCKS Protein Domain Analysis Bioinformatics Institute, Hinxton, Cambridge, UK). AC# Description Strength. Score DOMAIN results for NOV8 as disclosed in Tables 1E, were 45 BLOO456D O Sodium:solute symporter family 1174 1038 collected from the Conserved Domain Database (CDD) with proteins. Reverse Position Specific BLAST analyses. This BLAST BLO1271B O Sodium: sulfate symporter family 1480 1033 analysis Software samples domains found in the Smart and proteins. Pfam collections. BLOO790A O Receptor tyrosine kinase class V 1390 1031 Prodom domain analysis of the NOV8 polypeptide indi 50 proteins. cates that the NOV8 polypeptide has 11 of 23 (47%) BLOO284A O Serpins proteins. 13O8 1029 identical to, and 14 of 23 (60%) positive with, the 40 aa p36 BLO1313A O Lipoate-protein ligase B proteins. 1390 101.8 (12) ATC ATCE(4) ATCB(2)-calcium reticulum calcium PROSITE - Protein Domain Analysis transporting ATPase type hydrolase transport transmem Protein Domain Matches for Gene ID: NOVO8 brane endoplasmic class (prdm:2196. Expect=0.36); 28 of 55 84 (33%) identical to, and 38 of 84 (45%) positive with, the No PROSITE patterns found 1769 aa p36 (1) YJK9 YEAST hypothetical 200.0 kD protein in GZF3-SMEI intergenic region, hypothetical pro In a search of public sequence databases, the NOV8 amino acid Sequence had no hits with the Expect value Set tein (pram:57835, Expect=0.36); 11 of 32 (34%) identical at 1.0. Public amino acid databases include the GenBank to, and 18 of 32 (56%) positive with, the 68 aa p36 (2) 60 databases, SwissProt, PDB and PIR. G49(1) G49B(1)-glycoprotein mast cell Surface precursor Other BLAST results include sequences from the Patp signal transmembrane imminoglobulin fold GP49A database, which is a proprietary database that contains (pram:15250, Expect=0.58); 9 of 23 (39%) identical to, and Sequences published in patents and patent publications. 17 of 23 (73%) positive with, the 41 aa p36 (1) WNTI BLASTP analysis again the NOV8 protein shows that the CAEEL-WNT-1 protein precursor (prdm:47898, Expect= 65 NOV8 protein has 18 of 28 aa residues (64%) identical to, 0.58); and 15 of 46 (32%) identical to, and 26 of 46 (56%) and 18 of 28 aa residues (64%) positive with, the 78 aa Zea positive with, the 89 aa p36 (1) SAPB HAEIN-peptide mays protein fragment SEQ ID NO: 30302 of patent US 6,989,232 B2 105 106 EP1033405-A2 (patp: AAG26008, Expect=0.097); 14 of 30 corresponding residue shown in Table 8B while still encod aa residues (46%) identical to, and 16 of 30 aa residues ing a protein that maintains its Secreted protein-like activi (53%) positive with, the 51 aa Human secreted protein ties and physiological functions, or a functional fragment sequence encoded by gene 65 SEQ ID NO: 188 thereof. (patp:AAY91515, Expect=0.50); 14 of 30 aa residues (46%) The invention further encompasses antibodies and anti identical to, and 16 of 30 aa residues (53%) positive with, body fragments, Such as F, or (F), that bind immuno the 50 aa Human Secreted protein Sequence encoded by gene Specifically to any of the proteins of the invention. 65 SEQ ID NO:329 (patp:AAY91656, Expect=0.50); 21 of The above defined information for this invention Suggests 64 aa residues (32%) identical to, and 32 of 64 aa residues that this secreted protein-like protein (NOV8) may function (50%) positive with, the 997 aa Human shear stress as a member of a Secreted protein family. Therefore, the response protein SEQID NO: 28 (patp:AAB90764, Expect= NOV8 nucleic acids and proteins identified here may be 0.91); 13 of 31 aa residues (41%) identical to, and 19 of 31 useful in potential therapeutic applications implicated in (but aa residues (61%) positive with, the 52 aa Gene 9 human not limited to) various pathologies and disorders as indicated Secreted protein homologous amino acid Sequence #123 below. The potential therapeutic applications for this inven Chlorella vulgaris (patp:AAB34919, Expect=1.0); and 14 of 15 tion include, but are not limited to: cancer research tools, for 43 aa residues (32%) identical to, and 22 of 43 aa residues all tissues and cell types composing (but not limited to) those (51%) positive with, the 46 aa Human secreted protein defined here, including esophagus, liver, lung and ovary. sequence encoded by gene 4 SEQ ID NO:64 The NOV8 nucleic acids and proteins of the invention are (patp:AAB34580, Expect=2.7). Patp results include those useful in potential therapeutic applications implicated in listed in Table 8D. cancer including but not limited to esophageal, liver, lung

TABLE 8D Patp alignments of NOV8

Smallest Suml High Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp: AAY 91515 Human secreted protein sequence encoded by . . . 59 O 39 patp: AAY91656 Human secreted protein sequence encoded by . . . 59 O 39 patp: AAB90764 Human shear stress-response protein SEQ ID . . . 70 O. 60 patp:AAB34919 Gene 9 human secreted protein homologous am . . . 56 O. 64 patp:AAB34 580 Human secreted protein sequence encoded by . . . 52 O 93

The NOV8 protein domain information and chromosomal and ovary and/or other pathologies and disorders. For mapping Suggest that NOV8 is a cancer-associated Secreted 40 example, a cDNA encoding the Secreted protein-like protein protein. AS Such, it is useful as a diagnostic tool for the onset (NOV8) may be useful in cancer therapy, and the secreted and or progression of cancer, Such as esophageal, protein-like protein (NOV8) may be useful when adminis hepatoblastoma, lung, and Ovarian carcinoma. tered to a Subject in need thereof. By way of nonlimiting The disclosed NOV8 nucleic acid encoding a secreted example, the compositions of the present invention will have protein includes the nucleic acid whose Sequence is provided 45 efficacy for treatment of patients Suffering from cancer in Table 8A, or a fragment thereof. The invention also including but not limited to esophageal, hepatic, lung and includes a mutant or variant nucleic acid any of whose bases ovarian cancer. The NOV8 nucleic acid encoding secreted may be changed from the corresponding base shown in protein-like protein, and the Secreted protein-like protein of Table 8A while Still encoding a protein that maintains its the invention, or fragments thereof, may further be useful in Secreted protein-like activities and physiological functions, 50 diagnostic applications, wherein the presence or amount of or a fragment of Such a nucleic acid. The invention further the nucleic acid or the protein are to be assessed. includes nucleic acids whose Sequences are complementary NOV8 nucleic acids and polypeptides are further useful in to those just described, including nucleic acid fragments that the generation of antibodies that bind immuno-specifically to the novel NOV8 Substances for use in therapeutic or are complementary to any of the nucleic acids just described. diagnostic methods. These antibodies may be generated The invention additionally includes nucleic acids or nucleic 55 according to methods known in the art, using prediction acid fragments, or complements thereto, whose Structures from hydrophobicity charts as described in the “Anti-NOVX include chemical modifications. Such modifications include, Antibodies' section below. The disclosed NOV8 protein has by way of nonlimiting example, modified bases, and nucleic multiple hydrophilic regions, each of which can be used as acids whose Sugar phosphate backbones are modified or an immunogen. In one embodiment, a contemplated NOV8 derivatized. These modifications are carried out at least in 60 epitope is from about amino acids 1 to 30 In another part to enhance the chemical Stability of the modified nucleic embodiment, a NOV8 epitope is from about amino acids 18 acid, Such that they may be used, for example, as antisense to 35. In additional embodiments, NOV8 epitopes are from binding nucleic acids in therapeutic applications in a Subject. about amino acids 65 to 90. These novel proteins can be used The disclosed NOV8 protein of the invention includes the in assay Systems for functional analysis of various human Secreted protein-like protein whose Sequence is provided in 65 disorders, which will help in understanding of pathology of Table 8B. The invention also includes a mutant or variant the disease and development of new drug targets for various protein any of whose residues may be changed from the disorders. US 6,989,232 B2 107 108 NOV9 beginning with an ATG initiation codon at nucleotides 1-3 and ending with a TAA codon at nucleotides 1708-1710. A A disclosed NOV9 nucleic acid of 3111 nucleotides (also putative untranslated region downstream from the termina referred to as 6CS060) encoding a novel Kelch-like protein tion codon is underlined in Table 9A, and the start and stop is shown in Table 9A. An open reading frame was identified codons are in bold letters.

TABLE 9A NOV 9 nucleotide sequence. (SEQ ID NO: 52) AGAATGCCACCAGATCTGAAGAGCAGTTCCATGTTATAAACCACGCAGAGCAAACTCTTCGTAAAATGGAGAACTACTTG

AAAGAGAAACAACTATGTCATGTGCTACTGATTGCAGGACACCTCCGCATCCCAGCCCATAGGTTGGTTCTCAGCGCAGTG

TCTGATTATTTTGCTGCAATGTTTACTAATGATGTGCTTGAAGCCAAACAAGAAGAGGTCAGGATGGAAGGAGTAGATCCA

AATGCACTAAATTCCTTGGTGCAGTATGCTTACACAGGAGTCCTGCAATTGAAAGAAGATACCATTGAAAGTTTGCTGGCT

GCAGCTTGTCTTCGCAGCTGACCAGGTCATTGATGTTTGCTCCAATTTTCTCATAAAGCAGCTCCATCCTTCAAACTGC

TTAGGGATTCGATCATTTGGAGATGCCCAAGGCTGTACAGAACTTCTGAACGTGGCACACAAATACACTATGGAACACTTC

ATTGAGGTAATAAAAAACCAAGAATTCCTCCTGCTTCCAGCTAATGAAATTTCAAAACTTCTGTGCAGTGATGACATTAAT

GTGCCTGATGAAGAGACCATTTTTCATGCTCTAATGCAGTGGGTGGGGCATGATGTGCAGAATAGGCAAGGAGAACTGGGG

ATGCTGCTTTCTTACATCAGACTGCCATTACTCCCACCACAGTTACTGGCAGATCTTGAAACCAGTTCCATGTTTACTGGT

GATCTTGAGTGTCAGAAGCTCCTGATGGAAGCTATGAAGTATCATCTTTTGCCTGAGAGAAGATCCATGATGCAAAGCCCT

CGGACAAAGCCTAGAAAATCAACTGTGGGGGCACTTTATGCTGTAGGAGGCATGGATGCTATGAAAGGTACTACTACTATT

GAAAAATATGACCTCAGGACCAACAGTTGGCTACATATTGGCACCATGAATGGCCGTAGGCTTCAATTTGGAGTCGCAGTT

ATTGATAATAAGCTCTATGTCGTGGGAGGAAGAGACGGTTTAAAAACTTTGAATACAGTGGAATGTTTTAATCCAGTTGGC

AAAATCTGGACTGTGATGCCTCCCATGTCAACACATCGGCACGGCTTAGGTGTAGCCACTCTTGAAGGACCAATGTATGCT

GTAGGTGGTCATGATGGATGGAGCTATCTAAATACTGTAGAAAGATGGGACCCTGAGGGACGACAGTGGAATTACGTAGCC

AGTATGTCAACTCCTAGAAGCACAGTTGGTGTTGTTGCATTAAACAACAAATTATATGCTATTGGTGGACGTGATGGAAGT

TCCTGCCTCAAATCAATGGAATACTTTGACCCACACACTAACAAGTGGAGTTTGTGTGCTCCAATGTCCAAAAGACGTGGA

GGTGTGGGAGTTGCCACATACAATGGATTCTTATATGTTGTAGGGGGGCATGATGCCCCTGCTTCCAACCATTGCTCCAGG

CTTTCTGACTGTGTGGAACGGTATGATCCAAAAGGTGATTCATGGTCAACTGTGGCACCTCTGAGTGTTCCTCGAGATGCT

GTTGCTGTGTGCCCTCTTGGAGACAAACTCTACGTGGTTGGAGGATATGACGGACATACTTATTTGAACACAGTTGAGTCA

TATGATGCACAGAGAAATGAATGGAAAGAGGAAGTTCCTGTTAACATTGGAAGAGCTGGTGCATGTGTTGTAGTGGTGAAG

CTACCCAAAGCTATCTATCTTTATCAAATGGAATGAAACTAGATAATTTCAAGAAACTGAGTAGGACAAAGGGAGAAAGA

AATACAGTTCTTTTTCCTGCAATTAATAACAGACTGGAAAATTGTTGTATCATTTTAATTTGTAGTTACAATTGCTTTC

ATTCGTGAAGCCGAAACGTTTTTAAACATGAATTACATATGAATTATTAAGCATATGTGCTTTCGCAGCTGATAATATAAA

AGGAAATCCCACAGTCTAGATATAGCCCCATTACTACAAAATGCTAAAATATTTAATGAAAATTGATGGTGGCCACAGTGT

GCAGGTTATAAAAGCATTAATACATTTCAAGGTAAGAGCCTTAAAAGTTAAAAACATTTTCAGTTTTTTTTTAAAAAACGT

ACTCTTATTATCTGGAACATAGAAATATAAAAGGTAACATCTAAAGCTTAGAATAGTGTGATTTTTAGTAAGCCATTATTC

TCCTATTCAAATAATATCCCAAAGAGCTAAACAATTCCTTACATTTACCAAGAGGAAAGCTTTTACTGTGTTGAAGCTAAA

AAAATAATGGCTCTTTGACAAAACTTGTTATGTTGATCGCGGTATGTCAAAATTTTTACAGGTTTGCTCATCTGCCAGAGC

ACACATATAAATTTGGTATTTCTTAACATATTATCTTGTTAGATTTGTTACCAGTAAAATATTACTGTAATTTCATATACA

CAGTCTATACAATGAAATAATGAATATTTATCATATTGATACAAACTGTGACCTCAGCTTCAGAGTGTCAGGGCCTCACTT

GTATAGAATGTAATGTTCTCCTCAAACATTTATGTTAACTCTATAAACAAATATCGTTAAGTTAAACAAGTTTTCAAAAAC

AAAACAATTTTTAAAGTACCTTAAAATTGAGGATGTTACTCAGTGTTAACACATGGGAACACCAAAATATTCAATAAGCCT

GGTCAATTCTATAGTTATCTTTTTTGACCAACACAGCTTTTCTGTTACTGTTATATTATCCAGTAGAAAATGTTAGGAT

ATGTGTGCTATATAAAAAAAAAAAAGACTTGTTAAGTTTTAAAATAACAAAAAATGGCTAGTTGAATAGTATTTTATGTGT US 6,989,232 B2 109 110

TABLE 9A-continued NOV 9 nucleotide sequence.

AATTCTTCCATTTATTCTGTTTAATTATACAACTAAGATGAAATATTGAAAAACCCTTTGTGAAAGTAACTTTTCAAGTAA

ATGCACAACTTTAGAATTTCTACAAATAAGTTCTTTTAAACAGTCTTTTTATTGTGGATTGTGAAATCAAAATCTGGAGAA

ATGCTTATAAAATATACTACTAGCTTTTAAGTTTTAAGAAAGAAGAACGTAAGTTGTACAAAGATATTTGTACTTTGACAA

ACTGAATTTAAATAAACTTTATTTCCTCTCAAA

The NOV9 nucleic acid was identified on the human X 314580) and/or other diseases/disorders. NOV9 is a useful chromosome by comparing the NOV9 nucleic acid to the 15 marker for these and/or other diseases/disorders. human genome. Exons were predicted by homology and the In a search of public sequence databases, the NOV9 intron/exon boundaries were determined using Standard nucleic acid sequence has 2751 of 2767 bases (99%) iden tical to a human Kelch-4 cDN (Accession No. XMO39746). genetic rules. Exons were further Selected and refined by Public nucleotide databases include all GenBank databases means of similarity determination using multiple BLAST and the GeneSeq patent database. (for example, tRlastN, BlastX, and BlastN) searches, and, in A disclosed NOV9 polypeptide (SEQ ID NO:53) encoded Some instances, GeneScan and Grail. Expressed Sequences by SEQ ID NO:52 has 569 amino acid residues and is from both public and proprietary databases were also added presented in Table 9B using the one-letter amino acid code. when available to further define and complete the gene SignalP, Psort and/or Hydropathy results predict that NOV9 Sequence. The DNA sequence was then manually corrected does not contain a known signal peptide and is likely to be for apparent inconsistencies thereby obtaining the Sequences 25 localized endoplasmic reticulum (membrane with a certainty encoding the full-length protein. The NOV9 nucleic acid of 0.6000. In alternative embodiments, the NOV9 protein is was further mapped to the q13 region of the X chromosome. localized to a microbody (peroxisome) with a certainty of This locus is associated with Menkes disease (OMIM 0.3000; the mitochondrial inner membrane with a certainty 300011), myoglobinuria?hemlolysis due to PGK deficiency of 0.1000; or the plasma membrane with a certainty of (OMIM 311800), Wieacker-Wolff syndrome (OMIM 0.1000. NOV9 has a molecular weight of 63292.0 Daltons.

TABLE 9B Encoded NOV9 protein sequence. (SEQ ID NO:53) MNATRSEEQFHWINHAEQTLRKMENYLKEKOLCDVLLIAGHLRIPAHRLVLSAVSDYFAAMFTNDWLEAKQEEWRMEGWDP

NALNSLVOYAYTGVLQLKEDTIESLLAAACLLQLTOVIDWCSNFLIKQLHPSNCLGIRSFGDAQGCTELLNVAHKYTMEHF

IEWIKNOEFLLLPANEISKLLCSDDINVPDEETIFHALMQWVGHDWQNROGELGMLLSYIRLPLLPPOLLADLETSSMFTG

DLECQKLLMEAMKYHLLPERRSMMQSPRTKPRKSTVGALYAVGGMDAMKGTTTIEKYDLRTNSWLHIGTMNGRRLOFGVAV

IDNKLYWWGGRDGLKTLNTVECFNPVGKIWTVMPPMSTHRHGLGVATLEGPMYAVGGHDGWSYLNTVERWDPEGROWNYWA

SMSTPRSTWGWWALNNKLYAIGGRDGSSCLKSMEYFDPHTNKWSLCAPMSKRRGGWGWATYNGFLYWWGGHDAPASNHCSR

LSDCVERYDPKGDSWSTWAPLSWPRDAVAVCPLGDKLYWWGGYDGHTYLNTVESYDAQRNEWKEEWPVNIGRAGACVWWWK

LP

50 The reverse complement for NOV9 is presented in Table 9.C.

TABLE 9C NOV9 reverse complement (SEQ ID NO:54) TTTGAGAGGAAATAAAGTTTATTTAAATTCAGTTTGTCAAAGTACAAATATCTTTGTACAACTTACGTTCTTCTTTCTTAA

AACTTAAAAGCTAGTAGTATATTTTATAAGCATTTCTCCAGATTTTGATTTCACAATCCACAATAAAAAGACTGTTTAAAA

GAACTTATTTGTAGAAATTCTAAAGTTGTGCATTTACTTGAAAAGTTACTTTCACAAAGGGTTTTTCAATATTTCATCTTA

GTTGTATAATTAAACAGAATAAATGGAAGAATTACACATAAAATACTATTCAACTAGCCATTTTTGTTATTTTAAAACTTA

ACAAGTCTTTTTTTTTTTTTATATAGCACACATATCCTAACATTTTCTACTGGATAATATAACAGTAACAGAAAAGCATGT US 6,989,232 B2 111 112

TABLE 9 C-continued NOV9 reverse complement

GTTGGTACAAAAAAGATAACTATAGAATTGACCAGGCTTATTGAATATTTTGGTGTTCCCATGTGTTAACACTGAGTAACA

TCCTCAATTTTAAGGTACTTAAAAATTGTTTTGTTTTTGAAAACTTGTTTAACTAACGATATTGTTTATAGAGTTAAC

ATAAATGTTTGAGGAGAACATTACATTCTATACAAGTGAGGCCCTGACACTCTGAAGCTGAGGTCACAGTTTGTATCAATA

TGATAAATATTCATTATTTCATTGTATAGACTGTGTATATGAAATTACAGTAATATTTTACTGGTAACAAATCTAACAAGA

TAATATGTTAAGAAATACCAAATTTATATGTGTGCTCTGGCAGATGAGCAAACCTGTAAAAATTTTGACATACCGCGATCA

ACATAACAAGTTTTGTCAAAGAGCCATTATTTTTTTAGCTTCAACACAGTAAAAGCTTTCCTCTTGGTAAATGTAAGGAAT

TGTTTAGCTCTTTGGGATATTATTTGAATAGGAGAATAATGGCTTACTAAAAATCACACTATTCTAAGCTTTAGATGTTAC

CTTTTATATTTCTATGTTCCAGATAATAAGAGTACGTTTTTTAAAAAAAAACTGAAAATGTTTTTAACTTTTAAGGCTCTT

ACCTTGAAATGTATTAATGCTTTTATAACCTGCACACTGTGGCCACCATCAATTTTCATTAAATATTTTAGCATTTTGTAG

TAATGGGGCTATATCTAGACTGTGGGATTTCCTTTTATATTATCAGCTGCGAAAGCACATATGCTTAATAATTCATATGTA

ATTCATGTTTAAAAACGTTTCGGCTTCACGAATGAAAGCAATTGTAACTACAAATTAAAATGATACAACAATTTTCCAGTC

TGATTATTAATTGCAGGAAAAAGAACATGTATTTCTTTCTCCCTTTGTCCTACTCAGTTTCTTGAAATTACTAGTTTCAT

TCCATTTGATAAAGATAGATAGCTTTAGGGTAGCTTCACCACTACAACACATGCACCAGCTCTTCCAATGTTAACAGGAAC

TTCCTCTTTCCATTCATTTCCTGGCATCATAGACCAACGGTTCAAATAAGTATGTCCGTCATATCCTCCAACCAC

GTAGAGTTTGTCTCCAAGAGGGCACACAGCAACAGCATCTCGAGGAACACTCAGAGGTGCCACAGTTGACCATGAATCACC

TTTTGGATCATACCGTTCCACACAGTCAGAAAGCCTGGAGCAATGGTTGGAAGCAGGGGCATCATGCCCCCCTACAACATA

TAAGAATCCATTGTATGTGGCAACTCCCACACCTCCACGTCTTTTGGACATTGGAGCACACAAACTCCACTTGTTAGTGTG

TGGGTCAAAGTATTCCATTGATTTGAGGCAGGAACTTCCATCACGTCCACCAATAGCATATAATTTGTTGTTTAATGCAAC

AACACCAACTGTGCTTCTAGGAGTTGACATACTGGCTACGTAATTCCACTGTCGTCCCTCAGGGTCCCATCTTTCTACAGT

ATTTAGATAGCTCCATCCATCATGACCACCTACAGCATACATTGGTCCTTCAAGAGTGGCTACACCTAAGCCGTGCCGATG

TGTTGACATGGGAGGCATCACAGTCCAGATTTTGCCAACTGGATTAAAACATTCCACTGTATTCAAAGTTTTTAAACCGTC

TCTTCCTCCCACGACATAGAGCTTATTATCAATAACTGCGACTCCAAATTGAAGCCTACGGCCATTCATGGTGCCAATATG

TAGCCAACTGTTGGTCCTGAGGTCATATTTTTCAATAGTAGTAGTACCTTTCATAGCATCCATGCCTCCTACAGCATAAAG

TGCCCCCACAGTTGATTTTCTAGGCTTTGTCCGAGGGCTTTGCATCATGGATCTTCTCTCAGGCAAAAGATGATACTTCAT

AGCTTCCATCAGGAGCTTCTGACACTCAAGATCACCAGTAAACATGGAACTGGTTTCAAGATCTGCCAGTAACTGTGGTGG

GAGTAATGGCAGTCTGATGTAAGAAAGCAGCATCCCCAGTTCTCCTTGCCTATTCTGCACATCATGCCCCACCCACTGCAT

TAGAGCATGAAAAATGGTCTCTTCATCAGGCACATTAATGTCATCACTGCACAGAAGTTTTGAAATTTCATTAGCTGGAAG

CAGGAGGAATTCTTGGTTTTTTATTACCTCAATGAAGTGTTCCATAGTGTATTTGTGTGCCACGTTCAGAAGTTCTGTACA

GCCTTGGGCATCTCCAAATGATCGAATCCCTAAGCAGTTTGAAGGATGGAGCTGCTTTATGAGAAAATTGGAGCAAACATC

AATGACCTGAGTCAGCTGCAGAAGACAAGCTGCAGCCAGCAAACTTTCAATGGTATCTTCTTTCAATTGCAGGACTCCTGT

GTAAGCATACTGCACCAAGGAATTTAGTGCATTTGGATCTACTCCTTCCATCCTGACCTCTTCTTGTTTGGCTTCAAGCAC

ATCATTAGTAAACATTGCAGCAAAATAATCAGACACTGCGCTGAGAACCAACCTATGGGCTGGGATGCGGAGGTGTCCTGC

AATCAGTAGCACATCACATAGTTGTTTCCTTTCAAGTAGTTCTCCATTTTACGAAGAGTTTGCTCTGCGTGGTTTATAAC

ATGGAACTGCTCTTCAGATCTGGTGGCATTCAT

In a search of public sequence databases, the NOV9 1. Public amino acid databases include the GenBank amino acid Sequence has 431 of 569 amino acid residues databases, SwissProt, PDB and PIR. 65 It was also found that NOV9 had homologvgy to the amino (76%) identical to, and 500 of 569 residues (88%) positive acid sequences shown in the BLASTP data listed in Table with, the 569 amino acid residue human Kelch-like protein- 9D. US 6,989,232 B2 113 114

TABLE 9D

BLAST results for NOV9 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect Q9COH6; ABO51474; KIAA1687 PROTEIN 728 569/569 569/569, O.O BAB21778.1 (FRAGMENT). homo (100%) (100%) Sapiens. 6/2001 Q9Y3J5; ALO35424; DA22012.1. hono 569 569/569 569/569, O.O CAB39994.1 Sapiens. 6/2001 (100%) (100%) KHL1 HUMAN: KELCH-LIKE PROTEIN 748 431/569 500/569, O.O AF252283; AAF81719.1 1. homo Sapiens. (76%) (88%) 10/2OOO KHL1 MOUSE; KELCE-LIKE PROTEIN 751. 430/569 497/569, O.O AF252281: AAF81717.1 1. nus musculus. (76%) (87%) 10/2OOO Q9H955; AKO23057; CDNA FLJ12995 FIS, 411 411/411 411/411, O.O BAB1438.2.1 CLONE NT2RP3000233, (100%) (100%) weakly similar to ring canal protein. homo Sapiens. 6/2001

A multiple Sequence alignment is given in Table 9E, with the NOV9 protein of the invention being shown on lines 1 in a ClustalW analysis comparing NOV9 with related pro- 25 tein sequences of Table 9D.

TABLE 9E

Information for the ClustalW préteins:

1. SEQ ID NO:53, NOV9 2. SEQ ID NO:55, Q9COH6 KIAA1687 PROTEIN (FRAGMENT). homo sapiens. 6/2001 3. SEQ ID NO:56, Q9Y3 J5 DA22D12. 1. homo sapiens. 6/2001 4. SEQ ID NO:57, KHL1 HUMAN KELCH-LIKE PROTEIN 1... homg5 sapiens. 10/2000 5. SEQ ID NO:58, KHL1 MOUSE KELCH-LIKE PROTEIN 1... mus musculus. 10/2000 6. SEQ ID NO:59, Q9H955 CDNA FLJ12995 FIS. homo sapiens. 6/2001

NOW9 EKAFWFPPATMSWSGKKEFDWKQILRLRWRWFSHIP--FOGSRNRGSCLQQ Q9COH6 EKAFWFPPATMSWSGKKEFDWKQILRLRWRWFSHIP--FOGSRNRGSCLQQ Q9Y3J5 1. ------i------KHL1 HUMAN ---MSGSGRKDFDWKHILRLRWKLFSHPSPSTGGPAGGGCLQQD-GSGSFEHW 49 KHL 1 MOUSE ---MSGSGRKDFDWKHILRLRWKLFSHPSPASSSPAGGSCLQQDSGGGSFEHW 50 Q9H955 1 ------

NOW9 55 GTPWQGRLKSHSRD-- PGPAPAHQRA 97 Q9COH6 55 GTPWOGRLKSHSRD- PGPAPAHQRA 97 Q9Y3J5 1. ------KHL1 HUMAN 50 GPSQSRLLKSQERSGVSTFWKKPSSSSSSSSSPSSSSSS--FNPLNGTLLPVATRLQQGA 107 KHL 1 MOUSE 51 GPSQSRLLKNQEKGSVSAFWKKPSSSSSSSSSSSSSASSSPFNPLNGRLLPVATRLQQGA 110 Q9H955 1 ------

NOW9 98 WONLQOHNLIVHFQANEDTPKSWPEKNLFKEACEK-RAQDLEMMADDNIEDS----TAR 151 Q9COH6 98 WONLQOHNLIVHFQANEDTPKSWPEKNLFKEACEK--RAQDLEMMADDNIEDS----TAR 151 Q9Y3J5 1. ------KHL1 HUMAN 108 PGQGTQQPARTLFYVESLEEEVVPGMD-FPGPQDKGLALKELQAEPASSIQATGWGCGHR 166 KHL 1 MOUSE 111 PGQGTQQPARTLFQVESLEEEVVTGMD-FPGPQDKGLALKELQAEPASSIQATGEGCGHR 169 Q9H955 1 ------

NOW9 52 LD-RQHS----E 2O6 Q9COH6 52 LD-TQHS----E 2O6 Q9Y3J5 1. ------47 KHL1 HUMAN 167 LSSTGHSMTPQS 226 KHL 1 MOUSE 170 LTSTNHSLTPQS i 229 Q9H955

NOW9 2O7 VLLKEDTIESLL 266 Q9COH6 2O7 VLSLKEDTIESLL 266 Q9Y3J5 48 VLLKEDTIESLL 107 KHL1 HUMAN 227 LL, 286 KHL1 MOUSE 230 289 US 6,989,232 B2 115 116

TABLE 9E-continued Information for the ClustalW proteins: NOW9 267 K. 326 8: 326 K. 167 KHL1 HUMAN 287 R. 346 KHL 1 MOUSE 290 R. 349

NOW9 327 386 386 227 KHL1 HUMAN 347 4O6 KHL 1 MOUSE 350 4.09 69

NOW9 387 MQSPRTKPRKSTW 4 46 MQSPRTKPRKSTW 4 46 MQSPRTKPRKSTW 287 KHL1 HUMAN 407 QSPRTKPRKSTW 466 KHL 1 MOUSE 410 QSPRTKPRKSTW 469 MQSPRTKPRKSTW 129

NOW9 447 GTTTIEKYDLRTNSWE 506 Q9COH6 447 GTTTIEKYDLRTNS 506 Q9Y3J5 288 GTTTIEKYDLRTNS 347 KHL1 HUMAN 467 GATTIEKYDLRTN 526 KHL 1 MOUSE 470 E. 529 Q9H955 130 GTTTIEKYDLRTNSW 189

NOW9 507 566 566 4O7 KHL1 HUMAN 527 586 KHL 1 MOUSE 530 589 249

NOW9 567 626 626 4.67 KHL1 HUMAN 587 646 KHL 1 MOUSE 590 649 309

NOW9 627 686 686 527 KHL1 HUMAN 647 7O6 KHL 1 MOUSE 650 709 369

NOW9 687 GYDGH 728 728 Q9Y3J5 528 GYDGH 569 KHL1 HUMAN 707 GYDGQ 748 KHL 1 MOUSE 710 GYDGQ 751. Q9H955 370 GYDGH 411

ProDom analysis indicates that the NOV9 polypeptide has ger activator (prdm:716, Expect=3.1e-12); and 29 of 115 aa 66 of 164 aa residues (40%) identical to, and 99 of 164 aa residues (25%) identical to, and 57 of 115 aa residues (49%) residues (60%) positive with, the 170 aa p36 (1) KELC positive with, the 148 aa p36 (4) VA55(2) VCO2(2) protein DROME-ring canal prptein () repeat 55 early A55 C2 (pram:6493, Expect=5.7e-07). Pfam query for NOV9 indicates that NOV9 has high (prdm:36769, Expect=2.0e-27); 64 of 191 aa residues homology to two Interpro protein motifs, including the (33%) identical to, and 98 of 191 aa residues (51%) positive Kelch (Score=233.9, E-value=2.3e-66) and the with, the 265 aa p36 (36) SCRB(3) YC81(2) KELC(2)- BTB/POZ domain (Score=114.0, E-value=2.9e-30). protein repeat chromosome Scruin EGF-like domain inter PROSITE-software analysis indicates that NOV9 has one genic region cytoskeleton precursor (prodm:569, Expect= 60 N-glycosylation site (Pattern-ID: ASN glycosylation 2.9e-19); 50 of 201 aa residues (24%) identical to, and 99 PS00001 (Interpro)); one cAMP- and coMP-dependent pro of 201 aa residues (49%) positive with, the 263 aa p36 (3) tein kinase phosphorylation site (Pattern-ID: CAMP VFO3(2) VC13(1)-protein F3 C13, (prdm:9161, Expect PHOSPHO SITE PS00004 (Interpro)); six Protein kinase 8.5e-16); 41 of 116 aa residues (35%) identical to, and 65 C phosphorylation sites (Pattern-ID: PKC PHOSPHO of 116 aa residues (56%) positive with, the 220 aa p36 (30) 65 SITE PS00005 (Interpro)); three Casein kinase II phospho BAC 1 (2) BCL6(2) Z151(5) protein transcription nuclear rylation sites (Pattern-ID: CK2 PHOSPHO SITE DNA-binding regulation zinc-finger metal-binding Zinc fin PS00006 (Interpro)); one Tyrosine kinase phosphorylation US 6,989,232 B2 117 118 site (Pattern-ID: TYR PHOSPHO SITE PS00007 Table 9F lists the domain description from other domain (Interpro)); eleven N-myristoylation sites (Pattern-ID: analyses results against NOV9. This indicates that the MYRISTYL PS00008 (Interpro)); and one Amidation site NOV9 sequence has properties similar to those of other (Pattern-ID: AMIDATION PS00009 (Interpro)). proteins known to contain this domain.

TABLE 9F Domain Analysis of NOV9 Prodom

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs : Score P(N) pram: 36769 p36 (1) KELC DROME-RING CANAL PROTEIN (KELC . . . 306 2. Oe-27 pram: 569 p36 (36) SCRB (3) YC81 (2) KELC (2)-PROTEIN R . . . 231 5.8e-20 pram: 91.61 p36 (3) VF03 (2) VC13(1)-PROTEIN F3 C13, 26 . . . 199 8.5e-16 pram: 716 p36 (30) BAC1 (2) BCL6 (2) Z151 (2)-PROTEIN T . . . 166 3.1e-12 pram: 6493 p36 (4) VA55 (2) VCO2 (2)-PROTEIN EARLY A55 . . . 117 5. 7 e-07 BLOCKS Protein Domain Analysis AC# Description Strength Score

BLOO913B O Iron-containing alcohol denydrogenases protei 1389 1043

BLOO 115S O Eukaryotic RNA polymerase II heptapeptide rep 1762 1040

DLOO655C O Glycosyl hydrolases family 6 proteins. 1384 1037

BLO 1092Q O Adenylate cyclases class-I proteins. 1997 1035

BLO 1066D O Uncharacterized protein fanily UPFOO15 protei 1584 1029 BLOCKS Protein Domain Analysis NOV9 aa position

Pattern D: ASN GLYCOSYLATION PS00001 (Interpro) 2 Pattern-D E: N-glycosylation site, Pattern: NIPI IST IP

Pattern D: CAMP PHOSPHO SITE PSOOOO4 (Interpro) 275

Pattern-D E: cAMP- and coMP-dependent protein kinase phosphorylation site Pattern RK {2} . STI Pattern D: PKC PHOSPHO SITE PSOOOO5 (Interpro) 19, 269 362, 409, 445, 455

Pattern-D E: Protein kinase C phosphorylation site Pattern STRK

Pattern D: CK2 PHOSPHO SITE PSOOOO6 (Interpro) 4, 140, 295

Pattern-D E: Casein kinase II phosphorylation site

Pattern ST. {2} DE

Pattern D: TYR PHOSPHO SITE PSOOOO7 (Interpro) 249

Pattern-D E: Tyrosine kinase phosphorylation site

Pattern IRE {2, 3} IDE {2, 3}Y

Pattern D: MYRISTYL PSOOOO8 (Interpro) 78 218, 280 288, 311, 333,

Pattern-D E: N-myristoylation site 366, 380, 427, 460, 527 Pattern: GEDRKHPFYW. {2} STAGCNIP)

Pattern D: AMIDATION PS00009 (Interpro) 314

Pattern-D E: Amidation site, Pattern : . GRK RK US 6,989,232 B2 119 120 Other BLAST results include sequences from the Patp includes a mutant or variant nucleic acid any of whose bases database, which is a proprietary database that contains may be changed from the corresponding base shown in Sequences published in patents and patent publications. Table 9A while still encoding a protein that maintains its BLASTP analysis of the patp database shows that NOV9 has Kelch-like activities and physiological functions, or a frag 569 of 569 aa residues (100%) identical to, and 569 of 569 ment of Such a nucleic acid. The invention further includes aa residues (100%) positive with, the 569 aa Human protein nucleic acids whose Sequences are complementary to those sequence SEQ ID NO:14569 (patp:AAB94214, Expect= just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. 2.8e-314); 411 of 411 aa residues (100%) identical to, and The invention additionally includes nucleic acids or nucleic 411 of 411 aa residues (100%) positive with, the 411 aa acid fragments, or complements thereto, whose Structures Human protein sequence SEQ ID NO: 14985 include chemical modifications. Such modifications include, (patp:AAB94406, Expect=7.3e-229); 381 of 508 aa resi by way of nonlimiting example, modified bases, and nucleic dues (75%) identical to, and 439 of 508 aa residues (86%) acids whose Sugar phosphate backbones are modified or positive with, the 508 aa Human protein sequence SEQ ID derivatized. These modifications are carried out at least in NO: 13220 (patp:AAB93678, Expect=9.8e-218); 380 of 508 part to enhance the chemical Stability of the modified nucleic aa residues (74%) identical to, and 438 of 508 aa residues 15 acid, Such that they may be used, for example, as antisense (86%) positive with, the 508 aa Human protein sequence binding nucleic acids in therapeutic applications in a Subject. SEQ ID NO:12231 (patp:AAB93233, Expect=8.8e-217); The disclosed NOV9 nucleic acid is useful as a marker for and 242 of 554 aa residues (43%) identical to, and 349 of Menkes disease, myoglobinuria/hemolysis due to PGK defi 554 aa residues (62%) positive with, the 609 aa Human ciency. Wieacker-Wolff syndrome and/or other diseases/ protein sequence SEQ ID NO:11635 (patp:AAB92953, disorders. Expect=2.9e-122). Patp results include those listed in Table Based on the tissues in which NOV9 is most highly 9GF. expressed; including uterus, brain breast, and Stomach;

TABLE 9G Patp alignments of NOV9

Smallest Sum High Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp: AAB94214 Human protein sequence SEQ ID NO: 14569 Ho . . . 3015 2.8e-314 patp: AAH94406 Human protein sequence SEQ ID NO: 14985 Ho . . . 2209 7.3e-229 patp: AAB93678 Human protein sequence SEQ ID NO: 13220 Ho . . . 2104 9.8e-218 patp: AAB93233 Human protein sequence SEQ ID NO: 12231 Ho . . . 2095 8.8e-217 patp: AAB92953 Human protein senuence SEQ ID NO: 11635 Ho . . . 1203 2.9e-122

40 The kelch motif was discovered as a sixfold tandem Specific uses include developing products for the diagnosis element in the sequence of the Drosophila kelch ORFI or treatment of a variety of diseases and disorders. Addi protein. The repeated kelch motifs predict a conserved tional disease indications and tissue expression for NOV9 is tertiary Structure, a beta-propeller. This module appears in presented in Example 2. many different polypeptide contexts and contains multiple The disclosed NOV9 protein of the invention includes the potential protein-protein contact Sites. Members of this 45 Kelch-like protein whose sequence is provided in Table 9B. growing Superfamily are present throughout the cell and The invention also includes a mutant or variant protein any extracellularly and have diverse activities. of whose residues may be changed from the corresponding The Drosophila kelch protein is a structural component of residue shown in Table 9B while still encoding a protein that ring canals and is required for oocyte maturation. Recently, maintains its Kelch-like activities and physiological a new human homologue of kelch, KLHL3, was cloned. At 50 functions, or a functional fragment thereof. the amino acid level, KLHL3 shares 77% similarity with The invention further encompasses antibodies and anti Drosophila kelch and 89% similarity with Mayven body fragments, Such as F, or (F), that bind immuno (KLHL2), another human kelch homolog. Like kelch and Specifically to any of the proteins of the invention. KLHL2, the KLHL3 protein contains a poxyirus and Zinc The above defined information for this invention Suggests finger domain at the N-terminus and Six tandem repeats 55 that this Kelch-like protein (NOV9) may function as a (kelch repeats) at the C-terminus. Various KLHL3 isoforms member of a “Kelch family”. Therefore, the NOV9 nucleic result from alternative promoter usage, alternative polyade acids and proteins identified here may be useful in potential nylation sites and alternative splicing. The KLHL3 gene is therapeutic applications implicated in (but not limited to) mapped to human chromosome 5, band q31, contains 17 various pathologies and disorders as indicated below. The exons, and spans approximately 120 kb of genomic DNA. 60 potential therapeutic applications for this invention include, KLHL3 maps within the smallest commonly deleted seg but are not limited to: leukemia research tools, for all tissues ment in myeloid leukemias characterized by a deletion of and cell types composing (but not limited to) those defined 5q; however, no inactivating mutations of KLHL3 were here. detected in malignant myeloid disorders with loSS of 5q. The NOV9 nucleic acids and proteins of the invention are The disclosed NOV9 nucleic acid encoding a Kelch-like 65 useful in potential therapeutic applications implicated in protein includes the nucleic acid whose Sequence is provided cancer including but not limited to leukemias and/or other in Table 9A, or a fragment thereof. The invention also pathologies and disorders. For example, a cDNA encoding US 6,989,232 B2 121 122 the Kelch-like protein (NOV9) may be useful in disease NOV10 therapy for Menkes disease, myoglobinuria/hemolysis due to PGK deficiency, and Wieacker-Wolff syndrome, and the NOV10 includes three novel Type IIIb plasma membrane Kelch-like protein (NOV9) may be useful when adminis like proteins disclosed below. The disclosed NOV10 pro tered to a Subject in need thereof. By way of nonlimiting teins have been named NOV10a, NOV10b and NOV10c. example, the compositions of the present invention will have NOV10a efficacy for treatment of patients Suffering from neurological A disclosed NOV10a nucleic acid of 1339 nucleotides disorders including but not limited to Menkes disease. The NOV9 nucleic acid encoding Kelch-like protein, and the (also referred to as 100340173; 1373975; 1373976; 1373977 Kelch-like protein of the invention, or fragments thereof, and 1373978) encoding a novel hypothetical Y305 SYNY3 may further be useful in diagnostic applications, wherein the 22.2 kDa protein SLR0305-like protein/Type IIIb plasma presence or amount of the nucleic acid or the protein are to membrane-like proteins is shown in Table 10A. An open be assessed. reading frame was identified beginning with an ATG initia NOV9 nucleic acids and polypeptides are further useful in tion codon at nucleotides 367-369 and ending with a TGA the generation of antibodies that bind immuno-specifically 15 codon at nucleotides 925-927. A putative untranslated to the novel NOV9 substances for use in therapeutic or region upstream from the initiation codon and downstream diagnostic methods. These antibodies may be generated from the termination codon is underlined in Table 10A, and according to methods known in the art, using prediction the Start and Stop codons are in bold letters.

TABLE 1 OA NOV10a nucleotide sequence. (SEQ ID NO: 60) CACGGTCCGCCCAGAGGCTTCGGAGCTGCCGGAGCCGGGCGGGGCCTTGGCGGGCGGCCCCGGGAGTGGCGCCGGCGGCGTG

GTGGTCGGCGTGGCTGAGGTGAGAAACTGGCGCTGCGGCTGCCTCGGAGCACCTGTTGGTGCCGGAGCCTCGTGCTGGTCTG

CGTGTTGGCCGCCCTGGCTTCGCTTCCCTGGCCCTGGTCCGCCGCTACCTTCACCACCTCCTGCTGTGGGTGGAGAGCCTT

GACTCGCTGCTGGGGGTCCTGCTCTCGTCGTGGGCTTCACGTGGTCTCTTTCCCCTGCGGCTGGGGCTACATCGGCTCA

ACGTGGCCGCTGGCTACCTGTACGGCTTCGTGCTGGGCAGGGTCTGATGATGGTGGGCGTCCTCATCGGCACCTTCATCGC

CCATGTGGTCTGCAAGCGGCTCCTCACCGCCTGCGTGGCCGCCAGGATCCAGAGCAGCGAGAAGCTGAGCGCGGTTATTCGC

GTAGTGGAGGGAGGAAGCGGCCTGAAAGTGGTGGCGCTGGCCAGACTGACACCCATACCTTTTGGGCTTCAGAATGCAGTGT

TTTCGATTACTGATCTCTCATTACCCAACTAPCTGATGGCATCTTCGGTTCGACTGCTTCCTACCCAGCTTCTGAATTCTTA

CTTGGGTACCACCCTGCGGACAATGGAAGATGTCATTGCAGAACAGAGTGTTAGTGGATATTTTGTTTTTTGTTTACAGATT

ATTATAAGTATAGGCCTCATGTTTTATGTAGTTCATCGAGCTCAAGTGGAATTGAATGCAGCTATTGTAGCTTGTGAAATGG

AACTGAAATCTTCTCTGGTTAAAGGCAATCAACCAAATACCAGTGGCTCTTCATTCTACAACAAGAGGACCCTAACATTTTC

TGGAGGTGGAATCAATGTTGTAGATTCTAATGACATACGTGATTGTCAAGAGCCTAGTGTGCTATCTAAGGTCTAGCAGTC

ACTTCACTAGTGGGCAGAGACAAGTTCTAATTGTATTACAGCACAAACAAAACTGACTAGTTTTTAAATTGCACAATTTTTT

TTTTTTTAAGCAAGAATCATTTTCGGGTACTAAGTGTAAATGTAGATGCAAATTTGGCTGCACCCTTTATCATGCCTGT

ATTGGCCTATAGGTCTGCACTTAGTGTTTTTAATGTTTTATTTCTGTGTATTACGAACAGAGAAAAACCAAATAT

ATTTCTGCTTAGTGTCTTTATTTATAAAGCCCATGAGTAGTTTGTATGCATCTTTCCTACTTGTAAAGATGAGTAAAAGTAT

GCAGTTTTAAATTTAAAAAAAAAAAAA from hydrophobicity charts, as described in the “Anti A disclosed NOV10a polypeptide (SEQ ID NO:61) NOVX Antibodies' Section below. The disclosed NOV9 encoded by SEQID NO:60 has 186 amino acid residues and protein has multiple hydrophilic regions, each of which can 55 is presented in Table 10B using the one-letter amino acid be used as an immunogen. In one embodiment, a contem code. plated NOV9 epitope is from about amino acids 1 to 40. In another embodiment, a NOV9 epitope is from about amino SignalP, Psort and/or Hydropathy results predict that acids 60–95. In additional embodiments, NOV9 epitopes are NOV10 has a signal peptide and is likely to be localized from about amino acids 130 to 220, from about amino acids 60 endoplasmic reticulum (membrane) with a certainty of 240-320, from about amino acids 330 to 370, from about 0.6850. In alternative embodiments, the NOV10a protein amino acids 380 to 415, from about amino acids 425 to 460, localizes to the plasma membrane with a certainty of 0.6400; from about amino acids 470 to 510 and from about amino a Golgi body with a certainty of 0.4600; or the endoplasmic acids 520 to 569. These novel proteins can be used in assay reticulum (lumen) with a certainty of 0.1000. The most Systems for functional analysis of various human disorders, 65 likely cleavage site for a NOV10a peptide is between amino which will help in understanding of pathology of the disease acids 19 and 20, at: VVC-KR. NOV10a has a molecular and development of new drug targets for various disorders. weight of 19946.3 Daltons. US 6,989,232 B2 123 124

TABLE 1 OB Encoded NOV10a protein sequence. (SEQ ID NO: 61) MGLMMWGVLIGIFIAHVWCKRLLTAWWAARIQSSEKLSAVIRVVEGGSGLKVVALARLTPIPFGLONAVFSITDLSLPNYLM

ASSVGLLPTOLLNSYLGTTLRTMEDWIAEQSWSGYFWFCLQIIISIGLMFYWWHRAQWELNAAIVACEMELKSSLWKGNQPN

TSGSSFYNKRTILTFSGGGINWW

NOV1Ob NOV1 Oc A disclosed NOV10 nucleic acid of 512 nucleotides (also referred to as CG56409-02) encoding a novel hypothetical 15 22.2 kDa prtotein SLR0305-like, Type IIIb Plasma A disclosed NOV10c nucleic acid of 1339 nucleotides Membrane-like, protein is shown in Table 10C. The (also referred to as CG56409-03) encoding a novel hypo Sequence was derived by laboratory cloning of cDNA frag thetical 22.2 kDa prtotein SLR0305-like protein is shown in ments and by in Silico prediction of the Sequence. An open Table 10E. An open reading frame was identified beginning reading frame was identified beginning with an ATG initia with an ATG initiation codon at nucleotides 1-3 and ending tion codon at nucleotides 108-110 and ending with a TGA with a TGA codon at nucleotides 649–651. A putative codon at nucleotides 510–512. A putative untranslated untranslated region downstream from the termination codon region upstream from the initiation codon is underlined in is underlined in Table 10F, and the start and stop codons are Table 10C, and the start and stop codons are in bold letters. in bold letters.

TABLE 10C NOV10b nucleotide sequence. (SEQ ID NO: 62) GGGTCCTGCTCTCGTCGGGGCTTCACGTGGCTCTTTCCCCTGCGGCTGGGGCTACACGTGCTCAACGGGCCG

CTGGCTACCTGTACGGCTTCGTGCTGGGCAGGGTCTGATGATGGTGGGCGTCCTCATCGGCACCTTCATCGCCCATG

TGGTCTGCAAGCGGCTCCTCACCGCCTGGGTGGCCGCCAGGATCCAGAGCAGCGAGAAGCTGAGCGCGGTTATTCGCG

TAGTGGAGGGAGGAAGCGGCCTGAAAGTGGTGGCGCTGGCCAGACTGACACCCATACCTTTTGGGCTTCAGAATGCAG

TGTTTTCGATTATTATAAGTATAGGCCTCATGTTTTATGTAGTTCATCGAGCTCAAGTGGAATTGAATGCAGCTATTG

AGAGGACCCTAACATTTTCTGGAGGTGGAATCAATGTTGTAGA

A disclosed NOV10b polypeptide (SEQ ID NO:63) encoded by SEQ ID NO:62 has 134 amino acid residues and is presented in Table 10D using the one-letter amino acid 45 code. SignalP, Psort and/or Hydropathy results predict that NOV10b has a signal peptide, cleavage Site and localization results analogous to those listed for NOV10a and NOV10c. Additional software analysis suggests that NOV10b has an 50 INTEGRAL likelihood of -6.74 for a predicted transmem brane region at aa3-aa19(1-20) and an INTEGRAL likeli hood of -5.47 for a predicted transmembrane region at aa88-aa84 (63–86), and that it is likely a Type IIIb mem brane protein (Nexo Ccyt). NOV10b has a molecular weight 55 of 14249.2 Daltons.

TABLE 1 OD Encoded NOV10b protein sequence. (SEQ ID NO: 63) MGLMMWGVLIGTFIAHVWCKRLLTAWWAARIQSSEKLSAVIRVVEGGSGLKVVALARLTPIPFGLONAVFSIIISIGLMFYV

WHRAQWELNAAIVACEMELKSSLWKGNQPNTSGSSFYNKRTLTFSGCGTNVW US 6,989,232 B2 125 126

TABLE 1 OE NOV10c nucleotide sequence. (SEQ ID NO: 64) AGGGCTTCATCGTGGTCTCTTTCCCCTGCGGCTGGGGCTACATCGTGCTCAACGTGGCCGCTGGCTACCTGTACGGC

TTCGTGCTGGGCATGGGTCTGATGATGGTGGGCGTCCTCATCGGCACCTTCATCCCCCATGTGGTCTGCAAGCGGCTC

CTCACCGCCTGGGTCGCCGCCAGGATCCAGAGCAGCGAGAAGCTGAGCGCGGTTATTCGCGTAGTGCAGGGAGGAAGC

GGCCTGAAAGTGGTGGCGCTGGCCAGACTGACACCCATACCTTTTGGGCTTCAGAATGCGGTGTTTTCGATTACTGAT

CTCTCATTACCCAACTATCTGATGGCATCTTCGGTTGGACTGCTTCCTACCCAGCTTCTGAATTCTTACTTGGGTACC

ACCCTGCGGACAATGGAAGATGTCATTGCAGAACAGAGTGTTAGTGGATATTTTGTTTTTTGTTTACAGATTATTATA

AGTATAGGCCTCATGTTTTATGTAGTTCATCGAGCTCAAGTGGAATTGAATGCAGCTATTGTAGCTTGTGAAATGGAA

CTGAAATCTTCTCTGGTTAAAGGCAATCAACCAAATACCAGTGGCTCTTCATTCTACAACAAGAGGACCCTAACATTT

TCTGGAGGTGGAATCAATGTTGTAGATTCTAATGAGATACGTGATTGTTAAGAGCCTAGTGTGTA

A disclosed NOV10c polypeptide (SEQ ID NO:65) INTEGRAL likelihood of -8.12 for a predicted transmem encoded by SEQ ID NO:64 has 216 amino acid residues and brane region at aa149-aa165 (142-167) and an INTEGRAL is presented in Table 10F using the one-letter amino acid likelihood of -6.74 for a predicted transmembrane region at 25 aa33-aa49 (22–50), and that it is likely a Type IIIb mem code. SignalP, Psort and/or Hydropathy results predict that brane protein (Nexo Ccyt). The most likely cleavage site for NOV10c has a signal peptide, cleavage Site and localization a NOV10c peptide is between amino acids 49 and 50, at: results analogous to those listed for NOV10a and NOV10b. VVC-KR. NOV10c has a molecular weight of 23141 Dal Additional software analysis suggests that NOV10c has an tonS.

TABLE 1 OF Encoded NOV10c protein sequence. (SEQ ID NO: 65) MGFIVWSFPCGWGYIVLNWAAGYLYGFWLGMGLMMWGVLIGTFIAHVWCKRLLTAWWAARIQSSEKLSAVIRVVEGGSGLKW

WALARLTPIPFGLONAVFSITDLSLPNYLMASSVGLLPTOLLNSYLGTTLRTMEDWIAEQSWSGYFWFCLQIIISIGLMFYV

WHRAQWELNAAIVACEMELKSSLWKGNQPNTSGSSFYNKRTLTFSGGGINVW

NOV10a, NOV10b and NOV10c polypeptides are related to each other as shown in the ClustaIW alignment in Table 10G.

TABLE 1 OG

ClustalW of NOV10 Wariants

NOW10a ------MGLMMWG WILIGTFIAHIWWCK 20 NOW 10b ------MGLMMWG WILIGTFIAHIWWCK 20 NOW1 Oc MGFIWWSFPCGWGYIWLNWAAGYLYGFWLGMGLMMWGWILIGTFIAHWWCK 50

NOW10 a RLLTAWWAARIQSSEKLSAWIRWWEGGSGLKWWALARLTPIPFGLONAWF 70 NOW 10b RLLTAWWAARIQSSEKLSAWIRWWEGGSGLKWWALARLTPIPFGLONAWF 70 NOW10c RLLTAWWAARIQSSEKLSAWIRWWEGGSGLKWWALARLTPIPFGLONAWF 100

NOW10a SITDLSLPNYLMASSWGLLPTOLLNSYLGTTLRTMEDWIAEQSWSGYFWF 120 NOW 10b 71. NOW10c SITDLSLPNYLMASSWGLLPTOLLNSYLGTTLRTMEDWIAEQSWSGYFWF 150

NOW10a QIIISIGLMFYWWHRAQWELNAAIWACEMELKSSLWKGNQPNTSGSSF 170 NOW 10b ---IIISIGLMFYWWHRAQWELNAAIWACEMELKSSLWKGNOPNTSGSSF 118 NOW10c CLQIIISIGLMFYWWHRAQWELNAAIWACEMELKSSLWKGNQPNTSGSSF 200

NOW1 Oa. YNKRTITFSGGGINWW 186 NOW 10b. YNKRTITFSGGGINWW 134 NOW1 Oc YNKRTITFSGGGINWW 216 US 6,989,232 B2 127 128 Additional NOV10 SNP and coding variant sequences are to agb:GenBank-ID:MFU72744|acc:U72744.1 mRNA from described in Example 3. Mycobacterium fortuitum (Mycobacterium fortuitum nitrite In a Search of Sequence databases, it was found, for extrusion protein gene, complete cds). The full NOV10c example, that the NOV10b nucleic acid sequence has 156 of amino acid Sequence of the protein of the invention was 245 bases (63%) identical to agb:GenBank-ID:MFU727441 acc:U72744.1 mRNA from Mycobacterium fortuitum found to have 52 of 170 amino acid residues (30%) identical (Mycobacterium fortuitum nitrite extrusion protein gene, to, and 96 of 170 amino acid residues (56%) similar to, the complete cds). The full NOV10b amino acid sequence was 209 amino acid residue ptnr: SwissProt-ACC:Q55909 pro found to have 29 of 80 amino acid residues (36%) identical tein from Synechocystis sp. (strain PCC 6803) (hypothetical to, and 45 of 80 amino acid residues (56%) similar to, the 22.2 kDa protein SLR0305). 209 amino acid residue ptnr:SwissProt-ACC:Q55909 pro In an additional Search of public protein databases, the tein from Synechocystis sp. (strain PCC 6803) (hypothetical NOV10a amino acid Sequences have homology to the amino 22.2 kDa protein SLR0305). In a search of sequence acid sequences shown in the BLASTP data listed in Table databases, it was found, for example, that the NOV10c 10H. Public amino acid databases include the GenBank nucleic acid sequence has 156 of 245 bases (63%) identical databases, SwissProt, PDB and PIR.

TABLE 1 OH

BLAST results for NOV10a Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect Y305 SYNY3; D64005; HYPOTHETICAL 22.2 209 46/154 86/154, 3e-12 BAA10672.1; Q55909 KDA PROTEIN SLRO305. (30%) (56%) synechocystis sp. (strain pcc 6803). 11/1997 Q9VNR8: AEOO3598; CG11367 PROTEIN. 834 28/81 56/81, 6e-10 AAFS1854.2 drosophila (35%) (69%) melanogaster. 3/2001 Q97.VS7; ACOO5278; F15K9.14. 269 41/153 82/153, 7e-09 AAC72122.1 arabidopsis (27%) (54%) thaliana. 5/1999 Q9RPT3:. AF148265; HYPOTHETICAL 225 40/144 73/144, 2e-05 AAD55929.1 TRANSMEMBRANE (28%) (51%) PROTEIN. uncultured bacterium ah1. 5/2OOO

The homology of these and other Sequences is shown graphically in the ClustalW analysis shown in Table 10I. In the ClustalW alignment of the NOV10 proteins, as well as all other ClustalW analyses herein, the black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve Structural or functional properties), whereas non-highlighted amino acid residues are less conserved and can potentially be mutated to a much broader extent without altering protein Structure or function.

TABLE 10 ClustalW Analysis of NOV10 1) NOV10a (SEQ ID NO: 61) 2) NOV10b (SEQ ID NO: 63) 3) NOV10 c (SEQ ID NO: 65) 4) Y305 SYNY3 (SEQ ID NO: 66) 5) Q9VNR8 (partial sequence) (SEQ ID NO : 67) 6) Q92VS7 (SEQ ID NO: 68) 7) Q9RPT3 (SEQ ID NO: 69)

NOW10a ------1 NOW 10b ------1 NOW10c ------1 Y305 SYNY3 ------MA- - - - - DYLLN 7 Q9WNR8 ...HNRKRNSCWGRAHSFLTRNWYLGCLVPATILGALVFIGWATRDYARQ 150 Q92WS7 ------MSFTPSTFRIAISLLLLWAWSAWIFL----- PKLKD 32 Q9RPT3 ------MWS.----- PWLPE 8 US 6,989,232 B2 129 130

TABLE 1 OI-continued ClustalW Analysis of NOV10

MG 2

iKEDLGPFGPLALALA FAGWVHS-LGWWAPIAFWAASIAWWW-E

ATAAFLLGRA. Grit. IGR

RIQSSE IR RIQSSE IR RIQSSE IR K

DHVIGE-DGLKEvFLME

EYVIGSLG---MIP SELLSTP---VRGEMLAEWL--GMMCCDAPGGQFA:EMSEYLRSDPRPD ALSITR---VRLREFFIGELG----L

DCLEPGAALS ETLED-LSD:

IAEQSVSGYEVFCE---QIIISIGL 49 --IIISIGLS IAEQSVSGYEVFCE---QIIISIGL 79 TATNQANPTEQWT:ERIVGFIATVAVTIYVTKIARKA 204 LTRSRKRNTGALLFLSQDVDSQLST3FSHRHYVDDV 450 THGWHEVSVERWV:EMMVGVALAVILIICITRVAKSSLE 232 ADGSAAVTPIMFTA---GIVVTVLLGLLLAKIVKA 201

174 122 204 209 AKEPMGNPRYILQYTR:VKTSR 5 OO DGKKNDDASVLPIAEPPEDLQEPL 260 DAEP------ETPEVLPTPI 221

186 134 216 209 LRALRRANATAADSMAEVIAQHHQIPQELAADFDYKCRLRHARPDVT. . . 550 WERIDPSNT------269

55 The presence of identifiable domains in NOV10a, and to NUCB-AROD DNAI-THRS (prdm:3727, Expect=2.7e NOV10b and NOV10c in analogous regions, was deter 08); 14 of 36 aa residues (38%) identical to, and 21 of 36 aa mined. DOMAIN results for NOV10 as disclosed in Tables residues (58%) positive with, the 68 aa p36 (1) NU2M 10J, were collected from the Conserved Domain Database HANWI-NADH-ubiquinone oxidoreductase chain 2 (EC (CDD) with Reverse Position Specific BLAST analyses. 60 1.6.5.3)(prdm:21748, Expect=0.27); 13 of 30 aa residues This BLAST analysis software samples domains found in (43%) identical to, and 18 of 30 aa residues (60%) positive the Smart and Pfam collections. with, the 41 aa p36 (1) SODE DIRIM-extracellular ProDom analysis of NOV10a shows homology to various Superoxide dismutase precursor (CU-ZN) (EC 1.15.1.1) domains. Specifically, NOV10a has 32 of 124 aa residues (EC-SOD)(prdm:27499, Expect=0.27); 15 of 54 (27%) (25%) identical to, and 67 of 124 aa residues (54%) positive 65 identical to, and 23 of 54 (42%) positive with, the 69 aa p36 with, the 208 aa p36 (7) protein transmembrane intergenic (1) RL37 TETTH-ribosomal protein L37 (P1 TYPE) region CY20H10.06CSLRO305 CY277.13C XTHA-GDHA (prdm:21871, Expect=0.74); and 14 of 31 aa residues (45%) US 6,989,232 B2 131 132 identical to, and 20 of 31 aa residues (64%) positive with, (patp:AAB56667, Expect=30e-42); 45 of 144 aa residues the 158 aa p36 (1) YIK5 YEAST hypothetical 78.0 KD (31%) identical to, and 80 of 144 aa residues (55%) positive protein in MOB1-SGA1 intergenic region (prdm:55957, with, the 280 aa Arabidopsis thaliana protein fragment SEQ Expect=1.3). Table 10J lists various domain description ID NO: 12140 (patp:AAG12863, Expect=1.6e-12); 39 of from domain software analysis results against NOV10. This 130 aa residues (30%) identical to, and 66 of 130 aa residues indicates that the NOV10 sequence has properties similar to (50%) positive with, the 174aa Arabidopsis thaliana protein those of other proteins known to contain this domain. fragment SEQ ID NO:64446 (patp:AAG50824, Expect=

TABLE 1 OJ Domain Analysis of NOV10 PFAM HMM Domain Analysis of NOV10

Model Domain seq-f seq-t himm-f himm-t SCOe E-value no hits above thresholds ProDom analysis

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs : Score P(N) pram : 3727 p36 (7)-PROTEIN TRANSMEMBRANE INTERGENIC R . 129 2 fe-08 pram: 21748 p36 (1) NU2M HANWI-NADH-UBIQUINONE OXIDORED . 58 023 pram: 27499 p36 (1) SODE-DIRIM-EXTRACELLULAR SUPEROXIDE . 58 023 pram: 21871 p36 (1) RL37 TETTH-RIBOSOMAL PROTEIN L37 (P. 54 052 pram:55957 p36 (1) YIK5 YEAST-HYPOTHETICAL 78.0 KD PRO . 68 03

BLOCKS Protein Domain Analysis AC# Description Strength Score BLOO 495E O Apple domain proteins. 1844 1049 BL00505C O Phosphoenolpyruvate carboxykinase (GTP) prote 1787 101.9 BLOO853C O Beta-eliminating lyases pyridoxal-phosphate a 1544 101.7 BLO 1235B O Uncharacterized protein family UPF0019 protei 2114 1016 PROSITE Analysis Pattern-ID : ASN GLYCOSYLATION PS00001 (Interpro) one N-glycosylation site Pattern-ID: GLYCOSANINOGLYCAN PS00002 (Interpro) one Glycosaminoglycan attachment site Pattern-ID: PKC PHOSPHO SITE PS00005 (Interpro) two Protein kinase C phosphorylation sites Pattern-ID: CK2 PHOSPHO SITE PS00006 (Inrerpro) two Casein kinase II phosphorylation sites Pattern-ID: MYRISTYL PS00008 (Interpro) five N-myristoylation sites

Other BLAST results include sequences from the Patp 60 3.0e-06); 39 of 130 aa residues (30%) identical to, and 66 database, which is a proprietary database that contains of 130 aa residues (50%) positive with, the 204; aa Arabi Sequences published in patents and patent publications. In a dopsis thaliana protein fragment SEQ ID NO: 37254 (patp:AAG31071, Expect=9.5e-06); and 39 of 130 aa resi BLASTP analysis of the patp database, NOV10 was found dues (30%) identical to, and 66 of 130 aa residues (50%) to have 93 of 102 aa residues (91%) identical to, and 95 of 65 positive with, the 204 aa Arabidopsis thaliana protein frag 102 aa residues (93%) positive with, the 111 aa Human ment SEQ ID NO: 64445 (patp:AAG50823, Expect=9.5e prostate cancer antigen protein sequence SEQ ID NO:1245 06). Patp results include those listed in Table 10K. US 6,989,232 B2 133

TABLE 1 OK Patp alignments of NOV10

Smallest Suml High Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp:AAB56667 Human prostate cancer antigen protein seque . 4 48 3. Oe- 42 patp: AAG12863 Arabidopsis thaliana protein fragment SEQ I . 169 1. 6e-12 patp:AAG50824 Arabidopsis thaliana protein fragment SEQ I . 118 3. Oe-O 6 patp:AAG31071 Arabidopsis thaliana protein fragment SEQ I . 118 9.5 e-O 6 patp:AAG50823 Arabidopsis thaliana protein fragment SEQ I . 118 9.5 e-O 6

The Type IIIb Plasma Membrane-like NOV10 disclosed protein (NOV10) may function as a member of a “Type IIIb in this invention maps to chromosome 8q13 and 8q21. This plasma membrane-like protein family'. Therefore, the assignment was made using mapping information associated NOV10 nucleic acids and proteins identified here may be with genomic clones, public genes and ESTS Sharing useful in potential therapeutic applications implicated in (but Sequence identity with the disclosed Sequence and CuraGen not limited to) various pathologies and disorders as indicated Corporation's Electronic Northern bioinformatic tool. below. The potential therapeutic applications for this inven The disclosed NOV10 nucleic acid encoding a novel tion include, but are not limited to: Type IIIb plasma hypothetical 22.2 kDa prtotein SLR0305-like protein 25 membrane-related research tools, for all tissues and cell includes the nucleic acid whose Sequence is provided in types composing (but not limited to) those defined herein. Table 10A, or a fragment thereof. The invention also The NOV10 nucleic acids and proteins of the invention includes a mutant or variant nucleic acid any of whose bases are useful in potential therapeutic applications implicated in may be changed from the corresponding base shown in cancer including but not limited to disorderS Such as neural, Table 10A while still encoding a protein that maintains its immune, muscular, reproductive, gastrointestinal, novel hypothetical 22.2 kDa prtotein SLR0305-like protein pulmonary, cardiovascular, renal, and proliferative activities and physiological functions, or a fragment of Such disorders, wounds, and infectious diseases, and/or other a nucleic acid. The invention further includes nucleic acids pathologies and disorders. For example, a cDNA encoding whose Sequences are complementary to those just described, the SLR0305-like NOV10 protein may be useful in gene and including nucleic acid fragments that are complementary to 35 protein therapy, and the SLR0305-like protein (NOV10) any of the nucleic acids just described. The invention may be useful when administered to a Subject in need additionally includes nucleic acids or nucleic acid thereof. By way of nonlimiting example, the compositions fragments, or complements thereto, whose Structures of the present invention will have efficacy for treatment of include chemical modifications. Such modifications include, patients Suffering from Type IIIb plasma membrane-related by way of nonlimiting example, modified bases, and nucleic 40 disorders including but not limited to those described in the acids whose Sugar phosphate backbones are modified or Examples. The NOV10 nucleic acid encoding the SLR0305 derivatized. These modifications are carried out at least in like protein, and the SLR0305-like protein of the invention, part to enhance the chemical Stability of the modified nucleic or fragments thereof, may further be useful in diagnostic acid, Such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a Subject. applications, wherein the presence or amount of the nucleic In the mutant or variant nucleic acids, and their 45 acid or the protein are to be assessed. complements, up to about 37% percent of the bases may be The protein similarity information, eXpression pattern, So changed. cellular localization, and map location for the protein and The disclosed NOV10 protein of the invention includes nucleic acid disclosed herein Suggest that this Type IIIb the novel hypothetical 22.2 kDa prtotein SLR0305-like Plasma Membrane-like NOV10 protein may have important protein whose sequence is provided in Table 10B. The 50 Structural and/or physiological functions characteristic of invention also includes a mutant or variant protein any of the Type IIIb Plasma Membrane family. whose residues may be changed from the corresponding The NOV10 nucleic acids and proteins of the invention residue shown in Table 10B while still encoding a protein have applications in the diagnosis and/or treatment of Vari that maintains its novel hypothetical 22.2 kDa prtotein ous diseases and disorders. For example, the NOV10 com SLR0305-like activities and physiological functions, or a 55 positions of the present invention will have efficacy for the functional fragment thereof. In the mutant or variant protein, treatment of patients suffering from: ACTH deficiency; up to about 64% percent of the residues may be So changed. familial febrile convulsions 1; Duane Syndrome, congenital The Type IIIb Plasma Membrane-like NOV10 gene dis Adrenal hyperplasia due to 11-beta-hydroxylase deficiency; closed in this invention is expressed in at least in peripheral glucocorticoid-remediable Aldosteronism; congenital blood tissues. Expression information was derived from the 60 Hypoaldosteronism due to CMO I deficiency; congenital tissue Sources of the Sequences that were included in the Hypoaldosteronism due to CMO II deficiency; Nijmegen derivation of the Sequence, as provided in Example 1. breakage Syndrome, Susceptibility to Low renin hyperten The invention further encompasses antibodies and anti Sion; Anemia, Ataxia-telangiectasia, Autoimmume disease, body fragments, Such as F, or (F), that bind immuno Immunodeficiencies as well as other diseases, disorders and Specifically to any of the proteins of the invention. 65 conditions. The above defined information for this invention Suggests These materials are further useful in the Generation of that this novel hypothetical 22.2 kDa prtotein SLR0305-like antibodies that bind immunospecifically to the novel Sub US 6,989,232 B2 135 136 stances of the invention for use in diagnostic and/or thera In a further embodiment, a contemplated NOV10c epitope is peutic methods. from about amino acids 50 to 75, from about amino acids NOV10 nucleic acids and polypeptides are further useful 120 to 145 and from about amino acids 180 to 216. These in the generation of antibodies that bind immuno novel NOV10 proteins can be used in assay systems for specifically to the novel NOV10 substances for use in functional analysis of various human disorders, which will therapeutic or diagnostic methods. These antibodies may be help in understanding of pathology of the disease and generated according to methods known in the art, using development of new drug targets for various disorders. prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies' section below. The disclosed NOV11 NOV10a protein has multiple hydrophilic regions, each of 10 A disclosed NOV11 nucleic acid of 6540 nucleotides (also which can be used as an immunogen. In one embodiment, a referred to as 8793.8450) encoding a novel transposase-like contemplated NOV10a epitope is from about amino acids 18 protein is shown in Table 11A. An open reading frame was to 25. In another embodiment, a NOV10 epitope is from identified beginning with an ATG initiation codon at nucle about amino acids 30 to 50. In additional embodiments, otides 758-760 and ending with a TGA codon at nucleotides NOV10a epitopes are from about amino acids 100 to 120 15 1175-1177. A putative untranslated region upstream from and from about amino acids 135 to 186. In another the initiation codon and downstream from the termination embodiment, a contemplated NOV10b epitope is from about codon is underlined in Table 11A, and the Start and Stop amino acids 25 to 45 and from about amino acids 100 to 134. codons are in bold letters.

TABLE 11A NOV11 nucleotide sequence. (SEQ ID NO: 70) CTGGAGTTCCTTTATTCTGGGGATAGCTCAAGTCCACTGCCAATGGCTGACAGTCATTAATACACAGGCAGAAAAAAGAA

ATAAGCTGCTGTGTCTGCAGTTGGGAGGGGAGCACTGGGAAGGACAGAATGGAAGTTACTGTATCCAGATACCAGCGGCC

TTTACATTTTAAACATGGAGAGGAAGGAACAGGCAGATTAAAAAGTGAAAAATGGCAGTTTACAGAGAAGGCCTAACTGT

TGGAGAATGAGTACGAGATGAAGGGAAGCAGCTTTGATAGCAAACCAGGGGAATAAGGCAGTTATCTGCCAGTATCTACT

GCTTCAAAGAGAAGCTCAAGCATCATCTAAGTAGTTTTACACAGGGAGTGAGACTGAGTTTGGTGGGGATTTCATTGAGT

AATGGGATAAAAATTCAGGCACTGCTCATTCAGTTCCAAGGTTCTCTTGCAACCCAGTTTTGAGCTGGAGGGAATTGTGT

TTTGGTACATATTTATGTTTGAATGCAAGCCAGCCCACATTCGACAGGCACGGAGCTCTTTCATGCTCAGAAAAGGGAAA

AAAAAGTTCCTGTTCTTGTATATTCTTTCATCCTAAACCTGAGACACTTAACAAGAAGCCGGTGTTGGCAAAGGTGTGTG

TGTGTGTGTGTGTCTGTGTGTGTGTGTCCTAACGAAATGCACATATTTGCTGCAGTGAAGGAGCCAGTTTTTCCATAAAT

GGCTAACAGGAATTTGATGAAGTGTTTGCAACATTAAAGTGTTGTGGGTCACGTTGTAACTTACATTGTTCCCCAGCCT

CCACTTTTCCTTGTTTCCTAACCAACCTCCATCCCGCCCCACAGCCACATTCATCCAGGCCTTCAATAGGTCTGCTGTC

AGTTCCCATAAACTGGCTCAGGTTGTAGAAATGGTTAGTGAAGTCGGGCATCTCAGCCATTCCCACCTCTTACTTCCCAA

GGTGTCTCATGTCACCAAATTACAAATCATCCACAAGCAGAAGATCAAATCCAGGCTGACTAAAGCCATGTGGAATGTGG

ACACTTGGGGGCAGTTAAATACCTTACAGGTTTCTGCTGTAAGATTTGAAGCTTTGAAGGCAGAAATCAATGGCCAGATT

TTCAAAGGAAAAGGTTACAGGTGTGTCCAGGTGAGCCCCAGACAGATGGATCTGGAAAGCAAGTGCCTGTGCAGGTGCA

GTGACTGCTCTGGCCATATGTCCTGTACAGACATGGGCTGCAGAGGAAGGAACAAGACTGTGAGTCAAAGAAGACAGGCC

CGTGCAGCCATCCGTGCCTTACTTGTCTCCAGGTATATGGGGCAGATCTGTAAGTAGAGAATAAGAACAGCAGATGGGAT

TTTCCATGGGGACTCTACTTCCTACTCCAAGGCATTCAGAAACATGGCTAAAATGAAACCAGTGAATTTGGGGCCATAGA

GCTATCTCAAAACCAAGAGAATGAACTGCCAGGATGCATGAAGAGGGATGCCGAAGGCAGGCAGTAAGGGAGGGGGAAAC

TGAGTGGGCTCTGAATGTCACCTGCACGGTGTAGGCCCTCACGGCATCTTTCTGACCTCTAAATGTTGGAACACCCCAAC

AGGCCTCGGTCCTGCCTCCCCGTCCCCTCTGCCACACTCTCTCGGGTGAGCCACCAGCCCCACGCCTTACACCC

ATTTATGCACTGATCGCTCCTAACTCTAAATCTCCACCCCGACCCTTCTCCTGAGCTCCCGATTCAAAACTTAGGCCT

GTTCATCCTCTTGGATATCTAATAGAGCTCCCAAAGTTAATGTGTCCAAACCTGAACCCCAGATTCGCCACTATGTTCCC

AAATCCCACTATGGGTTAGTCTCCCCCATCTCAGAAAAGTAACCCTCCATTACCCAAGTGGTCTGGACAAAAGTTTGGGA

TTATCCTCAATTCTTTTCTTTATCTCACATCCCGCATCTAATCCATCAGCAAGTTTCGTCAGCTCTCCCTGTAAAATGCA

TCCCATTCCTACTTTTCATTGCTCCACCACTACCAGCCCTGTTCAAAGCAACACCCTTTCTTTCCTTGATGACTGCAA US 6,989,232 B2 137 138

TABLE 11A-continued NOV11 nucleotide sequence.

GTTGTTGAGCTGACTGCCTTGACCCAGCCTGCCACCTTGTGTCTTGTCTCCACACGGAAACTCAAGTGACTTTTTAAA

AGTATAAATTAGATTAGCCTCCTTTCTTGCTCAAAAACTTCTGCTGGTATTTCCTACTTTTAAAATGAAGTTCAAAGTCC

TAAAAAGCCTAACCTCTATTACCACCCCCACCCCACCCCTTCTATCTCCCTTTGCCATTCCACCCACACCAACCTC.

CTGATCACCCTTCAAAAACACACCTTGTCCCTCTGTGGCATCTTGATATTGTTCCTGTATCCACCTGGAAATCTTT

CACATTGCTCGTTCCCCTGATGCACTCAAAACTCTCTAATCCCACGTTCATCTTTGCAAAGAAGTCTTTCCTGACCACAG

ATTCTAAAGGAGACCAACCACCATCCAGCTCTTGGATCCTCCTCTTCTCTTCCCTTCTCCTGTTCCACGCATAGGGCACA

TTGATCATGGTTTTTGGCTACCCAGTGTATTTTAACATTCTTGTCCTATTTGAGAAAATTTGAGACTCCCCAAAGCAGAA

GGCAGTATAGTGAGTTTAATAGTGTTTCCCCTGATGTACATCTACCCACAGCCTCAGAATATGACCTTAATTGGAAATAG

GTTCTTTGCAGCTATAATTAGTTAAGGAGTGGAAGATGAAGTCATCCTGAATTTAGGGTGGGCCCTAATTCCAATGACTG

GCATCCTTATGACATAATGGAGAAGGAGATTTGGACACAGACATGAAGACATGCAGGAAAGAAGGCCACCTAGTAATGGA

GGCAGGGTGACTCATGGAGCCACAAGCCAACGGACATCAAGTACCACTGGCCCCCATCAAAACTTTAAAAAGGCAGGGGA

AAGGTTCTTCTCTAGAGCCTCCAGAGGGAACAGGACTCTGTTAACACCTCAATCTCAGCCTTCCAGCCTCCAGACTGTGA

GAGAATAAAGCCATCAAGTTTGTGGTTATTAGTTACAGCAGGCTTAGGAAACTAATACAGCCAAACATTTCTCTAGATGC

TCAGTAACCAGGGCACAAGACAGAGACCCACACCCCCCAGTCAGATGATTCTGCATGAGACTTCCATTGTACATCTGAGT

GCATTGAGGAGCTCACCCCCAGCAGTTCCTATCATCCCAGCTCAGGCCTCAGACATCAAGAAGCAGGAGACAAGCCATCT

CTGTGTGTCCTGTCCAAACCCTGAGCCATAGACTTCATGGGCATAACAAAATGGTTTGTGTTTGAGCCCATAAAAGTTGG

AGTCCTTTGTTGTACAGCAATAGTAACTGCAACAAAAATCAAAATAATTCCTCTCTGATGGTGGGGCATGGGGAAGATGA

AGGAAAGAGATATAGTGAATCACATCTTTGTCAGAAAGACAGTGCCTTCATTTGAGTAGTTGGATTATGTATTTCCCACA

GCCATCTCTCAGGATAAACCTAAGCTTCTTCAGGATACAAGGAAATTTCCTGGAATCCTAAACATTTAGAAAAACATTTC

AAAAAACCTCGGTGTGGTACACTTGAAAGAATCTTCAGTTTCCTTGCCACGATAACAAATTAGCCACATATATCAACACT

GCACCAGGCATCTCCATAGTCACAGTTTGATGCAAGTTTCCAAATACCTCTGCAAAGCAGGCATTACTGTTACTATTTTA

CAAATGATGCCTGGAGAATATAGAAATTTCAACTCATGCTTTGAATCCTGAAAACCACTTGAAGGCCCAAATTCGGATGG

TCCATCTCCCAGAGTTGTCTCTAATAACAACACTGTGTAGAATGAGAAGGCTGAAATGCCAAGTGATCTCAGTGACCCCC

CTTTCATGATATTTTAAGACTACTGCCAAGAACAATGTTGTCTTACAGGCAGCATAGGGTAGTTATCAATGTAAGAGAAA

ACTGCCAGGATGCCTGCAAAGCCCACAATCGGAAGTTCAGAGCGGCAGGTCATAAATTATTTTTATAAGAGAAAAGGCCA

AGCAAGGGGCCGTTCTAACAGCCGTCTGGCATCCCTATCCTGCAACCTGGGCTGAGTTTGTACCGAATTTCTGCTTTGGG

GCAGAAATTCATACCAGAAAAATGTTTCGTGATGCATTTTGTTCAGTTGAATAGAGCCCAAGAATTTGTTCTAATTTAAA

TTAGATCACCTCTGAGCTGATATACTATAAAAATATTAATCAAGTAACCCCAGCAAATACTGATAGGGCTATCACCAGGG

ACT CAATGATATCACCAGGATGAAAAGAGACGGTGGCCTTTTTGGCTGCTATGATCCATAATTCCCACATAATCCACGTC

TATAAGTTAGAGAGAATTCTCAAGTACAGTTCAGTGCTAACCTGGAAACAAATACCCCTTATAAGGCTGCTAATCCACTT

AAAATAATCAGTTCCAGATTATTAATTTGGCACCCTCCCAAGGATACTACGAGGATCTGTCAGATTTCATGAACATATAG

GCAACAATAGACCAATACCCTAAACCCCAGAATCTAGATATGAAAGCTATGTAGAATCATACCCTTTCTAGTCCCCCACT

GCTTCATAATACAAATGACAAAAATTCAGCTCATGAGGATTAAGGGACTTTTCAGTGGGGCATCAGCTCACGGTTGCATA

CAGCCAGTCTTTTTTTTTTTTTTGAGACAGGGTCTTACTCTGCTACCCAGGCCACAGTGCACTGGGGCCATCTTGGCTC

ACTGCAGCCTCAACCTCCTGGGCTCAAGCAATCCTCCCACCTTAGCTTCCCAAATAGCTGAGATGACAGGTGCACACAAC

CATGCCTGGCTAATTTTTTATTTTTTGAAGAGATAGGGCCTCACTATGTTGCCCAGGCTGGAGCCCAGTCTTCAGAGATG

GAAACACATGCGTCTATGTCATTTACGAGTTTCATGGCCTGTGTCAAGCTAATTCTACCCCCTGAGCCTCAGCTTGTTTC US 6,989,232 B2 139 140

TABLE 11A-continued NOV11 nucleotide sequence.

TTCTTTTCAAAAATGAAGATGCCAGTGGTTCTCACCTCATATTGTTGCAGGAATGGAACAATGGGTGTGAGGGCACCTGG

TGTAGAGTAGGTGCTCAGTCACATGTAGTTGCTGTTGTTCTTCCCCAGATTATACAAACAAATTCTTGCTAAGCCAGGAT

GAAAACCCAGGTTTCAGGACTCTCAGGCTGATACTCATACCATGCCACTCCATCAAAGAGAAGGGCATTTTCCACCTCTA

GAAAACCCAGGTTTCAGGACTCTCAGGCTGATACTCATACCATGCCACTCCATCAAAGAGAAGGGCATTTTCCACCTGTA

TCCCTGGGTCTGTGTTCCAATCATTCTAAACTCTGACCAGCGCCTCATAAGTTGAATGAAATATAAACGACTTCAATAAA

TCTCTTTTTTTCCAAATAAATGAAGTTTATCAAGCTGTCCCATAACCCCGTGCTAAATCTATAAACTGTAGGCAGCTTCC

TTTGGGACCAACATTTCCTGGCTAATTAAAATGAATGTTGTATCGATGAAAGATTATTTTAAAATGGCACTGATAGTGTT

TAGACATTGTCATAACATCAGCCGGGTGGATCACTAATTTGCAAATTTTACTAAAGATCTTGCCAATTAAACCCCTTCTA

GACACTCTCAAACACACTGTCAGTGACAGCTGAGAGACCACATGGTAAAGACATGATCACATTAAATTCACACAAGACTG

TTCTCCCTGGAACGGCTGAGGGAGAGAGACGGCCGCACGTCCCCATAGCAGGTGCCACTGAGTCAACCCAGCCAGACTGT

CATAAGAGAAAAGCAAATTTTTGGGTTTTATTTTACCCTAACTGCTTTCCAAAACAAACAGTGGAAATTCTTCTAAAAAT

CTGTAGGAAATTATCCTGAAAAATTGTGTTTCTCTTTGAGAGACAAGTGAAGAGAAGTGAATCTCTGAACCAATCTGAAA

CTCGCCAAGGTACAAGTTGGCTCACCTGGGAGGTGGTGGGCTTTAGCCCAGAGTCTTCTGGGACAGTTTGTCCCTCTCCA

GGGGTTGCAGAAGCGGCAACAATAGTGATGAGTCTGTCTCTGGGAAGTCACCTCAATTAACAGCCACAGTGAATTCCTTT

AAAAGTTAACTTTACAACCTCTGCCCAGCAGTGGGTCACTGGCGGAAATTTTCCAGATTTGAAAGTCAAGGTAGCATGAC

ATGGCATGTATTTAAATGATCAGATTTCATGCAGATAACCCTAACAGCCAACACTTATTAAGGGCCTACCATGTGCATGA

TGTCATTTATTCATTACAACAATCCTATAAGATTGGTGCTATTATTATCCCCGAAGGACAGATGAGAAAATTAAGACTCA

GAGATATTGCAACTCATCCTTGTACACAGAGTTGCTATGCAATATAGCTGGAATTCTAAACCCGGTCCCACTGAGGGCCG

TGACCCTGGTGGTGAAACTCCACAGTGTGACAGGCCTTATCCCTGAGATTTGTGGTCTATCCACATACCAGTCCATGGGA

GATTATGGTCTTTTCTGATATCCATGTGTAATATTTCTCCACCACTGAGATATCCGGA

In a search of public sequence databases, the NOV11 PROSITE analysis of NOV11 predicts that the NOV11 nucleic acid Sequence has no hits using, an Expect value of 40 protein has one N-glycosylation site (Pattern-ID: ASN 1.0. Public nucleotide databases include all GenBank data GLYCOSYLATION PS00001 (Interpro)); two Protein bases and the GeneSeq patent database. kinase C phosphorylation sites (Pattern-ID: PKC A disclosed NOV11 polypeptide (SEQ ID) NO:71) PHOSPHO SITE PS00005 (Interpro)); and two encoded by SEQ ID NO:70 has 139 amino acid residues and N-myristoylation sites (Pattern-ID: MYRISTYL PS00008 is presented in Table 11B using, the one-letter amino acid 45 (Interpro), code. SignalP, Psort and/or Hydropathy results predict that NOV11 has no known signal peptide and is likely to be Table 11C lists the domain description from DOMAIN localized to the mitochondrial matrix space with a certainty analysis results against NOV11. This indicates that the of 0.4344. In alternative embodiments, the NOV11 protein 50 is localized to a microbody (peroxisome) with a certainty of NOV11 sequence has properties similar to those of other 0.3191; a lysosome (lumen) with a certainty of 0.1589; or proteins known to contain this transposase 17 domain. the mitochondrial inner membrane with a certainty of 0.1162. NOV11 has a molecular weight of 15546.1 Daltons.

TABLE 11B Encoded NOV11 protein sequence. (SEQ ID NO: 71.) MCCGSRCNLHCSPASTFPCFLTNLHPAPHATFTQAFNRSAVSSHKLAQVVEMVSEVGHLSHSHLLLPKVSHVTKLQIIH

KQKTKSRLTKAMWNWDTWGQLNTLOWSAVRFEALKAEINGQIFKGKGYRCWOWSPRQMDL US 6,989,232 B2 141 142

TABLE 11C Domain Analysis of NOV11 PFAM HblM Domain Analysis of NOV11 Model Description Score E-value Transposase 17 (InterPro) Transposase IS200 like - 42.6 95 PRODOM. Domain Analysis of NOV11

Smallest Sum High Probability Sequences producing High-scoring Segment Pairs : Score P(N) pram: 29481 p36 (1) AADR RHOPA-ANAEROBIC AROMATIC DEGRA. . . 51 O. 61 pram: 20370 p36 (1) YVAU VACCC-HYPOTHETICAL 8.8 KD PROT. . . 49 O 80 pram: 44.828 p36 (1) YM91. SCHPO-HYPOTHETICAL 91 KD PROTE. . . 49 O 80 pram: 28458 p36 (1) PR1 MEDTR-PATHOGENESIS-RELATED PROT. . . 47 O 93 pram: 29 156 p36 (1) POL SMRVH-POL POLYPROTEIN (REVERSE. . . 46 O 97 BLOCKS Protein Domain Analysis AC# Description Strength Score BLO 1280E O Glucose inhibited division protein. A family p 1592 1031 BLOO884D 0 Osteopontin proteins. 1466 102.7 BLOO 130E O Oracil-DNA glycosylase proteins. 1320 1 OO6 BLOO 441E O Chalcone and stilbene synthases proteins. 20 40 1 OOO

Table 11D provides percent homology to the domains According to a BlastP analysis, NOV11 has 38 of 64 aa identified in Table 11C. residues(59%) identical to, and 49 of 64 (76%) positive

TABLE 11D

ProDom BLASTP results for NOW11

ProDom Length Identity Positive Identifier Protein/Organism (aa) (%) (%) Expect pram: 29481 p36 (1) AADR RHOPA-DNA- 47 12/31 13/31 O 95 binding ANAEROBIC AROMATIC (38%) (41%) DEGRADATION REGULATOR pram: 20370 p36 (1) YVAU VACCC 53 10/20 12/20 1 .. 6 HYPOTHETICAL 88 KO PROTEIN (50%) (60%) pram: 44.828 p36 (1) YM91 SCHPO HYPOTHETICAL 91. KD PROTEIN IN 34 8/20 13/20 1 .. 6 COB INTRON. HYPOTHETICAL (40%) (65%) PROTEIN; MITOCHONDRION pram: 28458 p36 (1) PR1 MEDTR- 45 12/26 16/26 2.7 PATHOGENESIS-RELATED PROTEIN (46%) (61%) PR-1 PRECURSOR pram: 29156 p36 (1) POL SMRVH-POL POLY PROTEIN (REVERSE 34 9/28 14/28 35 TRANSCRIPTASE (EC 2.7. 7.49); (32%) (50%) ENDONUCLEASE)

NOV11 polypeptide Sequence produced no hits in a with, the 102 aa Human protein sequence SEQ ID BLASTPsearch for homology (Expect value setting=1.0) to NO:18455 from PN=EP1074617-A2 (patp: AAB95670, the GenBank and EMBL public databases. Other BLAST Expect=4.6e-16); 35 of 58 aa residues(60%) identical to, results did find homologous Sequences from the Patp 65 and 42 of 58 (72%) positive with, the 101 aa an secreted database, which is a proprietary database that contains protein, SEQ ID NO: 4718 from PN=EP1033401-A2 (patp: Sequences published in patents and patent publications. AAG00637, Expect=5.1e-15);20 of 61 aa residues(32%) US 6,989,232 B2 143 144 identical to, and 35 of 61 (57%) positive with, the 136 aa protein any of whose residues may be changed from the Arabidopsis thaliana protein fragment SEQ ID NO: 42276 corresponding residue shown in Table 11B while still encod (patp:AAG34708, Expect=0.51); 20 of 61 (32%) identical ing a protein that maintains its transposase-like activities and to, and 35 of 61 (57%) positive with, the 150 aa Arabidopsis physiological functions, or a functional fragment thereof. In thaliana protein fragment SEQ ID NO: 42275 the mutant or variant protein, up to about 60% percent of the (patp:AAG34707, Expect=0.71); 20 of 61 (32%) identical residues may be So changed. to, and 35 of 61 (57%) positive with, the 162 aa Arabidopsis The invention further encompasses antibodies and anti thaliana protein fragment SEQ ID NO: 42274 body fragments, Such as F, or (F), that bind immuno (patp:AAG34706. Expect=0.89); 20 of 61 (32%) identical Specifically to any of the proteins of the invention. to, and 35 of 61 (57%) positive with, the 270 aa Arabidopsis The above defined information for this invention Suggests thaliana protein fragment SEQ ID NO: 21878 that this transposase-like protein (NOV11) may function as (patp:AAG19901, Expect=2.5); 13 of 36 (36%) identical to, a member of a “transposase family'. Therefore, the NOV11 and 17 of 36 (47%) positive with, the 66 aa Human nucleic acids and proteins identified here may be useful in endometrium tumour EST encoded protein 343 potential therapeutic applications implicated in (but not (patp:AAY60283, Expect=4.3); 10 of 26 (38%) identical to, 15 limited to) various pathologies and disorders as indicated and 18 of 26 (69%) positive with, the 64 aa Gene 8 human below. The potential therapeutic applications for this inven Secreted protein homologous amino acid Sequence #113 tion include, but are not limited to: transposase related Bos taurus (patp: AAB39364, Expect=5.6); and 10 of 26 research tools, for all tissues and cell types composing (but (38%) identical to, and 18 of 26 (69%) positive with, the 64 not limited to) those defined herein. aa Human Secreted protein Sequence encoded by gene 8 The protein similarity information, eXpression pattern, SEQ ID NO:114 (patp:AAB39365, Expect=5.6). Patp cellular localization, and map location for the protein and results include those listed in Table 11E. nucleic acid disclosed herein Suggest that this novel intra

TABLE 11E Patp alignments of NOV11

Smallest Suml High Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp:AAG34708 Arabidopsis thaliana protein fragment SEQ I . . . 74 O - 40 patp:AA03 4707 Arabidopsis thaliena protein fragment SEQ I . . . 74 0.51 patp:AAG3 4706 Arabidopsis thaliana protein fragment SEQ I . . . 74 O59 patp: AAG19901 Arabidopsis thaliana protein fragment SEQ I . . . 74 O 91 patp: AAY 60283 Human endometrium tumour EST encoded protei . . . 53 O99

The disclosed NOV11 nucleic acid encoding a cellular transposase domain containing protein-like NOV11 transposase-like protein includes the nucleic acid whose protein may have important Structural and/or physiological Sequence is provided in Table 11A, or a fragment thereof. functions characteristic of the novel transposase domain The invention also includes a mutant or variant nucleic acid 45 containing protein family. Therefore, the NOV11 nucleic any of whose bases may be changed from the corresponding acids and proteins are useful in potential diagnostic and base shown in Table 11A while still encoding a protein that therapeutic applications and as a research tool. These include Serving as a specific or Selective nucleic acid or maintains its transposase-like activities and physiological protein diagnostic and/or prognostic marker, wherein the functions, or a fragment of Such a nucleic acid. The inven presence or amount of the nucleic acid or the protein are to tion further includes nucleic acids whose Sequences are 50 be assessed. These also include potential therapeutic appli complementary to those just described, including nucleic cations Such as the following: (i) a protein therapeutic, (ii) acid fragments that are complementary to any of the nucleic a Small molecule drug target, (iii) an antibody target acids just described. The invention additionally includes (therapeutic, diagnostic, drug targeting/cytotoxic antibody), nucleic acids or nucleic acid fragments, or complements (iv) a nucleic acid useful in gene therapy (gene delivery/ thereto, whose structures include chemical modifications. 55 gene ablation). (V) an agent promoting tissue regeneration in Such modifications include, by way of nonlimiting example, vitro and in Vivo, and (vi) a biological defense weapon. The NOV11 nucleic acids and proteins of the invention modified bases, and nucleic acids whose Sugar phosphate are useful in potential therapeutic applications including but backbones are modified or derivatized. These modifications not limited to those provided in Example 2, and/or other are carried out at least in part to enhance the chemical pathologies and disorders. For example, a cDNA encoding stability of the modified nucleic acid, such that they may be 60 the transposase-like protein (NOV11) may be useful in gene used, for example, as antisense binding nucleic acids in and protein therapy, and the transposase-like protein therapeutic applications in a Subject. In the mutant or variant (NOV11) may be useful when administered to a subject in nucleic acids, and their complements, up to about 60% need thereof. The NOV11 nucleic acid encoding the percent of the bases may be So changed. transposase-like protein, and the transposase-like protein of The disclosed NOV11 protein of the invention includes 65 the invention, or fragments thereof, may further be useful in the transposase-like protein whose Sequence is provided in diagnostic applications wherein the presence or amount of Table 11B. The invention also includes a mutant or variant the nucleic acid or the protein are to be assessed. US 6,989,232 B2 145 146 NOV11 nucleic acids and polypeptides are further useful help in understanding of pathology of the disease and in the generation of antibodies that bind immuno development of new drug targets for various disorders. specifically to the novel NOV11 Substances for use in NOV12 therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using A disclosed NOV12 nucleic acid of 2760 nucleotides prediction from hydrophobicity charts, as described in the (also referred to as 87917235 or 13373979) encoding a "Anti-NOVX Antibodies' section below. The disclosed novel Novel Leucine Zipper Containing Type II membrane NOV11 protein has multiple hydrophilic regions, each of like protein-like protein is shown in Table 12A. An open which can be used as an immunogen. In one embodiment, a reading frame was identified beginning with an ATG initia contemplated NOV11 epitope is from about amino acids 25 tion codon at nucleotides 1789-791 and ending with a TGA to 45. In additional embodiments, NOV11 epitopes are from codon at nucleotides 2101-2103. A putative untranslated about amino acids 70 to 105 and from about amino acids 1 region upstream from the initiation codon and downstream to 139. These novel proteins can be used in assay systems for from the termination codon is underlined in Table 12A, and functional analysis of various human disorders, which will the Start and Stop codons are in bold letters.

TABLE 12A NOV12 Nucleotide Sequence. (SEQ ID NO: 72) TCTGCCTCCTGGGTTGAAGCGATTCTTCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAAGCAGGCGCCACTACGCCTGGC

TAATTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATATTGGCCAGGATGGTTTCAAACTCCTGACCTCATGATCTGCCC

ACCTAGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCTGGTGAGGACTCCATTTTCTACCCCTAGGCTAA

AGAGCCTGGAGGATTATAGCTTACAGAGCAGAGAAGAACTCTGATACTCATAGGTGCATAGTGCTAGCTAGTCAGTAGACA

ATACTTAGATAATTCATTTTCTGATTTCTGACATTAGGAGAGGGGGGTTTTGTTTGTTTAATAACAGCCTTCATAG

ATCTTTGCAAACAGCCTTGAATGAGGAATGTCCTTATGTTTCAGGGAACATATCAGGCCTGGAAGCAGCTTTTTTAGGATA

AAGCTCACTCATTGAACTTCAAATGCACTGACTCCAACCATTTCCTAAAATAAGGAAAATCTGTCTGCACAGACGGCATTT

TCACTCTCCTGAATGTTTTCTGTTGGTTGGTGGTTGGTTGGTTTTATTGGTTGGTTGGTTTTGATACAGAGTGAACAAT

ATCATGAAGAATATTAGTCAGAAATGGGGCACAGGTCTCAAGCAGGTCTTGGGACCTTGGGCTATTAATCTTTCTGGGCCT

TAATTTACTTATCTATAACATAAAAGGACCTTAATATATGATTGAGAAGGCCCAAACCACCTTTAAAATTTAGATCTGTGT

CTCCCCATCAGACCTCTCTGGAGACACAGGATCTTATTCAACCTCACACAGATTCTTGGGTTTCTGCCATTCACATCTACA

TTGAAAATTCTCCCATAAACTTTATACAAGTCCTTATGGAATCATTAAAGCTTTGCAAGAAAACAACAGTACCCATTATAA

AAGCCCAAGAAACAGAGAAGAAAATCATGTTTTATAACCCAAGAAATCTGTCCAAATCCTAGAATTTTTCTTCAGAGTACA

TCACAAGAAGGAACAGTCTCTTCCTTCCTAGTGGGAAAGTCAGGGTTTCTTTCATTTCCACCTTGTTCGCTTGTAACCGCT

CTCACCAGGCAAAGTTCTGAGCAAGTGAGATGGACTCATCTCGGAACTCCAGGCTGTGTTTACATAATTGGTAAAAGAAAC

ATTCCAATCCCATTCCTTCGTCAGCTCCGACAGACCAACCAGCATCCCCCTCCCACTTGCCACTTTGATAGGGGTGACTGG

TATCTCCATCTCCTTACTTTGTTGATCATGTTTCGGGTTTCCAATTGCGTCAATTTAACGGTTGCCAATAATTCTGTC

ATCTGAGGGGAAAGCAGAATCTCAACTGAACATGCAGATGTCCTATTGAGACTTTGCCCATAAGGGAGCGTCTTTGGTGCT

TAAAATTCCATCTTTTGGACCTCATATCAGTTGATGTTTTTAGTTGCATCGGAAACCAACTCTAAGTGATTTAAGCAGGAG

AGAAAGTTATTTAAGGATATTTATAGTTCACAGAATCTCTGGAGGAGCGGGGGGCTAGAAAACCAGACTTGAAGACTACAC

AGAGAGACTCCGAGTCCCCCTGGGACTGACCTGAGATGACCAGGGAGCTGGTATTTTTAGCTTCCAGAGGTAAATAACAGC

CTTCACTTCCATCAAAACTCATTAGGTAGAAAACACACCAAACATGGGAAAGGCGTTCCGGAGCTGGGCTACCAAAGAGAA

TAATAAATTCACTATAGTTTCACTCTAGTTTTGACCACCCTGAAACATTTTCTTTTTCCTCCAGGAGCCCAAAA

TTACAGTTAAGTCTACAGTCAGACAGAAGGAAACTGGCATTTATTAAACACCAACTTTGTGCCTGGAAGATTCACTTACAA

TATCATAACTTACAATAACTCTGCAATAGGATCTCATTATCAGCATTCTTTTTTTGTTTGTTTGGTTGGTTGGTTTTG

GTGGTTTTAGTGTCAGGGTCTCACTCTGTTGCTCAGGCTGGAGCATGGTGGCATGATCATAACTCACTGCAGCCTTGAACT US 6,989,232 B2 147 148

TABLE 12A-continued NOV12 Nucleotide Sequence.

CCTGGAATCAAATGATCCTCCCACCTCACCTCCAAGTAGCTGGGACTACAGGCATGCACCATCATGCCCAGCTAATTTTCT

TTTTCTTTTTTTTTAAGAGGTAGGATCTTGCTATAATGCCCAGGGGTCCAAACTCCTGGTATCAAGTGATCCTCCCAT

CTTGGCCTCCCAAAGTGCGGGAATTACAGGTGTGAACCACTGCACCCAACCTCATTCTCAGCATTCTTATTATGTTTTGTC

TTATTATCCTCCAAGGATAGGTTAAGTAATTGTTATGGGTTGAATTGGGTCTCCCCAAAATTCCTATGTTAAAGTCCTAAT

CCCAGTATCTCAAAATGAAGGTAAGGTCTTTATAGAGGTAATCAAGTTAAAATGATGTTATTAGGATGGGCATTAATTCAA

TATGACTAGTCTCCTTATAAAAAGCAGACATTCACACACAAGGACACATGCACACAGGGAATATGATACCTGAGATTAGGG

TGATGCGTCTGCAGGCCAAAGAATGCCAAAGACTGCCAGCACACCACCAGAAACTGGGGGAGAGGCATGGAACGGATTCTT

CTTCACAGCTCTCAGAAAGAACCATGCTGCTGACACCTTGATCTTGGAATTCTAGCCACTGGAACTGTAAAACAATAAATT

TCTATT

The NOV12 nucleic acid was identified on chromosome In a Search of Sequence databases, it was found, for 17 as run against the Genomic Daily Files made available by example, that the nucleic acid Sequence of this invention has GenBank or from files downloaded from the individual 168 of 252 bases (66%) identical to a gb:GENBANK Sequencing centers. Exons were predicted by homology and 25 ID:HS435C23|acc:Z92844.1 mRNA from Homo sapiens the intron/exon boundaries were determined using Standard (Human DNA sequence from PAC 435C23 on chromosome genetic rules. Exons were further Selected and refined by X. Contains ESTs). No sequences were found in the EMBL, means of similarity determination using multiple BLAST PIR or GenBank databases that had homology to the NOV12 (for example, tRlastN, BlastX, and BlastN) searches, and, in polypeptide in an unfiltered BLASTP search (expectation Some instances, GeneScan and Grail. Expressed Sequences value=1.0 for input parameter). from both public and proprietary databases were also added when available to further define and complete the gene Table 12C lists the domain description from DOMAIN Sequence. The DNA sequence was then manually corrected analysis results against NOV12. This indicates that the for apparent inconsistencies thereby obtaining the Sequences 35 encoding the full-length protein. The NOV12 nucleic acid NOV12 sequence has properties similar to those of other was further mapped to the p11 region of chromosome 17, a proteins known to contain this domain. locus associated with prostate cancer (OMIM 176807) and congenital slow-channel myosthenic syndrome (OMIM 601462). 40 A disclosed NOV12 polypeptide (SEQ ID NO:73) encoded by SEQ ID NO:72 is 104 amino acid residues and is presented using the one-letter amino acid code in Table 12B. SignalP, Psort and/or Hydropathy results predict that 45 NOV12 does not contain a known signal peptide and in the likely to be localized in the cytoplasm with a certainty of 0.8387 predicted by PSORT. In alternative embodiments, NOV12 is likely to be localized to the mitochondrial inner membrane with a certainty of 0.8387, to a microbody 50 (peroxisome) with a certainty of 0.7480, the plasma mem brane with a certainty of 0.4400, or the mitochondrial intermembrane space with a certainty of 0.3751. The NOV12 hydropathy profile is characteristic of the leucine 55 Zipper gene family. A NOV12 polypeptide has a molecular Height of 11855.7 Daltons.

TABLE 12B Encoded NOV12 protein sequence. (SEQ ID NO: 73) MFTIVSSSSFWPSLKHFLFPPGASKLQLSLQSDRRKLAFIKHQLCAWKIHLQYHNLYNNSAIWISLSAFFFCLFGWLVLV

WLWSGSHSWAQAGAWWHDHNSLQP US 6,989,232 B2 149 150

TABLE 12C Domain Analysis of NOV12 PRODOM. Analysis pram: 49789 p36 (1) RED1 HUMAN//DOUBLE-STRANDED RNA-SPECIFIC EDITASE 1 (DSRNA ADENOSINE DEAMINASE) (RNA EDITING ENZYME 1). RNA EDITING; HYDROLASE; ZINC; RNA BINDING; REPEAT; ALTERNATIVE SPLICING 55 aa. Expect = 0.012, Identities = 12/23 (52%), Positives = 15/23 (65%) for aa of Query: 82 to 104, Sbjct: 1 to 23 prom:5031 p36 (5) NU4M (5) //OXIDOREDUCTASE NADH-UBIQUINONE CHAIN NAD UBIQUINONE MITOCHONDRION, 43 aa. Expect = 0. 63, Identities = 9/22 (40%), Positives = 12/22 (54%) for aa of Query: 56 to 77, Sbjct: 20 to 41 pram: 22836 p36 (1) NU1C SYNY3//NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN 1 (EC 1. 6.5.3). OXIDOREDUCTASE; NAD; PLASTOQUINONE; TRANSMEMBRANE, 28 aa. Expect = 0.83, Identities = 10/19 (52%), Positives = 14/19 (73%) for aa of Query: 8 to 26, Sbjct: 9 to 27 PROSITE Analysis

Pattern Name Pattern Position of NOV12 ASN GLYCOSYLATION PS00001 (Interpro) PDOC00001 NPSTP) 58 PKC PHOSPHO SITE PSOOOO5 (Interpro) PDOC00005 ST. RK 13, 32 LEUCINE ZIPPER PS00029 (Interpro) PDOCOOO29 L. {6}L. {6}L. {6}L 30 BLOCKS Analysis AC# Description Strength Score BLOO 435D Peroxidases proximal heme-ligand proteins. 1230 1101 BLOO 6O4C Synaptophysin/synaptoporin proteins. 1917 1030

BLOO 439D Acyltransferases ChoActase/COT/CPT family 1332 1029 BLOO 177C DNA topoisomerase II proteins. 1219 1021 BLOO 48 H IMP dehydrogenase/GMP reductase proteins. 1405 1016 PFam Analysis no hits above thresholds

Patp BLAST results for NOV12 include those listed in exist in an alpha-helical conformation. The leucine Side Table 12D. chains extending from one alpha-helix interact with those

TABLE 12D Patp alignments of NOV12

Smallest Sequences producing High-scoring Segment Pairs : Score Sum Prob. patp:AAG03340 Human secreted protein, SEQ ID NO: 7421-H . . . 68 OOOO28 patp:AAY27571 Human secreted protein encoded by gene No . . . 92 OOOO1 patp: AAB95648 Human protein sequence SEQ ID NO: 18400-Ho . . . 85 OOO 10 patp: AAB42720 Human ORFX ORF2484 polypeptide sequence SEQ . . . 81 OOO23 patp: AAG00591 Human secreted protein, SEQ ID NO: 4672-H . . . 81 OOO23

A structure, referred to as the “leucine Zipper, has been from a similar alpha helix of a Second polypeptide, facili proposed to explain how Some eukaryotic gene regulatory tating dimerization; the Structure formed by cooperation of proteins work. The leucine Zipper consist of a periodic these two regions forms a coiled coil. repetition of leucine residues at every seventh position over 65 The leucine Zipper pattern is present in many gene regu a distance covering eight helical turns. The Segments con- latory proteins, e.g. the CCATT-box and enhancer binding taining these periodic arrays of leucine residues seem to protein (C/EBP), the cAMP response element (CRE) bind US 6,989,232 B2 151 152 ing proteins (e.g. CREB, CRE-BPI, ATFs), the Jun/AP1 its Leucine Zipper Containing Type 11 membrane like family of transcription factors, the yeast general control protein-like activities and physiological functions, or a func protein GCN4, the fos oncoene and the fos-related proteins tional fragment thereof. In the mutant or variant protein, up fra-1 and foS B. the C-myc, L-myc and N-myc oncogenes, to about 37% of the NOV12 amino acid residues may be so and the octamer-binding transcription factor 2 (Oct-2/OTF changed. 2). Thus, leucine Zipper-like proteins are involved in cell The NOV12 nucleic acids and proteins of the invention proliferation, migration and differentiation. Leucine Zipper are useful in potential therapeutic applications implicated in like proteins may thus be implicated in the onset and/or cancer, e.g. prostate cancer, diabetes, abnormal wound maintenance of diseases including cancer, e.g. prostate healing, congenital slow-channel myosthenic Syndrome, cancer, diabetes, abnormal wound healing, congenital slow inflammation and/or other pathologies and disorders. For channel myosthenic Syndrome, inflammation and/or other example, a cDNA encoding the leucine Zipper-like NOV12 diseases and disorders. The consensus pattern for leucine protein may be useful in detecting proState cancer, and the Zipper-like proteins is: L-X(6)-L-X(6)-L-X(6)-L. leucine Zipper-like protein may be useful when administered The above defined information for this invention Suggests to a Subject in need thereof. By way of nonlimiting example, that these Leucine Zipper Containing Type II membrane 15 the compositions of the present invention will have efficacy protein-like proteins (NOV12) may function as a member of for treatment of patients Suffering from prostate cancer or a “leucine zipper family”. Therefore, the NOV12 nucleic congenital slow-channel myosthenic syndrome. The NOV12 acids and proteins identified here may be useful in potential nucleic acid encoding leucine Zipper-like protein, and the therapeutic applications implicated in (but not limited to) leucine Zipper-like protein of the invention, or fragments various pathologies and disorders as indicated herein. The thereof, may further be useful in diagnostic applications, potential therapeutic applications for this invention include, wherein the presence or amount of the nucleic acid or the but are not limited to: cancer, e.g. prostate cancer, diabetes, protein are to be assessed. Additional disease indications and abnormal wound healing, congenital Slow-channel myOS tissue expression for NOV12 is presented in Example 2. thenic Syndrome, inflammation and/or other diseases and disorders. 25 NOV12 nucleic acids and polypeptides are further useful The novel nucleic acid encoding a Leucine Zipper Con in the generation of antibodies that bind immuno taining Type II membrane like protein-like NOV12 protein specifically to the novel NOV12 substances for use in includes the nucleic acid whose Sequence is provided in therapeutic or diagnostic methods. These antibodies may be Table 12A, or a fragment thereof. The invention also generated according to methods known in the art, using includes a mutant or variant nucleic acid any of whose bases prediction from hydrophobicity charts, as described in the may be changed from the corresponding base shown in “Anti-NOVX Antibodies' section below. For example the Table 12A while still encoding a protein that maintains its disclosed NOV12 proteins have multiple hydrophilic Leucine Zipper Containing Type II membrane like protein regions, each of which can be used as an immunogen. In one like activities and physiological functions, or a fragment of embodiment, a contemplated NOV12 epitope is from about Such a nucleic acid. The invention further includes nucleic 35 amino acids 20 to 40. In additional embodiments, NOV12 acids whose Sequences are complementary to the Leucine epitopes are from about amino acids 20 to 25 and from about Zipper Containing Type II membrane like protein-like amino acids 30 to 40. This novel protein also has value in NOV12 nucleic acid Sequence, including nucleic acid frag development of powerful assay Systems for functional ments that are complementary to any of the nucleic acids just analysis of various human disorders, which will help in described. The invention additionally includes nucleic acids 40 understanding of pathology of the disease and development or nucleic acid fragments, or complements thereto, whose of new drug targets for various disorders. Structures include chemical modifications. In the mutant or NOV13 variant NOV12 nucleic acids, and their complements, up to A disclosed NOV13 nucleic acid of 1183 nucleotides (also about 34% of the bases may be so changed. referred to as 87919652) encoding a novel tyrosine kinase The novel protein of the invention includes the Leucine 45 like protein is shown in Table 13A. An open reading, frame Zipper Containing Type II membrane like protein-like was identified beginning with an ATG initiation codon at NOV12 protein whose sequence is provided in Table 12B. nucleotides 398-400 and ending with a TAG codon at The invention also includes a mutant or variant protein any nucleotides 1181-1183. A putative untranslated region of whose residues may be changed from the corresponding upstream from the initiation codon is underlined in Table NOV12 residue while still encoding a protein that maintains 13A, and the Start and Stop codons are in bold letters.

TABLE 13A NOV13 nucleotide sequence. (SEQ ID NO: 74) AGCTAGAGCTCCAAGGACCCCACGCCTGTGTCTCTGTGACAGAGCTCAAAGGGCCCTGGGCCTTCCCTCCCTGGCTCGGC

TGTGCTTGGGAGGGTTCCCCAGTCCAGAATCCCTAAGGAGCATGGGGCAGCTGATCCATCCCTGGTGTACAAACTGCTGA

CTGCAGACAGATGCTGAGCTACCCAAACCAACACCTAGCCTCTCCCTGAAGATCCTCCCAGGCTGAGAGAGTTCTGGGTG

TCCTAGGACCAAGGACACTGGCAGACTTCCAGAAGGGCCCCCAAAGCCCTAACCTGTCCAGCCAGAGCATGCGTCTCAGC

AGAGCTGTCTTCCCAAGCCTTTGATGACAAACCAATTTCCCTCGATGATGTGCTTCTGAGTGCTCTGCTGAGGAACAAG

GGAAGTCTGCCCAGCAGAAGAAAATCTCTGCCAAGCCCAAGCTTGAGTTCCTCTGTCCAAGGCCAGGGACCTGTGACCAT US 6,989,232 B2 153 154

TABLE 13A-continued NOV13 nucleotide sequence.

GGAAGCAGAGAGAAGCAAGGCCACAGCCGTGGCCCTGGGCAGTTTCCCGGCAGGTGGCCCGGCCGAGCTGTCGCTGAGAC

TCGGGGAGCCATTGACCATCGTCTCTGAGGATGGAGACTGGTGGACGGTGCTGTCTGAAGTCTCAGGCAGAGAGTATAAC

ATCCCCAGCGTCCACGTGGGCAAAGTCTCCCATGGGTGGCTGTATGAGGGCCTGAGCAGGGAGAAAGCAGAGGAACTGCT

GTTGTTACCTGGGAACCCTGGAGGGGCCTTCCTCATCCGGGAGAGCCAGACCAGGAGAGGCTCTTACTCTCTGTCAGTCC

GCCTCAGCCGCCCTGCATCCTGGGACCGGATCAGACACTACAGGATCCACTGCCTTGACAATGGCTGGCTGTACATCTCA

CCGCGCCT CACCTTCCCCTCACTCCAGGCCCTGGTGGACCATTACTCTGAGCTGGCGGATGACATCTGCTGCCTACT CAA

GGAGCCCTGTGTCCTGCAGAGGGCTGGCCCGCTCCCTGGCAAGGATATACCCCTACCTGTGACTGTGCAGAGGACACCAC

TCAACTGGAAAGAGCTGGACAGCTCCCTCCTGTTTTCTGAAGCTGCCACAGGGGAGGAGTCTCTTCTCAGTGAGGGTCTC

CGGGAGTCCCTCAGCTTCTACATCAGCCTGAATGACGAGGCTGTCTCTTTGGATGATGCCAG

The NOV13 nucleic acid was identified on chromosome Sistencies thereby obtaining the Sequences encoding the 20 by comparing it to the human genome database. Exons full-length protein. were predicted by homology and the intron/exon boundaries A disclosed NOV13 polypeptide (SEQ ID NO:75) were determined using Standard genetic rules. Exons were 25 encoded by SEQID NO:74 has 261 amino acid residues and further selected and refined by means of similarity determi is presented in Table 13B using the one-letter amino acid nation using multiple BLAST (for example, tElastN, code. SignalP, Psort and/or Hydropathy results predict that BlastX, and BlastN) searches, and, in Some instances, NOV13 does not have a known signal peptide and is likely GeneScan and Grail. Expressed Sequences from both public to be localized in the mitochondrial matrix Space with a and proprietary databases were also added when available to certainty of 0.4737. In an alternative embodiment, NOV13 further define and complete the gene Sequence. The DNA is likely to be localized in the cytoplasm with a certainty of Sequence was then manually corrected for apparent incon O4500.

TABLE 13B Encoded NOV13 protein sequence. (SEQ ID NO: 75) MGSLPSRRKSLPSPSLSSSWOGQGPWTMEAERSKATAVALGSFPAGGPAELSLRLGEPLTIVSEDGDWWTVLSEWSGREYN

IPSVHVGKVSHGWLYEGLSREKAEELLLLPGNPGGAFLIRESQTRRGSYSLSWRLSRPASWDRIRHYRIHCLDNGWLYISP

RLTFPSLOALWDHYSELADDICCLLKEPCVLQRAGPLPGKDIPLPWTWQRTPLNWKELDSSLLFSEAATGEESLLSEGLRE

SLSFYISLNDEAWSLDDA

The reverse complement for NOV13 is presented in Table 13C.

TABLE 13C NOV13 reverse complement. (SEQ ID NO: 76) CTAGGCATCATCCAAAGAGACAGCCTCGTCATTCAGGCTGATGTAGAAGCTGAGGGACTCCCGGAGACCCTCACTGAGAA

GAGACTCCTCCCCTGTGGCAGCTTCAGAAAACAGGAGGGAGCTGTCCAGCTCTTTCCAGTTGAGTGGTGTCCTCTGCACA

GTCACAGGTAGGGGTATATCCTTGCCAGGGAGCGGGCCAGCCCTCTGCAGGACACAGGGCTCCTTGAGTAGGCAGCAGAT

GTCATCCGCCAGCTCAGAGTAATGGTCCACCAGGGCCTGGAGTGAGGGGAAGGTGAGGCGCGGTGAGATGTACAGCCAGC

CATTGTCAAGGCAGTGGATCCTGTAGTGTCTGATCCGGTCCCAGGATGCAGGGCGGCTGAGGCGGACTGACAGAGAGTAA

GAGCCTCTCCTGGTCTGGCTCTCCCGGATGAGGAAGGCCCCTCCAGGGTTCCCAGGTAACAACAGCAGTTCCTCTGCTTT

CTCCCTGCTCAGGCCCTCATACAGCCACCCATGGGAGACTTTGCCCACGTGGACGCTGGGGATGTTATACTCTCTGCCTG

AGACTTCAGACAGCACCGTCCACCAGTCTCCATCCTCAGAGACGATGGTCAATGGCTCCCCGAGTCTCAGCGACAGCTCG US 6,989,232 B2 15S 156

TABLE 13C-continued NOV13 reverse complement.

GCCGGGCCACCTGCCGGGAAACTGCCCAGGGCCACGGCTGTGGCCTTGCTTCTCTCTGCTTCCATGGTCACAGGTCCCTG

GCCTTGGACAGAGGAACTCAAGCTTGGGCTTGGCAGAGATTTTCTTCTGCTGGGCAGACTTCCCATTGTTCCTCAGCAGA

GCACTCAGAAGCACATCATCGAGGGAAATTGGTTTGTCATCAAAGGCTTGGGAAGACAGCTCTGCTGAGACGCATGCTCT

GGCTGGACAGGTTAGGGCTTTGGGGGCCCTTCTGGAAGTCTGCCAGTGTCCTTGGTCCTAGGACACCCAGAACTCTCTCA

GCCTGGGAGGATCTTCAGGGAGAGGCTAGGTGTTGGTTTGGGTAGCTCAGCATCTGTCTGCAGTCAGCAGTTTGTACACC

AGGGATGGATCAGCTGCCCCATGCTCCTTAGGGATTCTGGACTGGGGAACCCTCCCAAGCACAGCCGAGCCAGGGAGGGA

AGGCCCAGGGCCCTTTGAGCTCTGTCACAGAGACACAGGCGTGGGGTCCTTGGAGCTCTAGCT

In a search of public sequence databases, the NOV13 amino acid Sequence has 175 of 197 amino acid residues (89%) identical to, and 175 residues (89%) positive with, the 20 197 amino acid residue human protein tyrosine kinase (Accession No. Q9H135). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR. It was also found that NOV13 had homology to the amino as acid sequences shown in the BLASTP data listed in Table 13D.

TABLE 13D

BLAST results for NOV13 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect CDNA: FLJ21992 FIS, 261 260/261 260/261, 1e-149 BAB152O1.1 CLONE HEPO6554. On (100%) (100%) Sapiens. 6/2001 DJ977B1.1 (NOVEL 197 196/197 196/197, 1e-113 CABFS365.1 PROTEIN TYROSINE (99%) (99%) KINASE WITHSRC HOMOLOGY DOMAIN 2DOMAINS) (FRAGMENT). homo Sapiens. 6/2001 A93OOO9E21RIK 179 148/181 159/181, 8e-79 BAB32223.1 PROTEIN. S. (82%) (88%) musculus. 6/2001 Q60898; U29056; SRC-LIKE ADAPTER 281 106/253 148/253, 2e-47 AAA82756.1 PROTEIN. S. (42%) (58%) musculus. 6/2001 PUTATIVESRC-LIKE 276 96/219 135/219, 1e-46 AAC50357.1; ADAPTER PROTEIN (44%) (62%) AAC27662.1; (SLAP). homo Sapiens. BAA 13758.1 6/2001

The homology of these sequences listed in Table 13D is shown graphically in the ClustalW analysis shown in Table 13E.

TABLE 1.3E

Information for the ClustalW proteins

1) NOV13 (SEQ ID NO: 75) (SEQ ID NO: 77) (SEQ ID NO: 78) (SEQ ID NO: 79) (SEQ ID NO: 80) 6) Q13239 (SEQ ID NO: 89) US 6,989,232 B2 157 158

TABLE 1.3E-continued Information for the ClustalW proteins

NOW13 1. MGSLPSRRKSLPSPSLSSSWQGQGPWTMEAERSKATAVALGSFPAGGPAELSLRLGEPLT 60 Q9H6Q3 1 MGSLPSRRKSLPSPSLSSSWQGQGPWTMEAERSKATAVALGSFPAGGPAELSLRLGEPLT 60 Q9H135 1. ------1 Q9D129 1 ------1 Q6 0898 1 MG---NSMKSTSPPS------ERPLSSSEGLESDFLAWLTDYPSPDISPPIFRRGEKLR 50 Q13239 1 MR---NSMKSTPAPA------ERPLPNPEGLDSDFLAWLSDYPSPDISPPIFRRGEKLR 50

NOW13 61 I 120 Q9H6Q3 61 I 120 Q9H135 1 56 Q9D129 1 39 Q6 0898 51 I 110 Q13239 51 I 110

NOW13 21 DNGL A 180 Q9H6Q3 121 DNGL g Q9H135 57 DNGL e 116 Q9D129 40 DNGL Q6 0898 111 PNNWY D 164 Q13239 111 PNNWY ED 164

NOW13 81 iRTPLN LDSSLLFSEA-ATG-- 232 RTPLN Dssits re- 232 Q9H135Q9H6Q3 117181 Ript.NETDssits.Arc 168 Q9D129 40 fissiNIDRStripse 152 Q6 0898 111 KiriwiRwsRiocese NPL 223 Q13239 111 QKTWDWRRVSRLQEDPEG-TENPL 218

NOW13 232 261 Q9H6Q3 232 261 Q9H135 168 197 Q9D129 152 179 Q6 0898 224 RKKKSLSLMYTGSKRKSSFFSAPQYFED 281 Q13239 219 RKKKSISLMYGGSKRKSSFFSSPPY FED 276

Table 13F lists the domain description from DOMAIN analysis results against NOV13. This indicates that the 40 TABLE 13F-continued NOV13 sequence has properties similar to those of other proteins known to contain this domain. Domain Analysis of NOV13 TABLE 1.3F PKC PHOSPHO SITE PS00005 ISTRK 6 (Interpro) PDOC00005 Domain Analysis of NOV13 CK2 PHOSPHO SITE PS00006 (ST-2}|DE 4 (Interpro) PDOC00006 PFAM Analysis BLOCKS Analysis Model Description Score E-value o 50 SH2 (InterPro) Src homology domain 2 110.5 4.6e-37 AC# Description Strength. Score SH3 (InterPro) SH3 domain 26.3 O.OOO12 BLOO512B Alpha-galactosidase proteins. 1411 1054 PRODOM Analysis BLOO439A Acyltransferases ChoActase/COT/CPT 1390 1031 S High Smallest S BLOO543A HlyD family secretion proteins. 14O2 1029 producingequences High-scoring Segment Pairs: Score1S malestProbability Sum 55 BLOO535B Respiratory chain NADH dehydrogenase 1555 1025 BLOO564G Argininosuccinate synthase proteins. 1440 1023 prdm: 64 p36 (157) SRC (10) KSYK (8) YES (7) 214 2.4e-18 BLO1276C Peptidase family U32 proteins. 1425 1023 If DOMAIN KI. . . BLOO481F Thiol-activated cytolysins proteins. 1675 1022 prdm: 46 p36 (181) SRC (10) YES (7) GRB2 (6) 77 O.OO38 BLOO117A Galactose-1-phosph. uridyl transferase 1843 102O If DOMAIN SH . . . 60 PROSITE Analysis Pattern Name Pattern Number in NOV13 Other BLAST results include sequences from the Patp CAMP PHOSPHO SITE RK {2}ST 2 65 database, which is a proprietary database that COntainS PS00004 (Interpro) PDOC00004 Sequences published in patents and patent publications. Patp results include those listed in Table 13G. US 6,989,232 B2 159 160

TABLE 1.3G Patp alignments of NOV13

Smallest High Sum Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp: AAB42993 Human OREX ORF2757 polypeptide seguence SEQ . 1269 3. Oe-129 patp: AAY 49 420 PKA substrate, Src-family protein-Homo sa . 342 6.9 e-31 patp: AAB37700 Human lymphocyte kinase-Homo sapiens, 508 . 33 4 5 - 9 e-30 patp: AAY 29 668 Human src-family kinase laloo prorein-Hom . 300 3.8e-26 patp:AAY24 421 Human yes 1 protein-Homo sapiens, 543 aa. 30 O 58 e-26

Receptor tyrosine kinases (RTKs) and their associated The invention also includes a mutant or variant nucleic acid Signaling pathways are critical to proper cell function, and any of whose bases may be changed from the corresponding perturbations in these pathways contribute to the onset and base shown in Table 13A while still encoding a protein that progression of diseases, e.g. cancer. Given the Strong evi maintains its receptor tyrosine kinase-like activities and dence that linkS Signaling by certain families of RTKs to the physiological functions, or a fragment of Such a nucleic acid. progression of breast cancer, it is not Surprising that the The invention further includes nucleic acids whose expression profile of key downstream signaling intermedi Sequences are complementary to those just described, ates in this disease has also come under Scrutiny, particularly including nucleic acid fragments that are complementary to because Some exhibit transforming potential or amplify 25 any of the nucleic acids just described. The invention mitogenic Signaling pathways when they are overexpressed. additionally includes nucleic acids or nucleic acid Reflecting the diverse cellular processes regulated by RTKS, fragments, or complements thereto, whose Structures it is now clear that altered expression of Such Signaling include chemical modifications. Such modifications include, proteins in breast cancer may influence not only cellular by way of nonlimiting example, modified bases, and nucleic proliferation (e.g. Grb2) but also the invasive properties of acids whose Sugar phosphate backbones are modified or the cancer cells (e.g. EMS1/cortactin). derivatized. These modifications are carried out at least in SH2 domains are discrete structural motifs common to a part to enhance the chemical stability of the modified nucleic variety of critical intracellular Signaling proteins. Inhibitors acid, Such that they may be used, for example, as antisense of Specific SH2 domains have become important therapeutic binding nucleic acids in therapeutic applications in a Subject targets in the treatment and/or prevention of restenosis, The disclosed NOV13 protein of the invention includes cancers (including Small cell lung), cardiovascular disease, 35 the receptor tyrosine kinase-like protein whose Sequence is Osteoporosis, apoptosis among others. Considering the provided in Table 13B. The invention also includes a mutant Social and economic impact of these diseases significant or variant protein any of whose residues may be changed attention has been focused on the development of potent and from the corresponding residue shown in Table 13B while selective inhibitors of specific SH2 domains. In particular, Still encoding a protein that maintains its receptor tyrosine considerable research has been performed on Src, PI 40 kinase-like activities and physiological functions, or a func 3-kinase, Grb2 and Lck. tional fragment thereof. Receptor tyrosine kinases are also important in diabetes. The invention further encompasses antibodies and anti Diabetes mellitus is commonly considered as a disease of a body fragments, Such as F, or (F), that bind immuno Scant beta-cell mass that fails to respond adequately to the Specifically to any of the proteins of the invention. functional demand. Tyrosine kinases may play a role for 45 The above defined information for this invention Suggests beta-cell replication, differentiation (neoformation) and Sur that this receptor tyrosine kinase-like protein (NOV13) may vival. Transfection of beta-cells with DNA constructs coding function as a member of a “receptor tyrosine kinase family'. for tyrosine kinase receptors yields a ligand-dependent Therefore, the NOV13 nucleic acids and proteins identified increase of DNA synthesis in beta-cells. Several tyrosine here may be useful in potential therapeutic applications kinase receptors, such as the VEGFR-2 (vascular endothelial 50 implicated in (but not limited to) various pathologies and growth factor receptor 2) and c-Kit, are present in pancreatic disorders as indicated below. The potential therapeutic appli duct cells. Because ducts are thought to harbor beta-cell cations for this invention include, but are not limited to: precursor cells, these receptorS may play a role for the cancer and diabetes research tools, for all tissues and cell neoformation of beta-cells. The Src-like tyrosine kinase types composing (but not limited to) those defined here, e.g. mouse Gtk (previously named Bsk/lyk) is expressed in islet normal and cancerous tissue and pancreatic tissue. cells, inhibits cell proliferation. Furthermore, Gtk confers 55 Based on the tissues in which NOV13 is most highly decreased viability in response to cytokine exposure. Shb is expressed; including Spleen and pituitary; Specific uses a Src homology 2 domain adaptor protein which participates include developing products for the diagnosis or treatment in tyrosine kinase signaling. Transgenic mice overexpress of a variety of diseases and disorders. ing Shb in beta-cells exhibit an increase in the neonatal The NOV13 nucleic acids and proteins of the invention beta-cell mass, an improved glucose homeostasis, but also 60 are useful in potential therapeutic applications implicated in decreased Survival in response to cytokines and StreptoZo cancer including but not limited to breast cancer and dia tocin. Thus, tyrosine kinase signaling may generate multiple betes and/or other pathologies and disorders. For example, a responses in beta-cells, involving proliferation, Survival and cDNA encoding the receptor tyrosine kinase-like protein differentiation. (NOV13) may be useful in cancer therapy, and the receptor The disclosed NOV13 nucleic acid encoding a receptor 65 tyrosine kinase-like protein (NOV13) may be useful when tyrosine kinase-like protein includes the nucleic acid whose administered to a subject in need thereof. By way of Sequence is provided in Table 13A, or a fragment thereof. nonlimiting example, the compositions of the present inven US 6,989,232 B2 161 162 tion will have efficacy for treatment of patients Suffering to 10. In another embodiment, a NOV13 epitope is from from cancer including but not limited to breast cancer. The about amino acids 25 to 40. In additional embodiments, NOV13 nucleic acid encoding receptor tyrosine kinase-like NOV13 epitopes are from about amino acids 100 to 110 protein, and the receptor tyrosine kinase-like protein of the from about amino acids 120 to 130 and from about amino invention, or fragments thereof, may further be useful in acids 250 to 255. These novel proteins can be used in assay diagnostic applications, wherein the presence or amount of Systems for functional analysis of various human disorders, the nucleic acid or the protein are to be assessed. Additional which will help in understanding of pathology of the disease disease indications and tissue expression for NOV13 is and development of new drug targets for various disorders. presented in Example 2. NOV14 NOV13 nucleic acids and polypeptides are further useful A disclosed NOV14 nucleic acid of 5193 nucleotides in the generation of antibodies that bind immuno (also referred to as 87919652) encoding a novel multidrug specifically to the novel NOV13 substances for use in resistance-associated transporter-like protein is shown in therapeutic or diagnostic methods. These antibodies may be Table 14A. An open reading frame was identified beginning generated according to methods known in the art, using with an ATG initiation codon at nucleotides 71-73 and prediction from hydrophobicity charts, as described in the 15 ending with a TGA codon at nucleotides 4652-4654. A "Anti-NOVX Antibodies' section below. The disclosed putative untranslated region upstream from the initiation NOV13 protein has multiple hydrophilic regions, each of codon and downstream from the termination codon is under which can be used as an immunogen. In one embodiment, a lined in Table 14A, and the start and stop codons are in bold contemplated NOV13 epitope is from about amino acids 1 letters.

TABLE 1A NOV14 nucleotide sequence. (SEQ ID NO: 82) CTCCGGCGCCCGCTCTGCCCGCCGCTGGGTCCGACCGCGCTCGCCTTCCTTGCAGCCGCGCCTCGGCCCCAGGACGCCC

TGTGCGGTTCCGGGGAGCTCGGCTCCAAGTTCTGGGACTCCAACCTGTCTGTGCACACAGAAAACCCGGACCTCGCTCCC

GCTCCAGAACTCCCTGCTGGCCTGGGGCCCTGCATCTACCTGTGGGTCGCCCTGCCCTGCTACTTGCTCTACCTGCG

GCACCATTGTCGGGCTACATCATCCTCTCCCACCTGTCCAAGCTCAAGATGGTCCTGGGTGTCCTGCTGTGGTGCGTCT

CCTGGGCGGACCTTTTTTACTCCTTCCATGGCCTGGTCCATGGCCGGGCCCCTGCCCCTGTTTTCTTTGTCACCCCCTTG

GTGGTGGGGGTCACCATGCTGCTGGCCACCCTGCTGATACAGTATGAGCGGCTGCAGGGCGTACAGTCTTCGGGGGTCCT

CATTACTTCTGGTTCCTGTGTGTGGTCGCGCCATCGTCCCATTCCGCTCCAAGATCCTTTTAGCCAAGGCAGAGGGTG

AGATCTCAGACCCCTTCCGCTTCACCACCCTACACCACTTTGCCCTGGTACTCTCTGCCCTCATCTTGGCCTCGTTC

AGGGAGAAACCTCCATTTTTCTCCGCAAAGAATGTCGACCCTAACCCCTACCCTGAGACCAGCGCTGGCTTTCTCTCCCG

CCTGTTTTTCTGGTGGTTCACAAAGATGGCCATCTATGGCTACCGGCATCCCCTGGAGGAGAAGGACCTCTGGTCCCTAA

AGGAAGAGGACAGATCCCAGATGGTGGTGCAGCAGCTCCTGGAGGCATGGAGGAAGCAGGAAAAGCAGACGGCACGACAC

AAGGCTTCAGCAGCACCTGGGAAAAATGCCTCCGGCGAGGACGAGGTGCTGCTGGGTGCCCGGCCCAGGCCCCGGAAGCC

CTCCTTCCTGAAGGCCCTGCTGGCCACCTTCGGCTCCAGCTTCCTCATCAGTGCCTGCTTCAAGCTTACCAGGACCGC

TCTCCTTCACAATCCACAGCTGCTCAGCATCCTGATCAGGTTTATCTCCAACCCCATGGCCCCCTCCTCGTGGGGCTTC

CTGGTGGCTGGGCTGATGTTCCTGTGCTCCATGATGCAGTCGCTGATCTTACAACACTATTACCACTACATCTTTGTGAC

TGGGGTGAAGTTTCGTACTGGGATCATGGGTGTCATCTACAGGAAGGCTCTGGTTATCACCAACTCAGTCAAACGTGCGT

CCACTGTGGGGGAAATTGTCAACCTCATGTCAGTGGATGCCCAGCGCTTCATGGACCTTGCCCCCTTCCTCAATCTGCTG

TGGTCAGCACCCCTGCAGATCATCCTGGCGATCTACTTCCTCTGGCAGAACCTAGGTCCCTCTGTCCTGGCTGGAGTCGC

TTTCATGGTCTTGCTGATTCCACTCAACGGAGCTGTGGCCGTGAAGATGCGCGCCTTCCAGGTAAAGCAAATGAAATTGA

AGGACTCGCGCATCAAGCTGATGAGTGAGATCCTGAACGGCATCAAGGTGCTGAAGCTGTACGCCTGGGAGCCCAGCTTC

CTGAAGCAGGTGGAGGGCATCAGGCAGGGTGAGCTCCAGCTGCTGCGCACGGCGGCCTACCTCCACACCACAACCACCTT

CACCTGGATGTGCAGCCCCTTCCTGGTGACCCTGATCACCCTCTGGGTGTACGTGTACGTGGACCCAAACAATGTGCTGG

ACGCCGAGAAGGCCTTTGTGTCTGTGTCCTTGTTTAATATCTTAAGACTTCCCCTCAACATGCTGCCCCAGTTAATCAGC

AACCTGACTCAGGCCAGTATGTCTCTGAAACGGATCCAGCAATTCCTGAGCCAAGAGGAACTTGACCCCCAGAGTGTGGA

AAGAAAGACCATCTCCCCAGGCTATGCCATCACCATACACAGTGGCACCTTCACCTGGGCCCAGGACCTGCCCCCCACTC

TGCACAGCCTAGACATCCAGGTCCCGAAAGGGGCACTGGTGGCCGTGGTGGGGCCTGTGGGCTGTGGGAAGTCCTCCCTG

US 6,989,232 B2 165 166

TABLE 1A-continued NOV14 nucleotide sequence.

AGAAGACAGCTGCTGGGTCAGGCCACCCCTAGGAACTCAGTCCTGTACTCTCGGGTGCTGCCTGAATCCATTAAAAATGG

GAGTACTGATGAAATAAAACTACATGGTCA ACAGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The NOV14 nucleic acid was identified on chromosome (Accession No. XM 038002). Public nucleotide databases 17 by comparing it to the human genome Sequence. Exons include all GenBank databases and the GeneSeq patent were predicted by homology and the intron/exon boundaries database. were determined using Standard genetic rules. Exons were A disclosed NOV14 polypeptide (SEQ ID NO:83) further selected and refined by means of similarity determi encoded by SEQ ID NO:82 has 1527 amino acid residues nation using multiple BLAST (for example, tElastN, 15 and is presented in Table 14B using the one-letter amino acid BlastX, and BlastN) searches, and, in Some instances, code. SignalP, Psort and/or Hydropathy results predict that GeneScan and Grail. Expressed Sequences from both public NOV14 has a signal peptide and is likely to be localized to and proprietary databases were also added when available to the plasma membrane with a certainty of 0.8000. The most further define and complete the gene Sequence. The DNA likely cleavage site for a NOV14 peptide is between amino Sequence was then manually corrected for apparent incon acids 53 and 54 of SEQ ID NO.28, i.e. at CYL-LY.

TABLE 1.4B Encoded NOV14 protein sequence. (SEQ ID NO: 83) MDALCGSGELGSKFWDSNLSWHTENPDLTFCFQNSLLAWWPCIYLWVALPCYLLYLRHHCRGYIILSHLSKLKMVLGVLLW

CWSWADLFYSFHGLVHGRAPAPWFFWTPLVWGWTMLLATLLIQYERLQGWQSSGVLIIFWFLCVWCAIVPFRSKILLAKAE

GEISIDPFRFTTFYIHFALWLSALILACFREKPPFFSAKNWDPNPYPETSAGFLSRLFFWWFTKMAIYGYRHPLEEKDLWSL

KEEDRSQMWWQQLLEAWRKQEKQTARHKASAAPGKNASGEDEWLLGARPRPRKPSFLKALLATFGSSFLISACFKLIQDLL

SFINPOLLSILIRFISNPMAPSWWGFLWAGLMFLCSMMQSLILOHYYHYIFWTGVKFRTGIMGWIYRKALWITNSWKRAST

WGEIWN LMSWDAQRFMDLAPFLNLLWSAPLQIILAIYFLWQNLGPSVLAGWAFMVL LIPLNGAVAVKMRAFQWKQMKLKDS

RIKLMS EILNGIKVLKLYAWEPSFLKQWEGIRQGELOLLRTAAYLHTTTTFTWMCS PFLWTLTLWWYWYWDPNNWLDAEK

AFWSWS LFNILRLPLNMLPOLISNLTOASWSLKRIQQFLSQEELDPQSWERKTISPGYAITIHSGTFTWAQDLPPTLHSLD

IQVPKGALWAWWGPWGCGKSSLWSALLGEMEKLEGKVHMKGSWAYVPQQAWIONCT LQENWLFGKALNPKRYQQTLEACAL

LADLEM LPGGDOTEIGEKGINLSGGQRQRVSLARAVYSDADIFLLDDPLSAWDSHVAKHIFDHVIGPEGVLAGKTRVLWTH

GISFLPQTDFIIWLADGQVSEMGPYPALLQRNGSFANFLCNYAPDEDQGHLEDSWTALEGAEDK EALLIEDTLSNHTDLTD

NDPWTYWWQKQFMRQLSALSSDGEGOGRPWPRRHLGPSEKVQVTEAKADGALTQEEKAAIGTVE LSWFWDYAKAWGLCTTL

AICLLYWGQSAAAIGANWWLSAWTNDAMADSRQNNTSLRLGWYAALGILQGFLVMLAAMAMAAGGIQAARVLHQAILLHNKI

RSPQSFFDTTPSGRILNCFSKDIYWWDEVLAPWILMLLNSFFNAISTLWWIMASTPLFTWWILP LAWLYTLVORFYAATSR

QLKRLESVSRSPIYSHFSETWTGASWIRAYN RSRDFEIISDTKVDANQRSCYPYIISNRWLSIGWEFWGNCVWLFAALFAV

IGRSSLNPGLVGLSWSYSLQWTFALNWMIRMMSDLESNIVAVERVKEYSRTETEAPWVVEGSRPPEGWPPRGEVEFRNYSW

RYRPGLDLVLRDLSLHWHGGEKVGIWGRTGAGKSSMTLCLFRILEAAKGETRIDGLNWADTGLHDLRSQLTIIPQDPILFS

GTLRMNLDPFGSYSEEDIWWALELSHLHTEWSSQPAGLDFQCSEGGENLSWGQRQLVCLARALLRKSRTLVLDEATAAIDL

ETDNLTQATIRTQFDTCTWLTIAHRLNTIMDYTRVLVLDKGWVAEFDSPANLIAARGIPYGMARDAGLA

Sistencies thereby obtaining the Sequences encoding the In a search of public sequence databases, the NOV14 full-length protein. The NOV14 nucleic acid was further amino acid Sequence has 1527 of 1527 amino acid residues mapped to the 17q21 locus. This locus is associated with 60 (100%) identical to, and 1527 residues (100%) positive with, breast cancer (OMIM 176705, 113705), glycogen storage the 1527 amino acid residue human canicular multispecific disease (OMIM 232200), essential hypertension (OMIM organic anion transporter/multidrug resistance-associated 171190) and/or other diseases/disorders. protein (Accession No. 015438). Public amino acid data bases include the GenBank databases, SwissProt, PDB and In a search of public sequence databases, the NOV14 65 PIR. It was also found that NOV14 had homology to the nucleic acid sequence has 5151 of 5155 bases (99%) iden amino acid sequences shown in the BLASTP data listed in tical to a human ATP-binding cassette, Sub-family C Table 14C. US 6,989,232 B2

TABLE 14C

BLAST results for NOV14 Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect MRP3 HUMAN; CANALICULAR MULTISPECIFIC 1527 1527/1527 1527/1527 O.O O15438: ORGANIC ANON TRANSPORTER (100%) (100%) BAA28146.1; 2 (MULTIDRUGRESISTANCE CAA76658.1; ASSOCIATED PROTEIN 3). CAA76658.1; homo Sapiens. 5/2000 AADO1430.1; MRP3 RAT: CANALICULAR MULTISPECIFIC 1522 1194/1528 1334/1528 O.O O88563: ORGANIC ANON TRANSPORTER (78%) (87%) AAC25416.1; 2 (MULTIDRUGRESISTANCE BAA28955.1 ASSOCIATED PROTEIN 3) (MRP-LIKE PROTEIN-2) (MLP 2). rattus norvegicus. 5/2OOO MRP1. HUMAN: MULTIDRUG RESISTANCE- 1531. 872/1538 1131f1538 O.O P33527; ASSOCIATED PROTEIN 1. On (57%) (74%) AAB46616.1: Sapiens. 5/2000 AAB83983.1 Q9UQ99; MULTIDRUG RESISTANCE 1515 870/1529 1128/1529 O.O AF022853; PROTEIN (FRAGMENT). homo (57%) (74%) AAB83979.1 Sapiens. 6/2001 O35379; MULTIDRUG RESISTANCE 1528 859/1540 1117/1540 O.O AF022908; PROTEIN. nius musculus. (56%) (73%) AAB80938.1 6/2001

The alignment and homology of these Sequences is shown graphically in the ClustalW analysis in Table 14D.

TABLE 1.4D

Information for the ClustalW proteins

1) NOV14 (SEQ ID NO: 83) 2) MRP3 HUMAN (SEQ ID NO: 84) 3) MRP3 RAT (SEQ ID NO: 85) 4) MRP1. HUMAN (SEQ ID NO: 86) 5) Q9UQ99 (SEQ ID NO: 87) 6) O35379 (SEQ ID NO: 88)

NOW14 1 AT L HH 59 MRP3 HUMAN 1 AT L HH 59 MRP3 RAT 1 F HH 59 MRP1. HUMAN 1 CF L RH 60 Q9UQ99 1 CF L RH 44 O35379 1 SCFP F RH 60

NOW14 60 119 MRP3 HUMAN 60 119 MRP3 RAT 60 119 MRP1. HUMAN 51 120 Q9UQ99 45 104 O35379 61 120

NOW14 120 179 MRP3 HUMAN 120 179 MRP3 RAT 120 AL 179 MRP1. HUMAN 121 AiRSRITEK 18O Q9UQ99 105 AIRSRAR 164 O35379 121 'E 18O

NOW14 18O 239 MRP3 HUMAN 180 239 MRP3 RAT 18O 239 MRP1. HUMAN 181 240 Q9UQ99 1.65 224 O35379 181 240

US 6,989,232 B2 173 174 Table 14E lists the domain description from DOMAIN diseases/disorders. Multidrug resistance (MDR) describes analysis results against NOV14. This indicates that the the phenomenon of Simultaneous resistance to unrelated NOV14 sequence has properties similar to those of other drugs. The two MDR genes identified in humans to date (the proteins known to contain this domain. MDR-associated protein (MRP) and Pgp genes) are struc 5 turally similar and both are members of the ATP-binding TABLE 1.4E cassette (ABC) transporter family. Although the physiologi cal role of MRP is not yet understood, one Pgp gene (mdr1) Domain Analysis of NOV14 plays an important role in the blood-tissue barrier and the PROSITE other (mdr2/3) is involved in phospholipid transport in the 1O liver. A variety of compounds (chemosensitizing agents) can Pattern Name interfere with Pgp and MRP function; such agents may improve the efficacy of conventional therapy when used in LEUCINE ZIPPER PS00029 (Interpro) PDOCOOO29 2 positions in NOV14 combination with Such regimens. Determining the roles ABC TRANSPORTER PS00211 (Interpro) PDOCOO185 2 sites in cellular MDR mechanisms play in patients response to NOV14 15 chemotherapy is a major challenge. Using Pgp and MRP as molecular markers to detect MDR tumor cells is technically PRODOM demanding, and Solid tumors in particular contain hetero Smallest Sum geneous cell populations. Since MDR requires Pgp or MRP Sequences High Probability gene expression, clinically relevant gene expression thresh producing High-scoring Segment Pairs. Score P(N) 2O olds need to be established; Sequential Samples from indi prdm: 8775 p36 (3) MRP2 (2) MRP1 (1) - 384 7.1e-35 vidual patients are valuable for correlating MDR gene MULTIDRUG PROTEIN . . . expression with the clinical course of disease. Studies in prdm: 1070 p36 (21) CFTR (7) SUR (3) 305 1.9e-26 leukemias, myelomnas, and Some childhood cancerS Show MRP2 (2) - TRANSMEMBR . . . that Pgp expression correlates with poor response to che prdm: 923 p36 (24) CFTR (7) MRP2 (4) SUR 244 5.8e-20 (3) - TRANSMEMBR. . . 25 motherapy. However, in Some cases, inclusion of a reversing prdm: 993 p36 (22) CFTR (7) SUR (3) MRP2 214 90e-17 or chemoSensitizing agent Such as Verapamil or cycloSporin (2) - TRANSMEMBR. . . A has improved clinical efficacy. Such agents may inactivate Pgp in tumor cells or affect Pgp function in normal cells, BLOCKS resulting in altered pharmacokinetics. The ABC transporter AC# Description Strength. Score 30 Superfamily in prokaryotes and eukaryotes is involved in the transport of Substrates ranging from ions to large proteins. BLOO211B ABC transporters family proteins. 1331. 1326 Of the 15 or more ABC transporter genes characterized in BLO1247C Inosine-uridine preferring nucleoside 1351 1084 hydrola human cells, two (Pgp and MRP) cause MDR. Therefore, it BLOO577B Avidin/Streptavidin family proteins. 1442 1067 would be relevant to determine the number of Such genes BLOO853E Beta-eliminating lyases pyridoxal- 1602 1064 35 present in the human genome; however, extrapolating from phosphate a the number of ABC transporter gyenes in , the BLOOO19E Actinin-type actin-binding domain 1179 1060 proteins. human gene probably contains a minimum of 200 ABC BLOO256 Adipokinetic hormone family proteins. 1358 1057 transporter Superfamily members. Thus, tumor cells can BLOO545B Aldose 1-epimerase proteins. 1282 1056 potentially use many ABC transporters to mount resistance BLOO699A Nitrogenases component 1 alpha and 1357 1056 40 to known and future therapeutic agents. beta subun Members of the multidrug resistance-associated transporter-like protein family are also important in liver Other BLAST results include sequences from the Patp disease. In Several liver diseases the biliary transport is database, which is a proprietary database that contains disturbed, resulting in, for example, jaundice and cholesta Sequences published in patents and patent publications. Patp sis. Many of these symptoms can be attributed to altered results include those listed in Table 1F. regulation of hepatic transporters. Organic anion transport,

TABLE 1.F Patp alignments of NOV14

Smallest Suml High Prob. Sequences producing High-scoring Segment Pairs : Score P(N) patp: AAY 43543 A human MPR-related ABC transporter designa . 7845 O. O. patp: AAW 33363 Human multidrug resistance-associated prote . 7679 O. O. patp: AAR54928 Multidrug resistance protein-Homo sapiens . 4 470 O. O. patp: AAR93153 Multi-drug resistance protein-Homo sapien . 4 470 O. O. patp: AAW57485 Human multidrug resistance-associated prote . 4 470 O. O.

Members of the multidrug resistance-associated 65 mediated by the canalicular multispecific organic anion transporter-like protein family are critical modulators of cell transporter (cmoat), has been extensively studied. The regu physiology, and perturbations are associated with many lation of intracellular vesicular sorting of CMOAT by pro US 6,989,232 B2 175 176 tein kinase C and protein kinase A, and the regulation of resistance-associated transporter family'. Therefore, the cmoat-mediated transport in endotoxemic liver disease, have NOV14 nucleic acids and proteins identified here may be been examined. The discovery that the multidrug resistance useful in potential therapeutic applications implicated in (but protein (MRP), responsible for multidrug resistance in not limited to) various pathologies and disorders as indicated cancers, transports Similar Substrates as cmoat led to the below. The potential therapeutic applications for this inven cloning of a MRP homologue from rat liver, named mrp2. Mrp2 turned out to be identical to cmnoat. At present there tion include, but are not limited to: cancer and liver disease is evidence that at least two mrps are present in hepatocytes, research tools, for all tissues and cell types composing (but the original mrp (mrp1) on the lateral membrane, and mrp2 not limited to) those defined here, e.g. cancerous and normal (cmoat) on the canalicular membrane. The expression of tissue and liver tissue. Additional disease indications and mrp1 and mrp2 in hepatocytes appears to be cell-cycle tissue expression for NOV14 is presented in Example 2. dependent and regulated in a reciprocal fashion. These The NOV14 nucleic acids and proteins of the invention findings show that biliary transport of organic anions and are useful in potential therapeutic applications implicated in possibly other canalicular transport is influenced by the cancer including but not limited to cancer, liver disease entry of hepatocytes into the cell cycle. 15 Further, members of the multidrug resistance-associated and/or other pathologies and disorders. For example, a transporter-like protein family are involved in various leu cDNA encoding the multidrug resistance-associated kaemias. Approximately 15-30% of acute rnyeloid leu transporter-like protein (NOV14) may be useful in liver kaemia (AML) patients are primarily resistant to disease therapy, and the multidrug resistance-associated chemotherapy, and 60-80% of patients who achieve com transporter-like protein (NOV14) may be useful when plete remission will inevitably relapse and Succumb to their administered to a subject in need thereof. By way of disease. The multidrug resistant (MDR) phenotype has been nonlimiting example, the compositions of the present inven Suspected as a major mechanism of therapy failure in AML, tion will have efficacy for treatment of patients Suffering it is one of the best understood mechanisms of resistance to from liver disease and cancer including but not limited to anticancer drugs. The classical MDR phenotype is charac 25 leukemia. The NOV14 nucleic acid encoding multidrug terized by the reduced ability of cells to accumulate drugs as compared to normal cells. The increased drug efflux is due resistance-associated transporter-like protein, and the mul to the activity of a 170 kDa glycoprotein, the P-glycoprotein tidrug resistance-associated transporter-like protein of the (Pgp), a unidirectional drug-efflux pump which is encoded invention, or fragments thereof, may further be useful in by the MDR1 gene. While studies of myeloid leukaemia and diagnostic applications, wherein the presence or amount of myeloma have provided the best evidence for the potential the nucleic acid or the protein are to be assessed. association between Pgp expression and clinical outcome, NOV14 nucleic acids and polypeptides are further useful the lack of standardized methods for MDR detection and in the generation of antibodies that bind immuno perhaps even more importantly, inconsistencies in the inter specifically to the novel NOV14 substances for use in pretation of MDR expression data account for divergent 35 results in the literature. The clinicians Strong interest in therapeutic or diagnostic methods. These antibodies may be MDR stems from the availability of agents capable of generated according to methods known in the art, using interfering with MDR, at least in vitro. If these laboratory prediction from hydrophobicity charts, as described in the results were reproducible in vivo, reversal of MDR would "Anti-NOVX Antibodies' section below. The disclosed offer a rare opportunity to incorporate laboratory experience 40 NOV14 protein has multiple hydrophilic regions, each of into the clinical management of patients. which can be used as an immunogen. In one embodiment, a The NOV14 nucleic acids are useful for screening a test contemplated NOV14 epitope is from about amino acids 200 compound for inhibition of MDR mediated transport, indi to 300. In another embodiment, a NOV14 epitope is from cated by restoration of anticancer drug Sensitivity, which in about amino acids 300 to 400. In additional embodiments, turn causes a reduction of transporter mediated cellular 45 NOV14 epitopes are from about amino acids 900 to 300 and efflux of anticancer agents. The disclosed NOV14 nucleic from about amino acids 1400 to 1500. These novel proteins acid encoding a multidrug resistance-associated transporter can be used in assay Systems for functional analysis of like protein includes the nucleic acid whose Sequence is various human disorders, which will help in understanding provided in Table 14A, or a fragment thereof. The invention of pathology of the disease and development of new drug also includes a mutant or variant nucleic acid any of whose 50 bases may be changed from the corresponding base shown targets for various disorders. in Table 14A while Still encoding a protein that maintains its NOV15 multidrug resistance-associated transporter-like activities NOV15 includes two novel novel intracellular thrombo and physiological functions, or a fragment of Such a nucleic spondin domain containing protein-like proteins disclosed acid. 55 below. The disclosed proteins have been named NOV15a The disclosed NOV14 protein of the invention includes and NOV15b. the multidrug resistance-associated transporter-like protein NOV15a whose sequence is provided in Table 14B. The invention also includes a mutant or variant protein any of whose A disclosed NOV15a nucleic acid of 1794 nucleotides residues may be changed from the corresponding residue 60 (also referred to as 100399281 and 159518754) encoding a shown in Table 14B while still encoding a protein that novel thrombospondin-like protein is shown in Table 15A. A maintains its multidrug resistance-associated transporter partial open reading frame was identified beginning with an like activities and physiological functions, or a functional GGA codon at nucleotides 178-180 and ending with a TAA fragment thereof. codon at nucleotides 1792-1794. A putative untranslated The above defined information for this invention Suggests 65 intronic region upstream from the first in-frame coding that this multidrug resistance-associated transporter-like triplet is underlined in Table 15A, and the start and stop protein (NOV14) may function as a member of a “multidrug codons are in bold letters.

US 6,989,232 B2 179 18O NOV15b. A disclosed NOV15b nucleic acid of 1238 nucleotides (also referred to as CG57356-01) encoding a novel novel intracellular thrombospondin domain containing protein like protein is shown in Table 15C. A partial open reading 5 flame was identified beginning with an ACG codon at nucleotides 3-5 and ending with a TAA codon at nucleotides 1236-1238. A partial codon upstream from the first in-frame coding triplet is italicized in Table 15C, and the start and stop codons are in bold letters. In further embodiments, the 10 NOV15 coding region extends 5' to the sequence disclosed in Table 15C.

TABLE 1.5C NOV15b Nucleotide Sequence (SEQ ID NO: 91) GTACGTGTAGTCCTGAAACCAGCTTTTCTCTCTCCAAAGAAGCACCAAGGGAGCATCTGGACCACCAGGCTGCACACCA.

ACCCTTCCCCAGACCGCGATTCCGACAAGAGACGGGGCACCCTTCATTGCAAAGAGATTTCCCCAGATCCTTTCTCCTT

GATCTACCAAACTTTCCAGATCTTTCCAAAGCTGATATCAATGGGCAGAATCCAAATATCCAGGTCACCATAGAGGTGG

TCGACGGTCCTGACTCTGAAGCAGATAAAGATCAGCATCCGGAGAATAAGCCCAGCTGGTCAGTCCCATCCCCCGACTG

GCGGGCCTGGTGGCAGAGGTCCCTGTCCTTGGCCAGGGCAAACAGCGGGGACCAGGACTACAAGTACGACAGTACCTCA

GACGACAGCAACCTTCCTCAACCCCCCAGGGGGTGGGACCATACAGCCCCAGGCCACCGGACTTTTGAAACCAAAGATC

AGCCAGAATATGATTCCACAGATGGCGAGGGTGACTGGAGTCTCTGGTCTGTCTGCAGCGTCACCTGCGGGAACGGCAA

CCAGAAACGGACCCGGTCTTGTGGCTACGCGTGCACTGCAACAGAATCGAGGACCTGTGACCGTCCAAACTGCCCAGGA

ATTGAAGACACTTTTAGGACAGCTGCCACCGAAGTGAGTCTGCTTGCGGGAAGCGAGGAGTTTAATGCCACCAAACTGT

TTGAAGTTGACACAGACAGCTGTGAGCGCTGGATGAGCTGCAAAGCGAGTTCTTAAAGAAGTACATGCACAAGGGTGAT

GAATGACCTGCCCAGCTGCCCCTGCTCCTACCCCACTGAGGTGGCCTACAGCACGGCTGACATCTTCGACCGCATCAAG

CGCAAGGACTTCCGCTGGAAGGACGCCAGCGGGCCCAAGGAGAAGCTGGAGATCTACAAGCCCACTGCCCGGTACTGCA

TCCGCTCCATGCTGTCCCTGGAGAGCACCACGCTGGCGGCACAGCACTGCTGCTACGGCGACAACATGCAGCTCATCAC

CAGGGGCAAGGGGGCGGGCACGCCCAACCTCATCGGCACCGAGTTCTCCGCGGAGCTCCACTACAAGGTGGACGTCCTG

CCCTGGATTATCTGCAAGGGTGACTGGAGCAGGTATAACGAGGCCCGGCCTCCCAACAACGGACAGGAGTGCACAGAGA

GCCCCTCGCACGAGGACTACATCAAGCAGTTCCAAGAGGCCAGGGAATATAA

45 A disclosed NOV15b polypeptide (SEQ ID NO:92) localized in the cytoplasm with a certainty of 0.4500. In encoded by SEQ ID NO:91 is 411 amino acid residues and alternative embodiments, NOV15b is localized to a micro is presented using, the one-letter amino acid code in Table body (peroxisome) with a certainty of 0.1163; the mitochon 15D. NOV15b is believed to be a mature protein. SignalP, drial matrix space with a certainty of 0.1000; or a lysosome Psort and/or Hydropathy results predict if that NOV15b does (lumen) with a certainty of 0.1000. NOV15b has a molecular not contain a known signal peptide and is likely to be weight of 46743.0 Daltons.

TABLE 15D Encoded NOV15b protein sequence. (SEQ ID NO:92) TCSPETSFSLSKEAPREHLDHQAAHOPFPRPRFRQETGHPSLQRDFPRSFLLDLPNFPDLSKADINGONPNIOWTIEWWDGP

DSEADKDOHPENKPSWSWPSPDWRAWWQRSLSLARANSGDQDYKYDSTSDDSNFLNPPRGWDHTAPGHRTFETKDQPEYDST

DGEGDWSLWSWCSWTCGNGNOKRTRSCGYACTATESRTCDRPNCPGIEDTFRTAATEWSLLAGSEEFNATKLFEWDTDSCER

WMSCKSEFLKKYMHKWMNDLPSCPCSYPTEWAYSTADIFDRIKRKDFRWKDASGPKEKLEIYKPTARYCIRSMLSLESTTLA

AQHCCYGDNMOLITRGKGAGTPNLIGTEFSAELHYKVDVLPWIICKGDWSRYNEARPPNNGOECTESPSDEDYIKQFQEAREY US 6,989,232 B2 181 182 NOV15a and NOV15b are related to each other as shown in the alignment listed in Table 15E.

TABLE 1.5E

ClustalW of NOW15 Wariants

NOW GSCCRLRYCRTCSPETSFSLSKEAPREHLDHQAAHQPFPRPRFROETGHP 5 O NOW TCSPETSFSLSKEAPREHLDHQAAHQPFPRPRFROETGHP 40

NOW SL9RDFPRSFLLDLPNFPDLSKADINGQNPNIQWTIEWWDGPDSEADKDQ 1 OO NOW SL9RDFPRSFLLDLPNFPDLSKADINGQNPNIQWTIEWWDGPDSEADKDQ 9 O

NOW HPENKPSWSWPSPDWRAWWQRSLSLARANSGDQDYKYDSTSDDSNFLNPP 150 NOW HPENKPSWSWPSPDWRAWWQRSLSLARANSGDQDYKYDSTSDDSNFLNPP 1 4 0

NOW RGWDHTAPGHRTFETKDQPEYDSTDGEGDWSLWSWCSWTCGNGNQKRTRS 200 NOW RGWDHTAPGHRTFETKDQPEYDSTDGEGDWSLWSWCSWTCGNGNQKRTRS 190

NOW 5a DGYACTATESRTCDRPNCPACTGFLIWKEAWLGWWWWHWPAPPTGNPSWP 250 NOW DGYACTATESRCDRPNCP------209

NOW 3OO NOW EPEVESTRACERNAPSRTSPsyNGSITQIPINESS 210

NOW 350

LSKERIYSKDYCREARDV:SLLLQWEERCDHKECKHLKEQPGVTCSLKHL NOW IE DTFR TAATESSLLAGSE EFNATKEF-E------238

NOW LWAGCTRGERVSLWPFPDTDSCERWMSFKARFLKKYMHKWMNDLPSCPCS 400 NOW --W------DTDSCERWMSCKSEFLKKYMHKVMNDLPSCPCS 272

NOW YPTEWAYSTADIFDRIKRKDFRWKDASGPKEKLEIYKPTARYCIRSMLSL 450 NOW YPTEWAYSTADIFDRIKRKDFRWKDASGPKEKLEIYKPTARYCIRSMLSL 322

NOW ESTTLAAQHCCYGDNMQLITRGKGAGTPNLISTEFSAELHYKWDWLPWII 5 OO NOW ESTTLAAQHCCYGDNMQLITRGKGAGTPNLIGTEFSAELHYKWDWLPWII 372

NOW CKGDWSRYNEARPPNNGQECTESPSDEDYIKQFQEAREY 539 NOW CKGDWSRYNEARPPNNGQECTESPSDEDYIKQFQEAREY 411

The novel intracellular thrombospondin domain contain In a Search of Sequence databases, it was found, for ing protein-like NOV15 gene maps to chromosome 7. This 40 example, that the NOV15b nucleic acid sequence of this assignment was made using mapping information associated with genomic clones, public genes and ESTS Sharing invention has 373 of 512 bases (72%) identical to a Sequence identity with the disclosed Sequence and CuraGen gb:GENBANK-ID:AF111168|acc:AF111168.2 mRNA from Corporations Electronic Northern bioinformatic tool. Homo Sapiens (Homo Sapiens Serine palmitoyl transferase, Exons were predicted by homology and the intron/exon 45 Subunit II gene, complete cds, and unknown genes). The full boundaries were determined using Standard genetic rules. NOV15b amino acid sequence was found to have 162 of 164 Exons were further selected and refined by means of simi amino acid residues (98%) identical to, and 163 of 164 larity determination using multiple BLAST (for example, amino acid residues (99%) similar to the 361 amino acid tBlastN, BlastX, and BlastN) searches, and, in some residue ptnr:TREMBLNEW-ACC:CAC16127 protein from instances, GeneScan and Grail. Expressed Sequences from both public and proprietary databases were also added when 50 Homo sapiens (Human) (BA149118.1 (NOVELPROTEIN). available to further define and complete the gene Sequence. The DNA sequence was then manually corrected for appar The disclosed NOV15a was found to have homology to ent inconsistencies thereby obtaining the Sequences encod the amino acid sequences shown in the BLASTP data listed ing the full-length protein. in Table 15F.

TABLE 1.5F

BLAST results for NOV15a Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect Q9H599; AL133463: BA149I18.1 (NOVEL 391 189/189, 189/189, 1e-117 CAC16127.2 PROTEIN (100%) (100%) (FRAGMENT) homo Sapiens. Jun. 2001 095432. AF111168; HYPOTHETICAL 72.5 658 102/172 138/172, 2e-63 US 6,989,232 B2 183 184

TABLE 15F-continued

BLAST results for NOV15a Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect AADO9622.1 KDA PROTEIN. homo (59%) (80%) Sapiens. Jun. 2001 Q9BQL4, ALO50320; DJ107712.1 (NOVEL 60 49/49 49/49, 3e-22 CAC36O74.1 PROTEIN) (100%) (100%) (FRAGMENT). homo Sapiens. Jun. 2001 Q23832; U42213; MICRONEMAL TRAP- 660 27/61 33/61, 2e-05 AAC48313.1 C1 PROTEIN (44%) (54%) HOMOLOG (FRAGMENT). cryptosporidium wrairi. Jun. 2001 TSP1 HUMAN: PO7996; THROMBOSPONDIN 1 1170 24/54 31/54, 3e-05 M25631; AAA36741; PRECURSOR. homo (44%) (57%) CAA28370; CAA32889; Sapiens. Oct. 1996 AAA61178; AAB59366

The disclosed NOV15b was found to have homology to the amino acid sequences shown in the BLASTP data listed in Table 15G.

TABLE 1.5G

BLAST results for NOV15b Gene Indexf Length. Identity Positives Identifier Protein/Organism (aa) (%) (%) Expect Q9H599; AL133463: BA149I18.1 (NOVEL 391 390/391, 390/.391, O.O CAC16127.2 PROTEIN) (100%) (100%) (FRAGMENT). homo Sapiens. Jun. 2001 095432. AF111168; HYPOTHETICAL 72.5 658 183/392 242,392, 2e-95 AADO9622.1 KDA PROTEIN. homo (47%) (62%) Sapiens. Jun. 2001 Q9BQL4, ALO50320; DJ1077I2.1 (NOVEL 60 49/49 49/49, 2e-22 CAC36O74.1 PROTEIN) (100%) (100%) (FRAGMENT). homo Sapiens. Jun. 2001 TSP1 HUMAN: PO7996; THROMBOSPONDIN 1 1170 24/54 31/54, 2e-05 M25631; AAA36741; PRECURSOR. homo (44%) (57%) CAA28370; CAA32889; Sapiens. Oct. 1996 AAA61178; AAB59366 TSP1 MOUSE: P35441; THROMBOSPONDIN 1 1170 23/54 31/54, 4e-05 AAA5O611; AAA40431 PRECURSOR. mus (43%) (57%) musculus. Oct. 1996

The homology of these Sequences is shown graphically in the ClustalW analysis shown in Table 15H.

TABLE 1.5H

Information for the ClustalW proteins

1) NOV15a (SEQ ID NO: 90) 2) NOV15b (SEQ ID NO:92) 3) Q9H599 (SEQ ID NO: 93) 4) O95432 (SEQ ID NO: 94) 5) Q9BQL 4 (SEQ ID NO: 95) 6) Q23832 (SEQ ID NO: 96) 7) TSP1 HUMAN N-ter fragment (SEQ ID NO: 97) 8) TSP1 MOUSE N-ter fragment (SEQ ID NO: 98)

US 6,989,232 B2 189 190

TABLE 1.5 H-continued Information for the ClustalW proteins

NOW15 a. 391 NOW15 411 Q9H599 391 O954.32 O O O :- 658 Q9BQL4 60 Q23832 ------QPsAALDQDSEYS3EIGPESQNWAS-- 660 TSP1 HUMAN TDNNGEGD IDGEGIENERDNCYWYNVDQRDTDMDGVGD . . . 837 TSP1 MOUSE TDNNGEGDACAVDIDGEGIENERDNCSYVYNVDQRDTDMDGVGD ... 837

Table 15I lists the domain description from DOMAIN instances of this repeat. It has been involved in cell-cell analysis results against NOV15a, and in the analogous 15 interraction, inhibition of angiogenesis, and apoptosis. regions for NOV15b. This indicates that the NOV15a The intron-exon organisation of the properdin gene con Sequence has properties similar to those of other proteins firms the hypothesis that the repeat might have evolved by known to contain this domain. a proceSS involving exon shuffling. A Study of properdin

TABLE 1.5 Domain Analysis of NOV15a PFAM HMM Domain Analysis of NOV15 Model Description Score E-value tsp 1. (InterPro). Thrombospondin type 1 domain 32.5 9.8e-06

Parsed for domains:

Model Domain seq-f seq-t himm-f himm-t SCOe E-value tsp 1. 1/1 178 218 1 54 32.5 9.8e-06

Smallest Suml High Probability ProDom Sequences producing High-scoring Segment Pairs : Score P(N) pram: 1719 p36 (14) FSPO (5) TSP1 (3) TSP2 (2) - PRECURSOR . . . 110 3. Oe-O 6 pram: 873 p36 (25) TSP1 (9) TSP2 (4) PROP (3) - COMPLEMEN . . . 91 OOOO33 pram: 36045 p.36 (1) SSP2 PLAYO - SPOROZOITE SURFACE PROTE . . . 85 O. OO14 pram: 1268 p36 (18) CSP (18) - CIRCUMSPOROZOITE PROTEIN . . . 74 O. O.22 pram:53698 p36 (1) FSPO XENLA - F-SPONDIN PRECURSOR. GLY . . . 62 O35 BLOCKS Protein Domain Analysis AC# Description Strength Score BLOO612B O Osteonectin domain proteins. 1891 1066 BLOO652C O TNFR/NGFR fanily cysteine-rich region protein 1217 1062 BLOO979I O C-protein coupled receptors family 3 proteins 1459 1059 BLOO641E O Respiratory-chain NADH dehydrogenase 75 Kd su 17 OO 1039 BLOO512A. O Alpha-galactosidase proteins. 1403 1035 BLOOO96G O Serine hydroxymethyltransferase pyridoxal-pho 1543 1030

The thrombospondin repeat was first described in 1986 by go structure provides. Some information about the structure of Lawler & Hynes. It was found in the thrombospondin the SEEanalysis ShowsE. I Athat Novis hasS 24Z4 ofO 55 (43%)O Stric in st N N. (A identical to, and 27 of 55 (49%) positive with, the 57 aa p36 pC6, C7, C8A, C8B, C9) as pWell as pextracellular y perain.matrix signal(14) FSPO(5) repeat cell TSP1(3) adhesion TSP2(2)-precursor EGF-like domain thrombospon glycoprotein protein like mindin, F-Spondin, SCO-spondin and even the 65 din calcium binding (prdm: 1719, Expect=30e-06); 15 of circumsporozoite surface protein 2 and TRAP proteins of 35 (42%) identical to, and 18 of 35 (51%) positive with, the Plasmodium have been shown to contain one or more 54 aa p36 (25) TSP1(9) TSP2(4) PROP(3)-complement US 6,989,232 B2 191 192 precursor repeat Signal glycoprotein EGF-like domain path 90) (NOV15b has 155/290 aa (53%) identical, 205/290 aa way thrombospondin cell (prdm:873, Expect=0.00033); 20 (70%) positive). NOV15a has 24 of 54 aa residues (44%) of 68 (29%) identical to, and 28 of 68 (41%) positive with, identical to, and 31 of 54 aa residues (57%) positive with, the 108 aa p36 (1) SSP2 PLAYO-sporozoite surface the 57 aa Human METH1 thombospondin-like domain #3 protein 2 precursor, malaria, Sporozoite, repeat, signal; (patp:AAY49505, Expect=3.2e-06) (NOV15b has 24/54 aa antigen; transmembrane (prdm:36045, Expect=0.0014); 23 (44%) identical, 31/54 aa (57%) positive). NOV15a has 24 of 59 (38%) identical to, and 28 of 59 (47%) positive with, of 54 aa residues (44%) identical to, and 31 of 54 aa residues the 87 aa p36 (18) CSP(18)-circumsporozoite protein (57%) positive with, the 57 aa Homo sapiens TSP1 domain precursor CS malaria Sporozoite repeat Signal (prodm:1268, (patp:AAB50007, Expect=3.2e-06) (NOV15b has 24/54 aa Expect=0.022); and 10 of 21 (47%) identical to, and 13 of (44%) identical, 31/54 aa (57%) positive). The Patp BLAST 21 (61%) positive with, the 59 aa p36 (1) FSPO XENLA results for NOV15a and NOV15b are listed in Table 15J.

TABLE 15 J Patp alignments of NOV15

Smallest Sum Prob. High P(N) P(N) Sequences producing High-scoring Segment Pairs. Score NOW15 a NOW15b patp:AAB41922 Human ORFX ORF1686 polypeptide seque . . . 10 48 7.8e-106 T-8e-106 patp: AAB49765 Human proliferation differentiation . . . 616 1.2e-90 5.2e-95 patp: AAB88393 Human membrane or secretory protein . . . 616 1.2e-90 5.2e-95 patp: AAY 49505 Human METH1 thombospondin-like doma . . . 118 3.2e-O 6 2.1 e-O 6 patp: AAB50007 TSP1 domain #3-Homo sapiens, 57 aa . . . 118 3.2e-O 6 2.1 e-O 6

F-Spondin precursor, glycoprotein; Signal; repeat, cell adhe- 3O The homologies shown above are shared by NOV15b sion (prdm:53698, Expect=0.43). insofar as NOV15b is homologous to NOV15a as shown in Table 15E. PROSITE analysis of NOV15a shows that the NOV15a The novel intracellular thrombospondin domain contain polypeptide has two N-glycosylation sites (Pattern-ID: ing protein-like NOV15 gene disclosed in this invention is ASN glycosylation PS00001 (Interpro)); four Protein expressed in at least the following tissues: lung, testis, b-cell. kinase C phosphorylation sites (Pattern-ID: PKC 35 Expression information was derived from the tissue Sources PHOSPHO SITE PS00005 (Interpro)); eight Casein kinase of the Sequences that were included in the derivation of the II phosphorylation sites (Pattern-ID: CK2 PHOSPHO Sequence, as described in Example 1. SITE PS00006 (Interpro)), one Tyrosine kinase phosphory The above defined information for this invention Suggests lation site (Pattern-ID: TYR PHOSPHO SITE PS00007 that these novel intracellular thrombospondin domain con (Interpro)); and fourN-myristoylation sites (Pattern-ID: 40 taining protein-like NOV15 proteins may function as a member of a “novel intracellular thrombospondin domain MYRISTYL PS00008 (Interpro)). PROSITE analysis of containing protein-like family”. Therefore, the NOV15 NOV15b shows that the NOV15b polypeptide has one nucleic acids and proteins identified here may be useful in N-glycosylation site (Pattern-ID: ASN glycosylation potential therapeutic applications implicated in (but not PS00001 (Interpro)); three Protein kinase C phosphorylation 45 limited to) various pathologies and disorders as indicated sites (Pattern-ID: PKC PHOSPHO SITE PS00005 below. (Interpro)); Seven Casein kinase II phosphorylation sites The protein similarity information, eXpression pattern, (Pattern-ID: CK2 PHOSPHO SITE PS00006 (Interpro)); cellular localization, and map location for the protein and one Tyrosine kinase phosphorylation site (Pattern-ID: nucleic acid disclosed herein Suggest that this novel intra cellular thrombospondin domain containing protein-like TYR PHOSPHO SITE PS00007 (Interpro)); and four 50 N-myristoylation sites (Pattern-ID: MYRISTYL PS00008 NOV15 protein may have important structural and/or physi ological functions characteristic of the novel intracellular (Interpro)). thrombospondin domain containing protein family. In a BlastP analysis of a public database, NOV15a was Therefore, the NOV15 nucleic acids and proteins are useful found to have 185 of 188 aa residues aa residues (98%) in potential diagnostic and therapeutic applications and as a identical to, and 188 of 188 aa residues (100%) positive 55 research tool. These include Serving as a specific or Selective with, the 198 aa Human ORFX ORF1686 polypeptide nucleic acid or protein diagnostic and/or prognostic marker, sequence SEQ ID NO:3372 (patp:AAB41922, Expect= wherein the presence or amount of the nucleic acid or the 7.8e-106) (NOV15b has 185/188 aa (98%) identical, 188/ protein are to be assessed. These also include potential 188 aa (100%) positive). NOV15a has 102 of 172 aa therapeutic applications Such as the following: (i) a protein residues (59%) identical to, and 138 of 172 aa residues 60 therapeutic, (ii) a Small molecule drug target, (iii) an anti (80%) positive with, the 571 aa Human proliferation differ body target (therapeutic, diagnostic, drug targeting/ entiation factor amino acid sequence (patp:AAB49765, cytotoxic antibody), (iv) a nucleic acid useful in gene Expect=1.2e-90) (NOV15b has 155/290 aa (53%) identical, therapy (gene delivery/gene ablation), (v) an agent promot 205/290aa (70%) positive). NOV15a has 102 of 172 aa ing tissue regeneration in vitro and in Vivo, and (vi) a residues (59%) identical to and 138 of 172 aa residues (80%) 65 biological defense weapon. positive with, the 571 aa Human membrane or secretory The NOV15 nucleic acids and proteins have applications protein clone PSECO137 (patp:AAB88393, Expect=1.2e in the diagnosis and/or treatment of various diseases and US 6,989,232 B2 193 194 disorders. For example, the compositions of the present about amino acids 250 to 539. In another embodiment, a invention will have efficacy for the treatment of patients NOV15b epitope is from about amino acids 1 to 60. In Suffering from: Systemic lupus erythematosus, autoimmune further embodiments, NOV15b epitopes are from about disease, asthma, emphysema, Scleroderma, allergy, ARDS, amino acids 65 to 225, from about amino acids 230 to 320 fertility, hypogonadism; immunological disease and disor and from about amino acids 325 to 411. This novel protein derS as well as other diseases, disorders and conditions. also has value in development of powerful assay System for Based on the tissues in which NOV15 is most highly functional analysis of various human disorders, which will expressed; including Thryoid, heart, uterus, mammary help in understanding of pathology of the disease and gland, pituitary gland, lymph node, placenta, brain, development of new drug targets for various disorders. pancreas, and Spleen; Specific uses include developing prod NOV16 ucts for the diagnosis or treatment of a variety of diseases and disorders. Additional disease indications and tissue NOV16 includes two novel FYVE finger-containing expression for NOV15 is presented in Example 2. phosphoinositide kinase-like proteins disclosed below. The NOV15 nucleic acids and polypeptides are further useful disclosed proteins have been named NOV16a and NOV16b. in the generation of antibodies that bind immuno 15 NOV16a specifically to the novel NOV15 substances for use in A disclosed NOV16a nucleic acid of 2760 nucleotides therapeutic or diagnostic methods. These antibodies may be (also referred to as 101330077 and 100391903) encoding a generated according to methods known in the art, using novel FYVE-finger kinase/Transposase-like protein is prediction from hydrophobicity charts, as described in the shown in Table 16A. An open reading frame was identified “Anti-NOVX Antibodies' section below. For example the beginning with an ATG initiation codon at nucleotides disclosed NOV15 proteins have multiple hydrophilic 898-900 to and ending with a TGA codon at nucleotides regions, each of which can be used as an immunogen. In one 1516–1518. A putative untranslated region upstream from embodiment, a contemplated NOV15a epitope is from about the initiation codon and downstream from the termination amino acids 1 to 70. In additional embodiments, NOV15a codon is underlined in Table 16A, and the Start and Stop epitopes are from about amino acids 175 to 230 and from codons are in bold letters.

TABLE 1.6A NOV16a Nucleotide Sequence (SEQ ID NO: 99) CCGGGGGCGCAGCCGCGGGCCCACCTCGGCCTCCCCTGAGCGGACGCCTCCCCGCGCGCACCGGGGGCCCCGGAGACCG

CCTTCCCCGCTCCGAACGCACGCGGCCCGGCCCCGGCGAGGTGCCTGAACGCTACCCGAGCTGCGGCGGGGCTCCCGGG

GTGAGTGCTGCAGCCCCAGGCCCGCCTGCTCCCACAGGCTCGGGCAATGGAGACCCGCGGCCGCCCCCGCCCCTTGACC

CTGCCT CACCCCTCACGCCCGCTGCCGCCCACGACCTCCGACCCCGCTGCCGCCCGGCTCGCAGCCCGGCTCGCAGCCC

GGCTCGGCGGGCCTCACCCCCGCGGGTTCCGCACTCCTCTCCCGCCGTCCTGCTCCTCTCGGCCTTCTCCTCCAATA

GGCGCCTAGCACCCTGAGTGGGCTACACCAATCAGAGACGAAGCGGCGCTAACGTGACTGACTAACTAACCAATCCAAA.

GTCTCAATCTCCCTGAGAGGGGCGGAGCGTACCCGGGCCAGCCCTCGCCGCCGATTGGTGATCGACCTCAGGGTTGCAG

GGGCGGTGCCCTTACACGGATTGGAGAGGGCAGCGATGGGGCGGAGTTCAAGCTCCGATTAGTCCGCGCTCCGTGGCGG

GCTTGGCGATTGGACGCCGGCGCTGTCAGCCGCGCGCGGACCGGGGCGGGGCGGGCGGTGCCCCGGGCTGGGCGAGGGG

CCGGGTGCGGGGCCGCTGGCCGAGAGGCTGAGGCGGCGTCATGTCCTCCGAGGTGTCCGCGCGCCGCGACGCCAAGAAG

CTGGTGCGCTCCCCGAGCGGCCTGCGCATGGTGCCCGAACACCGCGCCTTCGGAAGCCCGTTCGGCCTGGAGGAGCCGC

AGTGGGTCCCGGACAAGGAGGTGGGTGTAGCAGTGTGACGCCAAGTTTGACTTTCTCACCAGAAAGCACCACTGTCGC

CGCTGCGGGAAGTGCTTCTGCGACAGGTGCTGCAGCCAGAAGGTGCCGCTGCGGCGCATGTGCTTTGTGGACCCCGTGC

GGCAGTGCGCGGAGTGCGCCCTGGTGTCCCTCAAGGACGGCGAGTTCTACGACAAGCAGCTCAAAGTGCTCCTGAGCGG

AGCCACCTTCCTCGTCACGTTTGGAAACTCAGAGAAACCTGAAACTATGACTTGTCGTCTTTCCAATAACCAGAGATAC

TTGTTTCTGGATGGAGACAGCCACTATGAAATCGAAATTGTACACATTTCCACCGTGCAGATCCTCACAGAAGGCTTCC

CTCCTGGAGAAAAAGACATTCACGCTTACACCAGCCTCCGGGGGAGCCAGCCTGCCTCTGAAGGAGGCAACGCACGGGC

CACAGGCATGTTCCTGCAGTATACAGTGCCGCGGACGGAGGGTGTGACCCAGCTGAAGCTGACAGTGCTGGAGGACGTG

ACTGTGGGCAGGAGGCAGGCGGTGGCGTGGCTAGTGATCTGCAGGCTGCCAAGCTCCTCTATGAATCTCGGGACCAGTA

ACTCTACGTGGGGCGAGCTTGGAGTACGTGTGGTCACCAGGACTGAGTCGCTTGGAACAGCAGAGCCTGCTCCTTGCG

TACCACAGGGATTAATCCTGCTTGTGCTGGGAAATGCAACT CACTCATGTATTTGGAGAAACAGGAGTGTTCACTTATC

TAGGCAATATGTTCACAGTTTATTAATGCTTTAAACAGCTTCATGTTTTAGAATTTGTGTATTGTCAATACTTAATG