US 20030170630A1
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2003/0170630 A1
Alsobrook, II et al.
(43) Pub. Date: Sep. 11, 2003

(54) PROTEINS AND NUCLEIC ACIDS ENCODING SAME

(76) Inventors: John P. Alsobrook II, Madison, CT (US); Velizar T. Tchernev, Branford, CT (US); Xiaohong Liu, Canton, MA (US); Kimberly A. Spytek, New Haven, CT (US); Bryan D. Zerhusen, Branford, CT (US); Meera Patturajan, Branford, CT (US); Denise M. Lepley, Branford, CT (US); Catherine E. Burgess, Wethersfield, CT (US); Richard A. Shimkets, Guilford, CT (US); William M. Grosse, Branford, CT (US); Edward S. Szekeres JR., Branford, CT (US); Corine A.M. Vernet, Branford, CT (US); Li Li, Branford, CT (US); Stacie J. Casman, North Haven, CT (US); Ference L. Boldog, North Haven, CT (US); Linda Gorman, Branford, CT (US); Esha A. Gangoli, Madison, CT (US); Elma R. Fernandes, Branford, CT (US); Daniel K. Rieger, Branford, CT (US); Shlomit R. Edinger, New Haven, CT (US); Erik Gunther, Branford, CT (US); Isabelle Millet, Milford, CT (US); Paul Sciore, North Haven, CT (US); Karen Ellerman, Branford, CT (US); John R. MacDougall, Hamden, CT (US); Glennda Smithson, Guilford, CT (US)

Correspondence Address:
Ivor R. Elrifi
Mintz, Levin, Cohn, Ferris, Glovsky and Popeo, P.C.
One Financial Center
Boston, MA 02111 (US)

(21) Appl. No.: 10/032,189

(22) Filed: Dec. 21, 2001

Related U.S. Application Data

(60) Provisional application No. 60/257,495, filed on Dec. 21, 2000. Provisional application No. 60/258,171, filed on Dec. 22, 2000. Provisional application No. 60/269,940, filed on Feb. 20, 2001. Provisional application No. 60/274,192, filed on Mar. 8, 2001. Provisional application No. 60/277,826, filed on Mar. 22, 2001. Provisional application No. 60/279,840, filed on Mar. 29, 2001. Provisional application No. 60/282,981, filed on Apr. 11, 2001. Provisional application No. 60/283,656, filed on Apr. 13, 2001. Provisional application No. 60/309,247, filed on Jul. 31, 2001. Provisional application No. 60/311,754, filed on Aug. 10, 2001. Provisional application No. 60/313,331, filed on Aug. 17, 2001.

Publication Classification

(51) Int. Cl.7 ...... C12Q 1/68; C07H 21/04; C12N 9/00; C12P 21/02; C12N 5/06
(52) U.S. Cl. ...... 435/6; 435/69.1; 435/183; 435/320.1; 435/325; 536/23.2

(57) ABSTRACT

Disclosed herein are nucleic acid sequences that encode novel polypeptides. Also disclosed are polypeptides encoded by these nucleic acid sequences, and antibodies, which immunospecifically-bind to the polypeptide, as well as derivatives, variants, mutants, or fragments of the aforementioned polypeptide, polynucleotide, or antibody. The invention further discloses therapeutic, diagnostic and research methods for diagnosis, treatment, and prevention of disorders involving any one of these novel human nucleic acids and proteins.

PROTEINS AND NUCLEIC ACIDS ENCODING SAME

RELATED APPLICATIONS

[0001] This application claims priority from U.S. Ser. Nos. 60/257,495, filed Dec. 21, 2000; 60/258,171, filed Dec. 22, 2000; 60/269,940, filed Feb. 20, 2001; 60/274,192, filed Mar. 8, 2001; 60/277,826, filed Mar. 22, 2001; 60/279,840, filed Mar. 29, 2001; 60/282,981, filed Apr. 11, 2001; 60/283,656, filed Apr. 13, 2001; 60/309,247, filed Jul. 31, 2001; 60/311,754, filed Aug. 10, 2001; and 60/313,331, filed Aug. 17, 2001; each of which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

[0002] The invention generally relates to nucleic acids and polypeptides encoded thereby.

BACKGROUND OF THE INVENTION

[0003] The invention generally relates to nucleic acids and polypeptides encoded therefrom. More specifically, the invention relates to nucleic acids encoding cytoplasmic, nuclear, membrane bound, and secreted polypeptides, as well as vectors, host cells, antibodies, and recombinant methods for producing these nucleic acids and polypeptides.

SUMMARY OF THE INVENTION

[0004] The invention is based in part upon the discovery of nucleic acid sequences encoding novel polypeptides. The novel nucleic acids and polypeptides are referred to herein as NOVX, or NOV1, NOV2, NOV3, NOV4, NOV5, NOV6, NOV7, NOV8, NOV9, NOV10, NOV11, NOV12, and NOV13 nucleic acids and polypeptides. These nucleic acids and polypeptides, as well as derivatives, homologs, analogs and fragments thereof, will hereinafter be collectively designated as “NOVX” nucleic acid or polypeptide sequences.

[0005] In one aspect, the invention provides an isolated NOVX nucleic acid molecule encoding a NOVX polypeptide that includes a nucleic acid sequence that has identity to the nucleic acids disclosed in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, and 57. In some embodiments, the NOVX nucleic acid molecule will hybridize under stringent conditions to a nucleic acid sequence complementary to a nucleic acid molecule that includes a protein-coding sequence of a NOVX nucleic acid sequence. The invention also includes an isolated nucleic acid that encodes a NOVX polypeptide, or a fragment, homolog, analog or derivative thereof. For example, the nucleic acid can encode a polypeptide at least 80% identical to a polypeptide comprising the amino acid sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, and 58. The nucleic acid can be, for example, a genomic DNA fragment or a cDNA molecule that includes the nucleic acid sequence of any of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, and 57.

[0006] Also included in the invention is an oligonucleotide, e.g., an oligonucleotide which includes at least 6 contiguous nucleotides of a NOVX nucleic acid (e.g., SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, and 57) or a complement of said oligonucleotide.

[0007] Also included in the invention are substantially purified NOVX polypeptides (SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, and 58). In certain embodiments, the NOVX polypeptides include an amino acid sequence that is substantially identical to the amino acid sequence of a human NOVX polypeptide.

[0008] The invention also features antibodies that immunoselectively bind to NOVX polypeptides, or fragments, homologs, analogs or derivatives thereof.

[0009] In another aspect, the invention includes pharmaceutical compositions that include therapeutically- or prophylactically-effective amounts of a therapeutic and a pharmaceutically-acceptable carrier. The therapeutic can be, e.g., a NOVX nucleic acid, a NOVX polypeptide, or an antibody specific for a NOVX polypeptide. In a further aspect, the invention includes, in one or more containers, a therapeutically- or prophylactically-effective amount of this pharmaceutical composition.

[0010] In a further aspect, the invention includes a method of producing a polypeptide by culturing a cell that includes a NOVX nucleic acid, under conditions allowing for expression of the NOVX polypeptide encoded by the DNA. If desired, the NOVX polypeptide can then be recovered.

[0011] In another aspect, the invention includes a method of detecting the presence of a NOVX polypeptide in a sample. In the method, a sample is contacted with a compound that selectively binds to the polypeptide under conditions allowing for formation of a complex between the polypeptide and the compound. The complex is detected, if present, thereby identifying the NOVX polypeptide within the sample.

[0012] The invention also includes methods to identify specific cell or tissue types based on their expression of a NOVX.

[0013] Also included in the invention is a method of detecting the presence of a NOVX nucleic acid molecule in a sample by contacting the sample with a NOVX nucleic acid probe or primer, and detecting whether the nucleic acid probe or primer bound to a NOVX nucleic acid molecule in the sample.

[0014] In a further aspect, the invention provides a method for modulating the activity of a NOVX polypeptide by contacting a cell sample that includes the NOVX polypeptide with a compound that binds to the NOVX polypeptide in an amount sufficient to modulate the activity of said polypeptide. The compound can be, e.g., a small molecule, such as a nucleic acid, peptide, polypeptide, peptidomimetic, carbohydrate, lipid or other organic (carbon containing) or inorganic molecule, as further described herein.

[0015] Also within the scope of the invention is the use of a therapeutic in the manufacture of a medicament for treating or preventing disorders or syndromes including, e.g., asthma, allergies, emphysema, bronchitis, autoimmune disease, immunodeficiencies, transplantation, graft versus host disease, arthritis, tendonitis, scleroderma, systemic lupus erythematosus, ARDS, lymphedema, allergic encephalomyelitis, experimental allergic encephalomyelitis (EAE), various forms of arthritis, bacterial infections, cystic fibrosis, lung cancer, adrenoleukodystrophy, congenital adrenal hyperplasia, leukodystrophies, cancer such as AML, coronary artery disease, stroke, hypertension, myocardial infarction, atherosclerosis, hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, aneurysm, hypertension, myocardial infarction, embolism, cardiovascular disorders, bypass surgery, hypertriglyceridemia, hypoalphalipoproteinemia, hyperlipidemia, noninsulin-dependent diabetes mellitus, obesity, diabetes, Diabetes insipidus, nephrogenic, autosomal dominant, Diabetes insipidus, nephrogenic, autosomal recessive, Tangier disease, LCAT deficiency, fish-eye disease, Von Hippel-Lindau (VHL) syndrome, tuberous sclerosis, hypercalcemia, Lesch-Nyhan syndrome, cirrhosis, inflammatory bowel disease, diverticular disease, Hirschsprung's disease, Crohn's Disease, appendicitis, ulcers, laryngitis, muscular dystrophy, myasthenia gravis, endometriosis, pancreatitis, hyperparathyroidism, hypoparathyroidism, xerostomia, psoriasis, actinic keratosis, acne, hair growth/loss, alopecia, pigmentation disorders, endocrine disorders, tonsillitis, cystitis, incontinence, uveitis, corneal fibroblast proliferation, amyotrophic lateral sclerosis, acute pancreatitis, cerebral cryptococcosis, colitis, thyroiditis, nonsyndromic deafness, keratinization disorders, gap-junction-related neuropathies and other pathological conditions of the nervous system, where dysfunctions of junctional communication are considered to play a causal role, demyelinating neuropathies (including Charcot-Marie-Tooth disease), erythrokeratodermia variabilis (EKV), atrioventricular (AV) conduction defects such as arrhythmia, lens cataract, osteoporosis, osteoarthritis, Achalasia-addisonianism-alacrimia syndrome, Cataract, polymorphic and lamellar; Cyclic ichthyosis with epidermolytic hyperkeratosis, Enuresis, nocturnal, 2, Epidermolysis bullosa simplex, Koebner, Dowling-Meara, and Weber-Cockayne types; Epidermolytic hyperkeratosis, Fundus albipunctatus, Glioma; Ichthyosis bullosa of Siemens, Keratoderma, palmoplantar, nonepidermolytic, Meesmann corneal dystrophy; Monilethrix, Myopathy, congenital; Pachyonychia congenita, Jackson-Lawler type, Pachyonychia congenita, Jadassohn-Lewandowsky type, Palmoplantar keratoderma, Bothnia type; Persistent Mullerian duct syndrome, type II; Spastic paraplegia-10; White sponge nevus, Liver disease, susceptibility to, from hepatotoxins or viruses, Alzheimer's disease, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, multiple sclerosis, ataxia-telangiectasia, behavioral disorders, addiction, anxiety, pain, neuroprotection, fertility, growth and reproductive disorders, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, renal tubular acidosis, IgA nephropathy, and/or other pathologies and disorders of the like.

[0016] The therapeutic can be, e.g., a NOVX nucleic acid, a NOVX polypeptide, or a NOVX-specific antibody, or biologically-active derivatives or fragments thereof.

[0017] For example, the compositions of the present invention will have efficacy for treatment of patients suffering from the diseases and disorders disclosed above and/or other pathologies and disorders of the like. The polypeptides can be used as immunogens to produce antibodies specific for the invention, and as vaccines. They can also be used to screen for potential agonist and antagonist compounds. For example, a cDNA encoding NOVX may be useful in gene therapy, and NOVX may be useful when administered to a subject in need thereof. By way of non-limiting example, the compositions of the present invention will have efficacy for treatment of patients suffering from the diseases and disorders disclosed above and/or other pathologies and disorders of the like.

[0018] The invention further includes a method for screening for a modulator of disorders or syndromes including, e.g., the diseases and disorders disclosed above and/or other pathologies and disorders of the like. The method includes contacting a test compound with a NOVX polypeptide and determining if the test compound binds to said NOVX polypeptide. Binding of the test compound to the NOVX polypeptide indicates the test compound is a modulator of activity, or of latency or predisposition to the aforementioned disorders or syndromes.

[0019] Also within the scope of the invention is a method for screening for a modulator of activity, or of latency or predisposition to disorders or syndromes including, e.g., the diseases and disorders disclosed above and/or other pathologies and disorders of the like by administering a test compound to a test animal at increased risk for the aforementioned disorders or syndromes. The test animal expresses a recombinant polypeptide encoded by a NOVX nucleic acid. Expression or activity of NOVX polypeptide is then measured in the test animal, as is expression or activity of the protein in a control animal which recombinantly-expresses NOVX polypeptide and is not at increased risk for the disorder or syndrome. Next, the expression of NOVX polypeptide in both the test animal and the control animal is compared. A change in the activity of NOVX polypeptide in the test animal relative to the control animal indicates the test compound is a modulator of latency of the disorder or syndrome.

[0020] In yet another aspect, the invention includes a method for determining the presence of or predisposition to a disease associated with altered levels of a NOVX polypeptide, a NOVX nucleic acid, or both, in a subject (e.g., a human subject). The method includes measuring the amount of the NOVX polypeptide in a test sample from the subject and comparing the amount of the polypeptide in the test sample to the amount of the NOVX polypeptide present in a control sample. An alteration in the level of the NOVX polypeptide in the test sample as compared to the control sample indicates the presence of or predisposition to a disease in the subject. Preferably, the predisposition includes, e.g., the diseases and disorders disclosed above and/or other pathologies and disorders of the like. Also, the expression levels of the new polypeptides of the invention can be used in a method to screen for various cancers as well as to determine the stage of cancers.

[0021] In a further aspect, the invention includes a method of treating or preventing a pathological condition associated with a disorder in a mammal by administering a NOVX polypeptide, a NOVX nucleic acid, or a NOVX-specific antibody to a subject (e.g., a human subject), in an amount sufficient to alleviate or prevent the pathological condition. In preferred embodiments, the disorder includes, e.g., the diseases and disorders disclosed above and/or other pathologies and disorders of the like.

[0022] In yet another aspect, the invention can be used in a method to identify the cellular receptors and downstream effectors of the invention by any one of a number of techniques commonly employed in the art. These include but are not limited to the two-hybrid system, affinity purification, co-precipitation with antibodies or other specific-interacting molecules.

[0023] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

[0024] Other features and advantages of the invention will be apparent from the following detailed description and claims.

DETAILED DESCRIPTION OF THE INVENTION

[0025] The present invention provides novel nucleotides and polypeptides encoded thereby. Included in the invention are the novel nucleic acid sequences and their encoded polypeptides. The sequences are collectively referred to herein as “NOVX nucleic acids” or “NOVX polynucleotides” and the corresponding encoded polypeptides are referred to as “NOVX polypeptides” or “NOVX proteins.” Unless indicated otherwise, “NOVX” is meant to refer to any of the novel sequences disclosed herein. Table A provides a summary of the NOVX nucleic acids and their encoded polypeptides.

TABLE A

Sequences and Corresponding SEQ ID Numbers

NOVX        Internal        SEQ ID NO       SEQ ID NO
Assignment  Identification  (nucleic acid)  (polypeptide)  Homology

            CG55750-01       1               2             Airway Trypsin-Like Protease-like
            168446573        3               4             Airway Trypsin-Like Protease-like
            168446539        5               6             Airway Trypsin-Like Protease-like
            168446547        7               8             Airway Trypsin-Like Protease-like
            CG55782-01       9              10             P450-like
            CG55771-01      11              12             Apolipoprotein A-I precursor-like
            CG55771-02      13              14             Apolipoprotein A-I precursor-like
            CG55700-01      15              16             HSP90 co-chaperone-like
            CG55700-02      17              18             HSP90 Co-Chaperone (Progesterone Receptor Complex P23)-like
            CG55700-03      19              20             HSP90 co-chaperone-like
            CG55706-01      21              22             Type III adenylyl cyclase like
            CG50389-02      23              24             Interleukin 1 receptor related protein-like
            CG50389-03      25              26             Interleukin 1 receptor related protein-like
            CG50389-04      27              28             Interleukin 1 receptor related protein-like
            CG50389-01      29              30             Interleukin 1 receptor related protein-like
            CG50387-02      31              32             Connexin GJA3-like
            CG50271-01      33              34             Olfactory Receptor-like
            CG55844-01      35              36             P450-like
11a         CG55752-01      37              38             Alpha Glucosidase 2, Alpha Neutral Subunit-like
11b         CG55752-02      39              40             Alpha Glucosidase 2-like
            CG55752-03      41              42             Glucosidase II-like
            CG55752-04      43              44             Glucosidase II-like
            CG55776-01      45              46             Mechanical stress induced protein-like
            174124289       47              48             Mechanical stress induced protein-like
            174124313       49              50             Mechanical stress induced protein-like
12d         174124322       51              52             Mechanical stress induced protein-like
            174124322       53              54             Mechanical stress induced protein-like
12f         CG55776-03      55              56             Mechanical stress induced protein-like
13          CG55908-01      57              58             Integrin-like FG-GAP domain containing novel protein like

[0026] NOVX nucleic acids and their encoded polypeptides are useful in a variety of applications and contexts. The various NOVX nucleic acids and polypeptides according to the invention are useful as novel members of the protein families according to the presence of domains and sequence relatedness to previously described proteins. Additionally, NOVX nucleic acids and polypeptides can also be used to identify proteins that are members of the family to which the NOVX polypeptides belong.

[0027] NOV1 is homologous to an Airway Trypsin-Like Protease-like family of proteins. Thus, the NOV1 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example, asthma and cystic fibrosis, allergies, emphysema, bronchitis, lung cancer, or other pathologies or conditions.

[0028] NOV2 is homologous to the P450-like family of proteins. Thus NOV2 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in various pathologies and disorders.

[0029] NOV3 is homologous to a family of Apolipoprotein A-I precursor-like proteins. Thus, the NOV3 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example: coronary artery disease, stroke, hypertriglyceridemia, hypoalphalipoproteinemia, hyperlipidemia, Tangier disease, LCAT deficiency, fish-eye disease, noninsulin-dependent diabetes mellitus, hypertension, myocardial infarction, atherosclerosis, and/or other pathologies.

[0030] NOV4 is homologous to the HSP90 co-chaperone-like family of proteins. Thus, NOV4 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example: adrenoleukodystrophy, congenital adrenal hyperplasia, hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies, asthma, immunodeficiencies, transplantation, graft versus host disease, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, neuroprotection, arthritis, tendonitis, fertility, atherosclerosis, aneurysm, hypertension, fibromuscular dysplasia, stroke, scleroderma, obesity, myocardial infarction, embolism, cardiovascular disorders, bypass surgery, cirrhosis, inflammatory bowel disease, diverticular disease, Hirschsprung's disease, Crohn's Disease, appendicitis, ulcers, diabetes, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic lupus erythematosus, renal tubular acidosis, IgA nephropathy, laryngitis, emphysema, ARDS, lymphedema, muscular dystrophy, myasthenia gravis, endometriosis, pancreatitis, hyperparathyroidism, hypoparathyroidism, growth and reproductive disorders, xerostomia, psoriasis, actinic keratosis, acne, hair growth/loss, alopecia, pigmentation disorders, endocrine disorders, tonsillitis, cystitis, incontinence, and/or other pathologies.

[0031] NOV5 is homologous to the Type III adenylyl cyclase-like family of proteins. Thus NOV5 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in diabetes, heart failure, neurological diseases such as epilepsy, sleep disorder, parkinsonism, Huntington's disease, Alzheimer's disease, depression, schizophrenia, and other diseases, disorders and conditions.

[0032] NOV6 is homologous to the Interleukin 1 receptor related protein-like family of proteins. Thus NOV6 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example: uveitis and corneal fibroblast proliferation, allergic encephalomyelitis, amyotrophic lateral sclerosis, acute pancreatitis, cerebral cryptococcosis, autoimmune disease including Type 1 diabetes mellitus (DM), experimental allergic encephalomyelitis (EAE), systemic lupus erythematosus (SLE), colitis, thyroiditis and various forms of arthritis, cancer such as AML, bacterial infections, and/or other pathologies/disorders.

[0033] NOV7 is homologous to members of the Interleukin 1 receptor related protein-like family of proteins. Thus, the NOV7 nucleic acids, polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example, uveitis and corneal fibroblast proliferation, allergic encephalomyelitis, amyotrophic lateral sclerosis, acute pancreatitis, cerebral cryptococcosis, autoimmune disease including Type 1 diabetes mellitus (DM), experimental allergic encephalomyelitis (EAE), systemic lupus erythematosus (SLE), colitis, thyroiditis and various forms of arthritis, cancer such as AML, bacterial infections, and/or other pathologies/disorders.

[0034] NOV8 is homologous to the connexin GJA3-like family of proteins. Thus, NOV8 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example: nonsyndromic deafness, keratinization disorders, gap-junction-related neuropathies and other pathological conditions of the nervous system, where dysfunctions of junctional communication are considered to play a causal role, demyelinating neuropathies (including Charcot-Marie-Tooth disease), erythrokeratodermia variabilis (EKV), atrioventricular (AV) conduction defects such as arrhythmia, lens cataract, and/or other pathologies/disorders.

[0035] NOV9 is homologous to the Olfactory Receptor-like family of proteins. Thus, NOV9 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in various pathologies or disorders.

[0036] NOV10 is homologous to the P450-like family of proteins. Thus, NOV10 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in various pathologies or disorders.

[0037] NOV11 is homologous to the Integrin-like FG-GAP domain containing novel protein-like family of proteins. Thus, NOV11 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in various pathologies or disorders.

[0038] NOV12 is homologous to the Mechanical stress induced protein-like family of proteins. Thus, NOV12 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example, osteoporosis, osteoarthritis, cardiac hypertrophy, atherosclerosis, hypertension, restenosis, and/or other pathologies/disorders.

[0039] NOV13 is homologous to the Integrin-like FG-GAP domain containing novel protein-like family of proteins. Thus, NOV13 nucleic acids and polypeptides, antibodies and related compounds according to the invention will be useful in therapeutic and diagnostic applications implicated in, for example, Achalasia-addisonianism-alacrimia syndrome, Cataract, polymorphic and lamellar; Cyclic ichthyosis with epidermolytic hyperkeratosis, Diabetes insipidus, nephrogenic, autosomal dominant; Diabetes insipidus, nephrogenic, autosomal recessive, Enuresis, nocturnal, 2, Epidermolysis bullosa simplex, Koebner, Dowling-Meara, and Weber-Cockayne types; Epidermolytic hyperkeratosis, Fundus albipunctatus, Glioma, Ichthyosis bullosa of Siemens, Keratoderma, palmoplantar, nonepidermolytic; Meesmann corneal dystrophy; Monilethrix; Myopathy, congenital; Pachyonychia congenita, Jackson-Lawler type; Pachyonychia congenita, Jadassohn-Lewandowsky type, Palmoplantar keratoderma, Bothnia type, Persistent Mullerian duct syndrome, type II; Spastic paraplegia 10, White sponge nevus, Liver disease, susceptibility to, from hepatotoxins or viruses; Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, neuroprotection; lymphedema, allergies, and/or other pathologies/disorders.

[0040] The NOVX nucleic acids and polypeptides can also be used to screen for molecules, which inhibit or enhance NOVX activity or function. Specifically, the nucleic acids and polypeptides according to the invention may be used as targets for the identification of small molecules that modulate or inhibit, e.g., neurogenesis, cell differentiation, cell proliferation, hematopoiesis, wound healing and angiogenesis.

[0041] Additional utilities for the NOVX nucleic acids and polypeptides according to the invention are disclosed herein.

[0042] NOV1

[0043] NOV1 includes three novel Airway Trypsin-Like Protease-like proteins disclosed below. The disclosed sequences have been named NOV1a, NOV1b, and NOV1c.

[0044] NOV1a

[0045] A disclosed NOV1a nucleic acid of 1386 nucleotides (also referred to as CG55750-01) encoding an Airway Trypsin-Like Protease-like protein is shown in Table 1A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 64-66 and ending with a TGA codon at nucleotides 1324-1326. A putative untranslated region upstream from the initiation codon and downstream from the termination codon is underlined in Table 1A. The start and stop codons are in bold letters.
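The open reading frame coordinates given for NOV1a can be checked with simple arithmetic. The Python sketch below is illustrative only: the helper name `orf_summary` and its dictionary keys are assumptions introduced here, and the only inputs are the figures stated in the text (1386 nucleotides, ATG at nucleotides 64-66, TGA at nucleotides 1324-1326), treated as 1-based inclusive positions.

```python
def orf_summary(total_len, start_first, stop_last):
    """Summarize an ORF from 1-based, inclusive nucleotide coordinates.

    total_len   -- length of the full nucleic acid
    start_first -- first nucleotide of the initiation codon
    stop_last   -- last nucleotide of the termination codon
    """
    orf_nt = stop_last - start_first + 1
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_nt // 3
    return {
        "utr5_nt": start_first - 1,        # nucleotides upstream of the start codon
        "orf_nt": orf_nt,                  # start codon through stop codon
        "codons": codons,                  # includes the stop codon
        "residues": codons - 1,            # stop codon encodes no amino acid
        "utr3_nt": total_len - stop_last,  # nucleotides downstream of the stop codon
    }


# Figures stated for NOV1a (CG55750-01).
nov1a = orf_summary(total_len=1386, start_first=64, stop_last=1326)
print(nov1a)
```

Under these assumptions the frame spans 1263 nucleotides, or 421 codons including the stop codon, which is consistent with the 420 amino acid residues the specification reports for SEQ ID NO: 2.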

TABLE 1A

NOV1a nucleotide sequence.

(SEQ ID NO: 1) AAAAGGAACATTTAGTCTTAAAATCCTATTCATTTTTAACACACAATTCTTTCTCAAAAGGCCAGACACTG

GGTAGAAGAGTGAGTTCACTGAAACCATGGATGTTTGCCCTTATTGTCAGAGCTGTTGTGTTGATTCTGGTG

ATACTCATTGGTCTCCTTGTTTATTTTTTGGCATATAAGTTTTACTATTACCAAACCTCCTTCCAGATCCCC

AGTATTGAATATAATTTAGCTATTAATACTTGTGTGACACAAGAGGAGAGAATCTATGACAATAAAATGTGT

AAAATAATGTCTAGGATATTTCGACATTCTTCTGTAGGCGGTCGATTTATCAAATCTCATGTTATCAAATTA

AGGCCAAGTAATGACAATTTGAAAGCAGATGTATTGCTTAAATTTCAGTTTATTCCTAACAATGAGAACGCA

ATAAAAACACAAGCTGATAACATTTTGCATCAGAAGTTGAAATCAAATGAAAGCTCTTTGACCATAAACAAA

CCATCATTTAGACTCACACCTATTGACAGCAAAAAGATGAGGAATCTTCTCAACAGTCGCTGTGGAATAAGG

ATGACATCTTCAAACATGCCATTACCAGCATCCTCTTCTACTCAAAGAATTGTCCAAGGAAGGGAAACAGCT

ATGGAAGGGGAATGGCCATGGCAGGCCAGCCTCCAGCTCATAGGGTCAGGCCATCAGTGTGGAGCCAGCCTC

ATCAGTAACACATGGCTGCTCACAGCAGCTCACTGCTTTTGGAAAAATAAAGACCCAAGTCAATGGATTGCT

ACTTTTGGTGCAACTATAACACCACCCGCAGTGAAACGAAATGTGAGGAAAATTATTCTTCATGAGAATTAC

CATAGAGAAACAAATGAAAATGACATTGCTTTGGTTCAGCTCTCTACTGGAGTTGAGTTTTCAAATATAGTC

CAGAGAGTTTGCCTCCCAGACTCATCTATAAAGTTGCCACCTAAAACAAGTGTGTTCGTCACAGGATTTGGA

TCCATTGTAGATGATGGACCTATACAAAATACACTTCGGCAAGCCAGAGTGGAAACCATAAGCACTGATGTG

TGTAACAGAAAGGATGTGTATGATGGCCTGATAACTCCAGGAATGTTATGTGCTGGATCCATGGAAGGAAAA

ATAGATGCATGTAAGGGAGATTCTGGTGGACCTCTGGTTTATGATAATCATGACATCTGGTACATTGTAGGT

ATAGTAAGTTGGGGACAATCATGTGCACTTCCCAAAAAACCTGGAGTCTACACCAGAGTAACTAAGTATCGA

GATTGGATTGCCTCAAAGACTGGTATGTAGTGTGGATTGTCCAGAGTTATACACATGGCACACAGAGCTGA

TACTCCTGCGTATTTGTA

[0046] In a search of public sequence databases, the NOV1a nucleic acid sequence, located on chromosome 4, has 489 of 707 bases (69%) identical to a gb:GENBANK-ID:AF064819|acc:AF064819.1 mRNA from Homo sapiens (Homo sapiens serine protease DESC1 (DESC1) mRNA, complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0047] In all BLAST alignments herein, the "E-value" or "Expect" value is a numeric indication of the probability that the aligned sequences could have achieved their similarity to the BLAST query sequence by chance alone, within the database that was searched. For example, the probability that the subject ("Sbjct") retrieved from the NOV1 BLAST analysis, e.g., Airway Trypsin-Like Protease mRNA from Homo sapiens, matched the Query NOV1 sequence purely by chance is 1.3e'. The Expect value (E) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially with the Score (S) that is assigned to a match between two sequences. Essentially, the E value describes the random background noise that exists for matches between sequences.

[0048] The Expect value is used as a convenient way to create a significance threshold for reporting results. The default value used for blasting is typically set to 0.0001. In BLAST 2.0, the Expect value is also used instead of the P value (probability) to report the significance of matches. For example, an E value of one assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see one match with a similar score simply by chance. An E value of zero means that one would not expect to see any matches with a similar score simply by chance. See, e.g., http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/. Occasionally, a string of X's or N's will result from a BLAST search. This is a result of automatic filtering of the query for low-complexity sequence that is performed to prevent artifactual hits. The filter substitutes any low-complexity sequence that it finds with the letter "N" in nucleotide sequence (e.g., "NNNNNNNNNNNNN") or the letter "X" in protein sequences (e.g., "XXXXXXXXX"). Low-complexity regions can result in high scores that reflect compositional bias rather than significant position-by-position alignment (Wootton and Federhen, Methods Enzymol 266:554-571, 1996).

[0049] The disclosed NOV1a polypeptide (SEQ ID NO: 2) encoded by SEQ ID NO: 1 has 420 amino acid residues and is presented in Table 1B using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV1a has a signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.6850. In other embodiments, NOV1a may also be localized to the endoplasmic reticulum (membrane) with a certainty of 0.6400, the Golgi body with a certainty of 0.1700 or in the endoplasmic reticulum (lumen) with a certainty of 0.1000. The most likely cleavage site for a NOV1a peptide is between amino acids 38 and 39, at: FLA-YK.
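The exponential relationship between the Expect value and the Score described above can be illustrated with a minimal sketch. It assumes the commonly cited Karlin-Altschul form E = K·m·n·exp(-λS), which the text itself does not spell out; the constants `LAMBDA` and `K`, the database length, and the function names are illustrative assumptions, not values taken from this disclosure.

```python
import math

# Illustrative statistical parameters (assumed; not taken from the source text).
LAMBDA = 0.267
K = 0.041


def expect_value(score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits with at least this score.

    Uses the Karlin-Altschul form E = K * m * n * exp(-lambda * S),
    where m is the query length and n is the database length.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def passes_threshold(e_value, threshold=1e-4):
    """Apply a significance threshold such as the 0.0001 mentioned in the text."""
    return e_value <= threshold


if __name__ == "__main__":
    # Hypothetical database of one billion residues and a 707-base query.
    for s in (40, 80, 120):
        e = expect_value(s, query_len=707, db_len=1_000_000_000)
        print(s, e, passes_threshold(e))
```

Under these assumptions, every fixed increase in score multiplies E by the same factor, which is the sense in which E "decreases exponentially" with S.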

TABLE 1B

Encoded NOV1a protein sequence (SEQ ID NO: 2).

MTLGRRVSSLKPWMFALIVRAWWLILWILIGLLWYFLAYKFYYYQTSFQIPSIEYNLAINTCWTOEERIYDN

KMCKIMSRIFRHSSWGGRFIKSHVIKLRPSNDNLKAnWLLKFQFIPNNENAIKTOADNILHQKIKSNESSLT

INKPSFRLTPIDSKRNLLNSRCGIRMTSSNMPLPASSSTQRIVQGRETAMEGEWPWQASLQLIGSGHQCG

ASLISNTWLLTAAHCFWKNKDPTQWIATFGATITPPAVKRNVRKIILHENYHRETNENDIALVQLSTGVEFS

NIVQRVCLPDSSIKLPPKTSVFVTGFGSIVDDGPIQNTLRQARVETISTDVCNRKDVYDGLITPGMLCAGFM

EGKIDACKGDSGGPLVYDNHDIWYIVGIVSWGQSCALPKKPGVYTRVTKYRDWIASKTGM
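The predicted signal-peptide cleavage between amino acids 38 and 39 can be checked against the start of the Table 1B sequence. The sketch below is only an illustration: the helper name is hypothetical, and the N-terminal string is copied as it is printed in Table 1B of this text.

```python
def cleave_signal_peptide(protein, last_signal_residue):
    """Split a protein into (signal peptide, mature chain).

    `last_signal_residue` is the 1-based position of the final residue
    of the signal peptide, so cleavage happens right after it.
    """
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 45 residues of NOV1a as printed in Table 1B.
n_terminus = "MTLGRRVSSLKPWMFALIVRAWWLILWILIGLLWYFLAYKFYYYQ"

signal, mature = cleave_signal_peptide(n_terminus, 38)
# Three residues before the cut and two after it, in the "FLA-YK" notation.
site = f"{signal[-3:]}-{mature[:2]}"
print(site)
```

The printed site reproduces the FLA-YK notation used in the text for the cut between residues 38 and 39.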

[0050] A search of sequence databases reveals that the NOV1a amino acid sequence has 192 of 411 amino acid residues (46%) identical to, and 267 of 411 amino acid residues (64%) similar to, the 418 amino acid residue ptnr:SPTREMBL-ACC:O60235 protein from Homo sapiens (Human) (Airway Trypsin-Like Protease) (E=3.1e). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0051] NOV1b

[0052] A disclosed NOV1b nucleic acid of 708 nucleotides (also referred to as 168446573) encoding a novel Airway Trypsin-Like Protease-like protein is shown in Table 1C. An open reading frame was identified beginning with an AGA initiation codon at nucleotides 1-3 and ending at nucleotides 706-708. The start codon is in bold letters in Table 1C. Since the start codon of NOV1b is not a traditional initiation codon, and NOV1b has no termination codon, NOV1b could be a partial open reading frame that could be extended in the 5' and/or 3' direction(s).

TABLE 1C

NOV1b nucleotide sequence (SEQ ID NO: 3).

AGATCTGTCCAAGGAAGGGAAACAGCTATGGAAGGGGAATGGCCATGGCAGGCCAGCCTCCAGCTCATAGGG

TCAGGCCATCAGTGTGGAGCCAGCCTCATCAGTAACACATGGCTGCTCACAGCAGCTCACTGCTTTTGGAAA

AATAAAGACCCAACTCAATGGATTGCTACTTTTGGTGCAACTATAACACCACCCGCAGTGAAACGAAATGTG

AGGAAAATTATTCTTCATGAGAATTACCATAGAGAAACAAATGAAAATGACATTGCTTTGGTTCAGCTCTCT

ACTGGAGTCGGGTTTTCAAATATAGTCCAGAGAGTTTGCCTCCCAGACTCATCTATAAAGTTGCCACCTAAA

ACAAGTGTGTTCGTCACAGGATTTGGATCCATTGTAGATGATGGACCTATACAAAATACACTTCGGCAAGCC

AGAGTGGAAACCATAAGCACTGATGTGTGTAACAGAAAGGATGTGTATGATGGCCTGATAACTCCAGGAATG

TTATGTGCTGGATTCATGGAAGGAAAAATAGATGCATGTAAGGGAGATTCTGGTGGACCTCTGGTTTATGAT

AATCATGACATCTGGTACATTGTAGGTATAGTAAGTTGGGGACAATCATGTGCACTTCCCAAAAAACCTGGA

GTCTACACCAGAGTAACTAAGTATCGAGATTGGATTGCCTCAAAGACTGGTATGCTCGAG
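The reading-frame statements for NOV1b (an AGA codon at nucleotides 1-3 rather than ATG, and no termination codon) can be checked by translating the Table 1C sequence in frame 1. The sketch below assumes the standard genetic code; the helper names are hypothetical, and only the first and last printed lines of Table 1C are used as input.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of a DNA string; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def describe_orf(dna):
    """Report whether the sequence starts with ATG and ends with a stop codon."""
    return {
        "starts_with_atg": dna[:3] == "ATG",
        "ends_with_stop": CODON_TABLE.get(dna[-3:]) == "*",
    }


# First and last lines of the NOV1b sequence as printed in Table 1C.
first_line = "AGATCTGTCCAAGGAAGGGAAACAGCTATGGAAGGGGAATGGCCATGGCAGGCCAGCCTCCAGCTCATAGGG"
last_line = "GTCTACACCAGAGTAACTAAGTATCGAGATTGGATTGCCTCAAAGACTGGTATGCTCGAG"

print(translate(first_line))
print(translate(last_line))
print(describe_orf(first_line + last_line))
```

Both flags come out false for this input, which is consistent with the description of NOV1b as a possible partial open reading frame.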

[0053] The disclosed NOV1b polypeptide (SEQ ID NO: 4) encoded by SEQ ID NO: 3 has 236 amino acid residues and is presented in Table 1D using the one-letter amino acid code.

TABLE 1D

Encoded NOV1b protein sequence (SEQ ID NO: 4).

RSVQGRETAMEGEWPWQASLQLIGSGHQCGASLISNTWLLTAAHCFWKNKDPTQWIATFGATITPPAVKRNV

RKIILHENYHRETNENDIALVQLSTGVGFSNIVQRVCLPDSSIKLPPKTSVFVTGFGSIVDDGPIQNTLRQA

RVETISTDVCNRKDVYDGLITPGMLCAGFMEGKIDACKGDSGGPLVYDNHDIWYIVGIVSWGQSCALPKKPG

VYTRVTKYRDWIASKTGMLE

[0054] NOV1c

[0055] A disclosed NOV1c nucleic acid of 708 nucleotides (also referred to as 168446539) encoding a novel Airway Trypsin-Like Protease-like protein is shown in Table 1E. An open reading frame was identified beginning with an AGA initiation codon at nucleotides 1-3 and ending at nucleotides 706-708. The start codon is in bold letters in Table 1E. Since the start codon of NOV1c is not a traditional initiation codon, and NOV1c has no termination codon, NOV1c could be a partial open reading frame that could be extended in the 5' and/or 3' direction(s).

TABLE 1E

NOV1c nucleotide sequence (SEQ ID NO: 5).

AGATCTGTCCAAGGAAGGGAAACAGCTATGGAAGGGGAATGGCCATGGCAGGCCAGCCTCCAGCTCATAGGG

TCAGGCCATCAGTGTGGAGCCAGCCTCATCAGTAACACATGGCTGCTCACAGCAGCTCACTGCTTTTGGAAA

AATAAAGACCCAACTCAATGGATTGCTACTTTTGGTGCAACTATAACACCACCCGCAGTGAAACGAAATGTG

AGGAAAATTATTCTTCATGAGAATTACCATAGAGAAACAAATGAAAATGACATTGCTTTGGTTCAGCTCTCT

ACTGGAGTTGAGTTTTCAAATATAGTCCAGAGAGTTTACCTCCCAGACTCATCTATAAAGTTGCCACCTAAA

ACAAGTGTGTTCGTCACAGGATTTGGATCCATTGTAGATGATGGACCTATACAAAATACACTTCGGCAAGCC

AGAGTGGAAACCATAAGCACTGATGTGTGTAACAGAAAGGATGTGTATGATGGCCTGATAACTCCAGGAATG

TTATGTGCTGGATTCATGGAAGGAAAAATAGATGCATGTAAGGGAGATTCTGGTGGACCTCTGGTTTATGAT

AATCATGACATCTGGTACATTGTAGGTATAGTAAGTTGGGGACAATCATGTGCACTTCCCAAAAAACCTGGA

GTCTACACCAGAGTAACTAAGTATCGAGATTGGATTGCCTCAAAGACTGGTATGCTCGAG

[0056] The reverse complement is shown in Table 1F.

TABLE 1F

NOV1c reverse complement nucleotide sequence (SEQ ID NO: 59).

CTCGAGCATACCAGTCTTTGAGGCAATCCAATCTCGATACTTAGTTACTCTGGTGTAGACTCCAGGTTTTTT

GGGAAGTGCACATGATTGTCCCCAACTTACTATACCTACAATGTACCAGATGTCATGATTATCATAAACCAG

AGGTCCACCAGAATCTCCCTTACATGCATCTATTTTTCCTTCCATGAATCCAGCACATAACATTCCTGGAGT

TATCAGGCCATCATACACATCCTTTCTGTTACACACATCAGTGCTTATGGTTTCCACTCTGGCTTGCCGAAG

TGTATTTTGTATAGGTCCATCATCTACAATGGATCCAAATCCTGTGACGAACACACTTGTTTTAGGTGGCAA

CTTTATAGATGAGTCTGCGAGGTAAACTCTCTGGACTATATTTGAAAACTCAACTCCAGTAGAGAGCTGAAC

CAAAGCAATGTCATTTTCATTTGTTTCTCTATGGTAATTCTCATGAAGAATAATTTTCCTCACATTTCGTTT

CACTGCGGGTGGTGTTATAGTTGCACCAAAAGTAGCAATCCATTGAGTTGGGTCTTTATTTTTCCAAAAGCA

GTGAGCTGCTGTGAGCAGCCATGTGTTACTGATGAGGCTGGCTCCACACTCATGGCCTCACCCTATGAGCTC

GACGCTGGCCTGCCATGGCCATTCCCCTTCCATAGCTGTTTCCCTTCCTTGGACAGATCT
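The relationship between Table 1E and Table 1F can be reproduced with a short reverse-complement routine. This is a minimal sketch with a hypothetical helper name; it uses only the first 14 bases of the NOV1c sequence in Table 1E, whose reverse complement should appear as the last 14 bases of Table 1F.

```python
# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


# First 14 bases of the NOV1c sequence as printed in Table 1E.
nov1c_start = "AGATCTGTCCAAGG"
print(reverse_complement(nov1c_start))
```

Applying the function twice returns the original string, which is a quick sanity check when comparing a printed sequence with its printed reverse complement.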

[0057] The disclosed NOV1c polypeptide (SEQ ID NO: 6) encoded by SEQ ID NO: 5 has 236 amino acid residues and is presented in Table 1G using the one-letter amino acid code.

TABLE 1G

Encoded NOV1c protein sequence (SEQ ID NO: 6).

RSVQGRETAMEGEWPWQASLQLIGSGHQCGASLISNTWLLTAAHCFWKNKDPTQWIATFGATITPPAVKRNV

RKIILHENYHRETNENDIALVQLSTGVEFSNIVQRVYLPDSSIKLPPKTSVFVTGFGSIVDDGPIQNTLRQA

RVETISTDVCNRKDVYDGLITPGMLCAGFMEGKIDACKGDSGGPLVYDNHDIWYIVGIVSWGQSCALPKKPG

VYTRVTKYRDWIASKTGMLE

[0058] NOV1d

[0059] A disclosed NOV1d nucleic acid of 708 nucleotides (also referred to as 168446547) encoding a novel Airway Trypsin-Like Protease-like protein is shown in Table 1H. An open reading frame was identified beginning with an AGA initiation codon at nucleotides 1-3 and ending at nucleotides 706-708. The start codon is in bold letters in Table 1H. Since the start codon of NOV1d is not a traditional initiation codon, and NOV1d has no termination codon, NOV1d could be a partial open reading frame that could be extended in the 5' and/or 3' direction(s).

TABLE 1H

NOV1d nucleotide sequence (SEQ ID NO: 7).

AGATCTGTCCAAGGAAGGGAAACAGCTATGGAAGGGGAATGGCCATGGCAGGCCAGCCTCCAGCTCATAGGG

TCACGCCATCAGTGTGGAGCCAGCCTCATCAGTAACACATGGCTGCTCACAGCAGCTCACTGCTTTTGGAAA

AATAAAGACCCAACTCAATGGATTGCTACTTTTGGTGCAACTATAACACCACCCGCAGTGAAACGAAATGTG

AGGAAAATTATTCTTCATGAGAATTACCATAGAGAAACAAATGAAAATGACATTGCTTTGGTTCAGCTCTCT

ACTGGAGTTGAGTTTTCAAATATAGTCCAGAGAGTTTGCCTCCCAGACTCATCTATAAAGTTGCCACCTAAA

ACAAGTGTGCTCGTCACAGGATTTGGATCCATTGTAGATGATGGACCTATACAAAATACACTTCGGCAAGCC

AGAGTGGAAACCATAAGCACTGATGTGTGTAACAGAAAGGATGTGTATGATGGCCTGATAACTCCAGGAATG

TTATGTGCTGGATTCATGGAAGGAAAAATAGATGCATGTAAGGGAGATTCTGGTGGACCTCTGGTTTATGAT

AATCATGACATCTCGTACATTGTAGGTATAGTAAGTTGCGGACAATCATGTGCACTTCCCAAAAAACCTGGA

GTCTACACCAGAGTAACTAAGTATCGAGATTGGATTGCCTCAAAGACTGGTATGCTCGAG

[0060] The disclosed NOV1d polypeptide (SEQ ID NO: 8) encoded by SEQ ID NO: 7 has 236 amino acid residues and is presented in Table 1I using the one-letter amino acid code.

TABLE 1I

Encoded NOV1d protein sequence (SEQ ID NO: 8).

RSVQGRETAMEGEWPWQASLQLIGSCHQCGASLISNTWLLTAAHCFWKNKDPTQWIATFGATITPPAVKRNV

RKIILHENYHRETNENDIALVQLSTGVEFSNIVQRVCLPDSSIKLPPKTSVLVTGFGSIVDDGPIQNTLRQA

RVETISTDVCNRKDVYDGLITPGMLCAGFMEGKIDACKGDSGGPLVYDNHDIWYIVGIVSWGQSCALPKKPG

VYTRVTKYRDWIASKTGMLE

[0061] Homologies to either of the above NOV1 proteins will be shared by the other NOV1 protein insofar as they are homologous to each other as shown below. Any reference to NOV1 is assumed to refer to all three of the NOV1 proteins in general, unless otherwise noted.

[0062] The disclosed NOV1a polypeptide has homology to the amino acid sequences shown in the BLASTP data listed in Table 1J.

TABLE 1J

BLAST results for NOV1a

Gene Index/Identifier                       Protein/Organism                                                     Length (aa)  Identity (%)   Positives (%)  Expect
gi|17446381|ref|XP_068225.1| (XM_068225)    similar to DESC1 protein (H. sapiens) [Homo sapiens]                 246          200/247 (80%)  214/247 (85%)  e-109
gi|4758508|ref|NP_004253.1| (NM_004262)     airway trypsin-like protease [Homo sapiens]                          418          180/390 (46%)  251/390 (64%)
gi|17437609|ref|XP_003340.5| (XM_003340)    similar to DESC1 protein (H. sapiens) [Homo sapiens]                 345          160/346 (46%)  214/346 (61%)
gi|7661558|ref|NP_054777.1| (NM_014058)     DESC1 protein [Homo sapiens]                                         422          160/346 (46%)  214/346 (61%)
gi|17446387|ref|XP_068227.1| (XM_068227)    similar to airway trypsin-like protease (H. sapiens)                 406          139/269 (51%)  179/269 (65%)

[0063] The homology between these and other sequences is shown graphically in the ClustalW analysis shown in Table 1K. In the ClustalW alignment of the NOV1 proteins, as well as all other ClustalW analyses herein, the black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve structural or functional properties), whereas non-highlighted amino acid residues are less conserved and can potentially be altered to a much broader extent without altering protein structure or function.

TABLE 1K

ClustalW analysis of NOV1 (alignment of NOV1a, NOV1b, NOV1c and NOV1d with gi|17446381, gi|4758508, gi|17437609, gi|7661558 and gi|17446387; the shaded alignment rows are not legible in this text).

[0064] The presence of identifiable domains in NOV1, as well as all other NOVX proteins, was determined by searches using software algorithms such as PROSITE, DOMAIN, Blocks, Pfam, ProDomain, and Prints, and then determining the Interpro number by crossing the domain match (or numbers) using the Interpro website (http://www.ebi.ac.uk/interpro). DOMAIN results for NOV1, as disclosed in Tables 1L-1M, were collected from the Conserved Domain Database (CDD) with Reverse Position Specific BLAST analyses. This BLAST analysis software samples domains found in the Smart and Pfam collections. For Table 1K and all successive DOMAIN sequence alignments, fully conserved single residues are indicated by black shading or by the sign (|) and "strong" semi-conserved residues are indicated by grey shading or by the sign (+). The "strong" group of conserved amino acid residues may be any one of the following groups of amino acids: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW.

[0065] Tables 1L-M list the domain descriptions from DOMAIN analysis results against NOV1a. This indicates that the NOV1a sequence has properties similar to those of other proteins known to contain this domain.
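The marking convention for alignments described above (one sign for identical residues, "+" for residues in the same "strong" group) can be sketched as follows. The group list uses the standard ClustalW spelling with Q (NEQK, NDEQ, QHRK); the function names and the use of "|" for identity are assumptions made for illustration, and the actual DOMAIN output may have been produced by a different scoring rule.

```python
# "Strong" conservation groups as listed in the text (standard ClustalW spelling).
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def column_symbol(a, b):
    """Return '|' for identity, '+' for a strong-group pair, ' ' otherwise."""
    if a == "-" or b == "-":
        return " "  # a gap never counts as conserved
    if a == b:
        return "|"
    if any(a in group and b in group for group in STRONG_GROUPS):
        return "+"
    return " "


def match_line(query, subject):
    """Build the middle line of a pairwise alignment, column by column."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return "".join(column_symbol(q, s) for q, s in zip(query, subject))


# Two short aligned fragments, used only to show the output format.
print(match_line("RIVQG", "RIVGG"))
print(match_line("KIDAC", "GKDAC"))
```

A pair such as I/V falls in the MILV group and is marked "+", whereas Q/G shares no group and is left blank.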

TABLE 1L

Domain Analysis of NOV1a

gnl|Smart|smart00020, Tryp_SPc, Trypsin-like serine protease; Many of these are synthesised as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. A few, however, are active as single chain molecules, and others are inactive due to substitutions of the catalytic triad residues. (SEQ ID NO: 66)

CD-Length = 230 residues, 100.0% aligned
Score = 262 bits (669), Expect = 3e-71

Query: 187 RIVQGRETAMEGEWPWQASLQLIGSGHQCGASLISNTWLLTAAHCFWKNKDPTQWIATFG 246
Sbjct:   1 RIVGGSEANI-GSFPWQVSLQYRGGRHFCGGSLISPRWVLTAAHCVY-GSAPSSIRVRLG 58

Query: 247 AT---ITPPAVKRNVRKIILHENYHRETNENDIALVQLSTGVEFSNIVQRVCLPDSSIKL 303
Sbjct:  59 SHDLSSGEETQTVKVSKVIVHPNYNPSTYDNDIALLKLSEPVTLSDTVRPICLPSSGYNV 118

Query: 304 PPKTSVFVTGFGSI-VDDGPIQNTLRQARVETISTDVCNRKDVYDGLITPGMLCAGFMEG 362
Sbjct: 119 PAGTTCTVSGWGRTSESSGSLPDTLQEVNVPIVSNATCRRAYSGGPAITDNMLCAGGLET 178

Query: 363 KIDACKGDSGGPLVYDNHDIWYIVGIVSWG-QSCALPKKPGVYTRVTKYRDWI 414
Sbjct: 179 GKDACQGDSGGPLVCNDP-RWVLVGIVSWGSYGCARPNKPGVYTRVSSYLDWI 230

[0066]

TABLE 1M

Domain Analysis of NOV1a

gnl|Pfam|pfam00089, trypsin, Trypsin. Proteins recognized include all proteins in families S1, S2A, S2B, S2C, and S5 in the classification of peptidases. Also included are proteins that are clearly members, but that lack peptidase activity, such as haptoglobin and protein z (PRTZ; ). (SEQ ID NO: 67)

CD-Length = 217 residues, 100.0% aligned
Score = 204 bits (518), Expect = 1e-53

Query: 188 IVQGRETAMEGEWPWQASLQLIGSGHQCGASLISNTWLLTAAHCFWKNKDPTQWIATFGA 247
Sbjct:   1 IVGGREAQA-GSFPWQVSLQ-VSSGHFCGGSLISENWVLTAAHCVSGASSVRVVLGEHNL 58

Query: 248 TITPPAV-KRNVRKIILHENYHRETNENDIALVQLSTGVEFSNIVQRVCLPDSSIKLPPK 306
Sbjct:  59 GTTEGTEQKFDVKKIIVHPNYNPDTN--DIALLKLKSPVTLGDTVRPICLPSASSDLPVG 116

Query: 307 TSVFVTGFGSIVDDGPIQNTLRQARVETISTDVCNRKDVYDGLITPGMLCAGFMEGKIDA 366
Sbjct: 117 TTCSVSGWGRTKNLGTSD-TLQEVVVPIVSRETCRS-AYGGTVTDTMICAGALGGK-DA 172

Query: 367 CKGDSGGPLVYDNHDIWYIVGIVSWGQSCALPKKPGVYTRVTKYRDWI 414
Sbjct: 173 CQGDSGGPLVCSDG---ELVGIVSWGYGCAVGNYPGVYTRVSRYLDWI 217

[0067] Human airway trypsin-like protease (HAT) from human sputum is related to the prevention of fibrin deposition in the airway lumen by cleaving fibrinogen. In mucoid sputum samples from patients with chronic airway diseases, the concentration of fibrinogen, as measured by ELISA, was in the range of 2-20 micrograms/ml, and trypsin-like activity, as measured by spectrofluorometry, was in the range of 10-50 milliunits (mU)/ml. The trypsin-like activity of mucoid sputum was mainly due to HAT. As shown by SDS-polyacrylamide gel electrophoresis, HAT cleaved fibrinogen, especially its alpha-chain, regardless of the concentration of fibrinogen. Pretreatment of fibrinogen with HAT resulted in a decrease or complete loss of its thrombin-induced clotting capacity, depending on the duration of pretreatment with HAT and the concentration of HAT. HAT may participate in the anticoagulation process within the airway, especially at the level of the mucous membrane, by cleaving fibrinogen transported from the blood stream. PMID: 9864967, UI: 99082486

[0068] A novel trypsin-like protease has been purified to homogeneity from the sputum of patients with chronic airway diseases, by sequential chromatographic procedures. The enzyme migrated on SDS-polyacrylamide gel electrophoresis to a position corresponding to a molecular weight of 28 kDa under both reducing and non-reducing conditions, and showed an apparent molecular weight of 27 kDa by gel filtration, indicating that it exists as a monomer. It had an NH2-terminal sequence of Ile-Leu-Gly-Gly-Thr-Glu-Ala-Glu-Glu-Gly-Ser-Trp-Pro-Trp-Gln-Val-Ser-Leu-Arg-Leu, which differed from that of any known protease. Studies with model peptide substrates showed that the enzyme preferentially cleaves the COOH-terminal side of arginine residues at the P1 position of certain peptides, cleaving Boc-Phe-Ser-Arg-4-methylcoumaryl-7-amide most efficiently and having an optimum pH of 8.6 with this substrate. The enzyme was strongly inhibited by diisopropyl fluorophosphate, leupeptin, antipain, aprotinin, and soybean trypsin inhibitor, but hardly inhibited by secretory leukocyte protease inhibitor at 10 microM. An immunohistochemical study indicated that the enzyme is located in the cells of the submucosal serous glands of the bronchi and trachea. These results suggest that the enzyme is secreted from submucosal serous glands onto the mucous membrane in patients with chronic airway diseases. PMID: 9070615, UI: 97224034

[0069] A novel trypsin-like protease associated with rat bronchiolar epithelial Clara cells, named Tryptase Clara, has been purified to homogeneity from rat lung by a series of standard chromatographic procedures. The enzyme has apparent molecular masses of 180+/-16 kDa on gel filtration and 30+/-1.5 kDa on sodium dodecyl sulfate-polyacrylamide gel electrophoresis under reducing conditions. Its isoelectric point is pH 4.75. Studies with model peptide substrates showed that the enzyme preferentially recognizes a single arginine cleavage site, cleaving Boc-Gln-Ala-Arg-4-methylcoumaryl-7-amide most efficiently and having a pH optimum of 7.5 with this substrate. The enzyme is strongly inhibited by aprotinin, diisopropylfluorophosphate, antipain, leupeptin, and Kunitz-type soybean trypsin inhibitor, but inhibited only slightly by Bowman-Birk soybean trypsin inhibitor, benzamidine, and alpha 1-antitrypsin. Immunohistochemical studies indicated that the enzyme is located exclusively in the bronchiolar epithelial Clara cells and colocalized with surfactant. An immunoreactive protein with a molecular mass of 28.5 kDa was also detected in airway secretions by Western blotting analyses, suggesting that the 30-kDa protease in Clara cells is processed before or after its secretion. Proteolytic cleavage of the hemagglutinin of influenza virus is a prerequisite for the virus to become infectious. Tryptase Clara was shown to cleave the hemagglutinin and activate infectivity of influenza A virus in a dose-dependent way. These results suggest that the enzyme is a possible activator of inactive viral fusion glycoprotein in the respiratory tract and thus responsible for pneumopathogenicity of the virus. PMID: 1618859, UI: 92317085

[0070] The disclosed NOV1 nucleic acid of the invention encoding an Airway Trypsin-Like Protease-like protein includes the nucleic acid whose sequence is provided in Table 1A, 1C, 1E, 1G or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 1A, 1C, 1E, or 1G while still encoding a protein that maintains its Airway Trypsin-Like Protease-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 31 percent of the bases may be so changed.

[0071] The disclosed NOV1 protein of the invention includes the Airway Trypsin-Like Protease-like protein whose sequence is provided in Table 1B, 1D, 1F, or 1H. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 1B, 1D, 1F, or 1H while still encoding a protein that maintains its Airway Trypsin-Like Protease-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 54 percent of the residues may be so changed.

[0072] The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention.

[0073] The above defined information for this invention suggests that this Airway Trypsin-Like Protease-like protein (NOV1) may function as a member of an "Airway Trypsin-Like Protease family". Therefore, the NOV1 nucleic acids and proteins identified here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and disorders as indicated below. The potential therapeutic applications for this invention include, but are not limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) those defined here.

[0074] The NOV1 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in cancer including but not limited to various pathologies and disorders as indicated below. For example, a cDNA encoding the Airway Trypsin-Like Protease-like protein (NOV1) may be useful in gene therapy, and the Airway Trypsin-Like Protease-like protein (NOV1) may be useful when administered to a subject in need thereof.

[0075] By way of nonlimiting example, the compositions of the present invention will have efficacy for treatment of patients suffering from chronic airway diseases such as asthma and cystic fibrosis, allergies, emphysema, bronchitis, lung cancer, or other pathologies or conditions. The NOV1 nucleic acid encoding the Airway Trypsin-Like Protease-like protein of the invention, or fragments thereof, may further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein are to be assessed.

[0076] NOV1 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel NOV1 substances for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. The disclosed NOV1 proteins have multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV1 epitope is from about amino acids 40 to 225. In another embodiment, a NOV1 epitope is from about amino acids 240 to 270. In other embodiments, a NOV1 epitope is from about amino acids 320 to 340, from about amino acids 360 to 370, and from about amino acids 390 to 410. These novel proteins can be used in assay systems for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.

[0077] NOV2

[0078] A disclosed NOV2 nucleic acid of 1476 nucleotides (also referred to as CG55782-01) encoding a novel P450-like protein is shown in Table 2A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 1-3 and ending with a TAA codon at nucleotides 1474-1476. The start and stop codons are in bold letters in Table 2A.

TABLE 2A

NOV2 nucleotide sequence (SEQ ID NO: 9).

AGGACAGCATTAAGCACAGCCATCTTACTCCTGCTCCTGGCTCTCGTCTGTCTGTCCTGACCCTAAGCTCA

AGAGATAAGGGAAAGCTGCCTCCGGGACCCAGACCCCTCTCAATCCTGGGAAACCTGCTGCTGCTTTGCTCC

CAAGACATGCTGACTTCTCTCACTAAGCTGAGCAAGGAGTATGGCTCCATGTACACAGTGCACCTGGGACCC

AGGCGGGTGGTGGTCCTCAGCGGGTACCAAGCTGTGAAGGAGGCCCTGGTGGACCAGGGAGAGGAGTTTAGT

GGCCGCGGTGACTACCCTGCCTTTTTCAACTTTACCAAGGGCAATGGCATCGCCTTCTCCAGTGGGGATCGA

TGGAAGGTCCTGAGACAGTTCTCTATCCAGATTCTACGGAATTTCGGGATGGGGAAGAGAAGCATTGAGGAG

CGAATCCTAGAGGAGGGCAGCTTCCTGCTGGCGGAGCTGCGGAAAACTGAAGGCGAGCCCTTTGACCCCACG

TTTGTGCTGAGTCGCTCAGTGTCCAACATTATCTGTTCCGTGCTCTCGGCAGCCGCTTTCGACTATGATGAT

GAGCGTCTGCTCACCATTATCCGCCTTATCAATGACAACTTCCAAATCATGAGCAGCCCCTGGGGCGAGTTG

TACGACATCTTCCCGAGCCTCCTGGACTGGGTGCCTGGGCCGCACCAACGCATCTTCCAGAACTTCAAGTGC

CTGAGAGACCTCATCGCCCACAGCGTCCACGACCACCAGGCCTCGCTAGACCCCAGATCTCCCCGGGACTTC

ATCCAGTGCTTCCTCACCAAGATGGCAGAGGAGAAGGAGGACCCACTGAGCCACTTCCACATGGATACCCTG

CTGATGACCACACATAACCTGCTCTTTGGCGGCACCAAGACGGTGAGCACCACGCTGCACCACGCCTTCCTG

GCACTCATGAAGTACCCAAAAGTTCAAGCCCGCGTGCAGGAGGAGATCGACCTCGTGGTGGGACGCGCGCGG

CTGCCGGCGCTGAAGGAACCGCGCGGCCATGCCTTACACAGACGCGGTGATCCACGAGGTGCACGCTTTGCA

GACATCATCCCCATGAACTTGCCGCACCGCGTCACTAGGGACACGGCCTTTCGCGGCTTCCTGATACCCAGG

GGCACCGATGTCATCACCCTCCTTAACACCGTCCACTACGACCCCAGCCAGTTCCTGACGCCCCAGGAGTTC

AACCCCGAGCATTTTTTGGATGCCAATCAGTCCTTCAAGAAGAGTCCAGCCTTCATGCCCTTCTCAGCTGGG

CGCCGTCTGTGCCTGGGAGAGTCGCTGGCGCGCATGGAGCTCTTTCTGTACCTCACCGCCATCCTGCAGAGC

TTTTCGCTGCAGCCGCTGGGTGCGCCCGAGGACATCGACCTGACCCCACTCAGCTCAGGTCTTGGCAATTTG

CCGCGGCCTTTCCAGCTGTGCCTGCGCCCGCGCAA

[0079] The disclosed NOV2 nucleic acid sequence, localized to chromosome 19, has 1419 of 1476 bases (96%) identical to a gb:GENBANK-ID:HUMCYPIIF|acc:J02906.1 mRNA from Homo sapiens (Human cytochrome P450IIF1 protein (CYP2F) mRNA, complete cds) (E=7.5e).

[0080] A NOV2 polypeptide (SEQ ID NO: 10) encoded by SEQ ID NO: 9 has 492 amino acid residues and is presented using the one-letter code in Table 2B. SignalP, Psort and/or Hydropathy results predict that NOV2 contains a signal peptide and is likely to be localized to the endoplasmic reticulum (membrane) with a certainty of 0.8200. In other embodiments, NOV2 may also be localized to the microbody (peroxisome) with a certainty of 0.2824, the plasma membrane with a certainty of 0.1900, or the endoplasmic reticulum (lumen) with a certainty of 0.1000. The most likely cleavage site for NOV2 is between positions 24 and 25: LSS-RD.

[0081] The disclosed NOV2 amino acid sequence has 484 of 491 amino acid residues (98%) identical to, and 486 of 491 amino acid residues (98%) similar to, the 491 amino acid residue ptnr:SWISSPROT-ACC:P24903 protein from Homo sapiens (Human) (Cytochrome P450 2F1 (EC 1.14.14.1) (CYPIIF1)) (E=1.1e7).

[0082] NOV2 is expressed in at least lung. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0083] NOV2 also has homology to the amino acid sequences shown in the BLASTP data listed in Table 2C.
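The identity percentages quoted for NOV2 and NOV1, and the "up to about N percent may be so changed" figures that accompany them elsewhere in this description, can be related by simple arithmetic. The sketch below assumes that percentages are truncated to whole numbers, an assumption that happens to reproduce the quoted values; the function names are hypothetical.

```python
def percent(matches, aligned):
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return 100 * matches // aligned


def max_divergence_percent(matches, aligned):
    """Share of positions that differ, as the complement of the identity percentage."""
    return 100 - percent(matches, aligned)


# (label, matching positions, aligned positions) as quoted in the text.
quoted = [
    ("NOV2 nucleic acid", 1419, 1476),
    ("NOV2 protein identity", 484, 491),
    ("NOV1a nucleic acid", 489, 707),
    ("NOV1a protein identity", 192, 411),
]

for label, matches, aligned in quoted:
    print(label, percent(matches, aligned), max_divergence_percent(matches, aligned))
```

With this convention, 489 of 707 bases corresponds to 69% identity and 31% divergence, and 1419 of 1476 bases corresponds to 96% identity and 4% divergence.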

TABLE 2B

Encoded NOV2 protein sequence (SEQ ID NO: 10).

MDSISTAILLLLLALWCLLLTLSSRDKGKLPPGPRPLSILGNLLLLCSQDMLTSLTKLSKEYGSMYTVHLGP

RRVWWLSGYQAVKEALVDQGEEFSGTGDYPAFFNFTKGNGIAFSSGDRWKVLROFSIQILRNFGMGKRSIEE

RILEEGSFLLAELRKTEGEPFDPTFWLSRSVSNIICSWLFGSRFDYDDERLLTIIRLINDNFQIMSSPWGEL

YDIFPSLLDWWPGPHQRIFQNFKCLRDLIARSWHDHQASLDPRSPRDFIQCFLTKMAEEKEDPLSHFHMDTL

LMTTHNLLFGGTKTWSTTLRHAFLAMKYPKVOARVQEEIDLWWGRARLPALKDRAAMPYTDAVIHEWORFAI

DIIPMNLPHRVTRDTAFRGFLIPKGTDWITLLNTWHYDPSQFLTPQEFNPEHFLDANQSFKKSPAFMPFSAG

RRLCLGESLARMELFLYLTAILQSFSLQPLGAPEDIDLTPLSSGLGNLPRPFQLCLRPRX

TABLE 2C

BLAST results for NOV2

Gene Index/Identifier                       Protein/Organism                                                                                                                                      Length (aa)  Identity (%)   Positives (%)  Expect
gi|14786875|ref|XP_012782.4| (XM_012782)    cytochrome P450, subfamily IIF, polypeptide 1 [Homo sapiens]                                                                                          495          460/495 (92%)  460/495 (92%)  0.0
gi|4503225|ref|NP_000765.1| (NM_000774)     cytochrome P450, subfamily IIF, polypeptide 1; microsomal monooxygenase; xenobiotic monooxygenase; flavoprotein-linked monooxygenase [Homo sapiens]  491          460/495 (92%)  462/495 (92%)  0.0
                                            CYTOCHROME P450 2F3 (CYPIIF3)                                                                                                                         491          397/491 (80%)  438/491 (88%)  0.0
gi|9506531|ref|NP_062176.1| (NM_019303)     Cytochrome P450, subfamily IIF, polypeptide 1 [Rattus norvegicus]                                                                                     491          391/491 (79%)  431/491 (87%)  0.0
                                            CYTOCHROME P450 2F2 (CYPIIF2) (NAPHTHALENE DEHYDROGENASE) (NAPHTHALENE HYDROXYLASE) (P450-NAH-2)                                                      491          385/491 (78%)  427/491 (86%)  0.0

[0084] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 2D.

Table 2D. ClustalW Analysis of NOV2

1) NOV2 (SEQ ID NO: 10)
2) gi|14786875|ref|XP_012782.4| (XM_012782) cytochrome P450, subfamily IIF, polypeptide 1 [Homo sapiens] (SEQ ID NO: 68)
3) gi|4503225|ref|NP_000765.1| (NM_000774) cytochrome P450, subfamily IIF, polypeptide 1; microsomal monooxygenase; xenobiotic monooxygenase; flavoprotein-linked monooxygenase [Homo sapiens] (SEQ ID NO: 69)
4) gi|5915805|sp|O18809|C2F3_CAPHI CYTOCHROME P450 2F3 (CYPIIF3) (SEQ ID NO: 70)
5) gi|9506531|ref|NP_062176.1| (NM_019303) Cytochrome P450, subfamily IIF, polypeptide 1 [Rattus norvegicus] (SEQ ID NO: 71)

(ClustalW alignment of the sequences listed above; the shaded alignment rows are not legible in this text.)

[0085] Table 2E lists the domain description from DOMAIN analysis results against NOV2. This indicates that the NOV2 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 2E

Domain Analysis of NOV2

gnl|Pfam|pfam00067, p450, Cytochrome P450. Cytochrome P450s are involved in the oxidative degradation of various compounds. Particularly well known for their role in the degradation of environmental toxins and mutagens. Structure is mostly alpha and binds a heme cofactor. (SEQ ID NO: 73)
CD-Length = 445 residues, 100.0% aligned
Score = 453 bits (1165), Expect = 1e-128

Query: 31 PPGPRPLSILGNLLLLCSQDMLTSLTKLSKEYGSMYTVHLG PR 90
Sbjct: 1 PPGPPPLPLIGNLLQLGRCPIH-SLTELRKKYGPWFTLYLG PR WWWWTGPEAWKEWLD 59

Query: 91 QGEEFSGRGDYPAFFNFThoNGIAFSSGDRWKV FGMGKRS-IEERILEE 149
Sbjct: 60 KGEEFAGRGDFPWFPWL--GYGILFSNGPRWRQ LRR--LLT FGMGKRSKLEERIQEE 115

Query: 150 GSFLLAELRKTEGEPFDPTFWLSRSWSNIICSW RLLTII RLINDNFQIM 209
Sbjct: 116 ARDLVERLRKEQGSPIDITELLAPAPLNWICSL EFLKLI DKLNELFFLW 175

Query: 210 SSPWGELYDIFPSLLDWWPGPHQRIFQNFKCLRDLIA PRSPRDFIQCFL 269
Sbjct: 176 S-PWGQLLDFFR----YLPGSHRKAFKAAKDLKDYLDKLIE ER R ETLEPGDP RDFLDSLL 230

Query: 270 TKMAEEKED PLSHFHMDTILLM TENLLFCGTKTWSTT LALMKY PKVOARVQEEID 329
Sbjct: 231 EAKREGG---SE LTDEELKATWL DLLFAGTDTTSST LSWALY LAKHPEVOAKLREEID 287

Query: 330 LVWGRARLPALKDRAAMPYTDAVIHEWORFADIIPMN PHRWT DTAFRGFLIPKGTDWI 389
Sbjct: 288 EWIGRDRSPTYDDRANNPYLDAW KETLRLHPWWPLLLPRWAT DTEIDGYLIPKGTLWI 347

Query: 390 TLLNTVHYDPSQF LTPQEFNPEHF LDANQSFKKSPAFMPFSAG R RLCLGESLARMELFLY 449
Sbjct: PNPEEFDPERF LDENGKFKKSYAFLPFGAG RNCLGERLARMELFILF 407

Query: 450 LTAILQSFSLQPLGAPEDIDLTPLSSGLGNLPRPFQLCL 488
Sbjct: 408 LATLLQRFELELVP-PGDIPLTPKPLGLPSKPPLYQL RA 445

[0086] The P450 gene superfamily is a biologically diverse class of oxidase enzymes; members of the class are found in all organisms. P450 proteins are clinically and toxicologically important in humans; they are the principal enzymes in the metabolism of drugs and xenobiotic compounds, as well as in the synthesis of cholesterol, steroids and other lipids. Induction of some P450 genes can also be a risk factor for several types of cancer. This diversity of function is mirrored in the diversity of nucleotide and protein sequences; there are currently over 100 human P450 forms described. Allelic forms of many cytochrome P450 genes have been identified as causing quantitatively different rates of drug metabolism, and hence are important to consider in the development of safe and effective human pharmaceutical therapies (reviewed in E. Tanaka, J Clinical Pharmacy & Therapeutics 24:323-329, 1999).

[0087] The disclosed NOV2 nucleic acid of the invention encoding a P450-like protein includes the nucleic acid whose sequence is provided in Table 2A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 2A while still encoding a protein that maintains its P450-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 4% of the bases may be so changed.

[0088] The disclosed NOV2 protein of the invention includes the P450-like protein whose sequence is provided in Table 2B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 2B while still encoding a protein that maintains its P450-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 22% of the residues may be so changed.

[0089] The NOV2 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in various pathologies and disorders.

[0090] NOV2 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. The disclosed NOV2 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV2 epitope is from about amino acids 75 to 160. In another embodiment, a NOV2 epitope is from about amino acids 170 to 270. In additional embodiments, NOV2 epitopes are from about amino acids 400 to 430. These novel proteins can be used in assay systems for functional analysis of various human disorders, which are useful in understanding of pathology of the disease and development of new drug targets for various disorders.

[0091] NOV3

[0092] NOV3 includes three novel Apolipoprotein A-I precursor-like proteins disclosed below. The disclosed sequences have been named NOV3a and NOV3b.

[0093] NOV3a

[0094] A disclosed NOV3a nucleic acid of 818 nucleotides (also referred to as CG55771-01) encoding a novel Apolipoprotein A-I precursor-like protein is shown in Table 3A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 36-38 and ending with a TAA codon at nucleotides 756-758. The start and stop codons are in bold letters, and the 5' and 3' untranslated regions are underlined.

TABLE 3A

NOV3a Nucleotide Sequence (SEQ ID NO: 11)

TGGCTGAAGGCGGAGGTCCCCACGGCCCTTCAGGAGAAAGCTGCGGTGCTGACCTTGGCCGTGCTCATTC

CTGACGGGGAGCCAGGCTCGGCATTTCTGGCAGCAAGATGAACCCCCCAGAGCCCCTGGGATCGAGTAGAA

GGACCTGGCCACTGTGTACGTGGATGTGCTCAAAGACAGCGTGACCTCCACCTTCAGCAAGCTGCGCGAAC

AGCTCGGCCCTGTGACCCAGGAGTTCTGGGATAACCTGGAAAAGGAGACAGAGGGCCTGAGGCAGGAGATG

AGCAAGGATCTGGAGGAGGTGAAGGCCAAGGTGCAGCCCTACCTGGACGACTTCCAGAAGAAGTGGCAGGA

GGAGATGGAGCTCTACCGCCAGAAGGTGGAGCCGCTGCGCGCAGAGCTCCAAGAGGGCGCGCGCCAGAAGC

TGCACGAGCTGCAAGAGAAGCTGAGCCCACTGGGCGAGGAGATGCGCGACCGCGCGCGCGCCCATGTGGAC

GCGCTGCGCACGCATCTGGCCCCTGACAGCGACGAGCTGCGCCAGCGCTTGGCCGCGCGCCTTGAGGCTCT

CAAGGAGAACGGCGGCGCCAGACTGGCCGAGTATCACGCCAAGGCCACCGAGCATCTGAGCACGCTCAGCG

AGAAGGCCAAGCCCGCGCTCGAGGACCTCCGCCAAGGCCTGCTGCCCGTGCTGGAGAGCTTCAAGGTCAGC

TTCCTCAGCGCTCTCGAGGAGTACACTAAGAAGCTCA ACACCCAGGAGGCGCCCGCGCCGCCCCCCTTCC

CGGTGCTCAGAATAAACGTTTCCAAAGTGGGAAAAAA

[0095] The disclosed NOV3a nucleic acid sequence maps to and has 640 of 643 bases (99%) identical to a gb:GENBANK-ID:HSAPOAIB|acc:X02162.1 mRNA from Homo sapiens (Human mRNA for apolipoprotein AI (apo AI)) (E=9.5e).

[0096] A disclosed NOV3a protein (SEQ ID NO: 12) encoded by SEQ ID NO: 11 has 240 amino acid residues, and is presented using the one-letter code in Table 3B. SignalP, Psort and/or Hydropathy results predict that NOV3a does have a signal peptide, and is likely to be localized extracellularly with a certainty of 0.3700. In other embodiments NOV3a is also likely to be localized to the endoplasmic reticulum (membrane) with a certainty of 0.1000, to the endoplasmic reticulum (lumen) with a certainty of 0.1000, or to the microbody (peroxisome) with a certainty of 0.1000. The most likely cleavage site for NOV3a is between positions 18 and 19 (SQA-RH).
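The stated NOV3a open reading frame coordinates (start codon at nucleotides 36-38, stop codon at nucleotides 756-758) can be checked against the stated protein length of 240 residues with simple arithmetic. The following is a minimal sketch; the function name `orf_protein_length` is hypothetical and is not part of the disclosure, and the coordinates are assumed to be 1-based and inclusive as written in the text.

```python
def orf_protein_length(start_first: int, stop_last: int) -> int:
    """Number of residues encoded by an open reading frame.

    start_first: 1-based position of the first base of the start codon.
    stop_last:   1-based position of the last base of the stop codon.
    Both coordinates are assumed inclusive, as in the text above.
    """
    span = stop_last - start_first + 1  # bases from start codon through stop codon
    if span < 6 or span % 3 != 0:
        raise ValueError("ORF must span a whole number of codons (at least start + stop)")
    return span // 3 - 1  # the stop codon does not encode a residue


# NOV3a: ATG at 36-38, TAA at 756-758
nov3a_residues = orf_protein_length(36, 758)
print(nov3a_residues)
```

For NOV3a this gives 723 bases, or 241 codons including the stop codon, which agrees with the 240 residues reported for SEQ ID NO: 12.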

TABLE 3B

Encoded NOV3a protein sequence (SEQ ID NO: 12).

MKAAVLTAWLFLTGSQARHFWQODEPPQSPWDRVKDLATWYWDVLKDSWTSTFSKLREQLGPWTOEFWADN

LEKETEGLRQEMSKDLEEWKAKVOPYLDDFQKKWQEEMELYROKVEPLRAELQEGAROKLHELQEKLASPL

EEMRDRARAHVDALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ

GLLPVLESPKVSFLSALEEYTKKLNTQ

[0097] The disclosed NOV3a amino acid sequence has 193 of 193 amino acid residues (100%) identical to, and 193 of 193 amino acid residues (100%) similar to, the 267 amino acid residue ptnr:SWISSPROT-ACC:P02647 protein from Homo sapiens (Human) (Apolipoprotein A-I Precursor (APO-AI)) (E=7.1e).

[0098] NOV3 is expressed in at least Colon, Gall Bladder, Heart, Liver, Lung, Lymph node, Lymphoid tissue, Ovary, Placenta, Spleen, Testis, Thymus, and Whole Organism. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0099] NOV3b

[0100] In NOV3b, the target sequence identified previously, NOV3a, was subjected to the exon linking process to confirm the sequence. PCR primers were designed by starting at the most upstream sequence available, for the forward primer, and at the most downstream sequence available for the reverse primer. In each case, the sequence was examined, walking inward from the respective termini toward the coding sequence, until a suitable sequence that is either unique or highly selective was encountered, or, in the case of the reverse primer, until the stop codon was reached. Such primers were designed based on in silico predictions for the full length cDNA, part (one or more exons) of the DNA or protein sequence of the target sequence, or by translated homology of the predicted exons to closely related human sequences or sequences from other species. These primers were then employed in PCR amplification based on the following pool of human cDNAs: adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. Usually the resulting amplicons were gel purified, cloned and sequenced to high redundancy. The resulting sequences from all clones were assembled with themselves, with other fragments in CuraGen Corporation's database and with public ESTs. Fragments and ESTs were included as components for an assembly when the extent of their identity with another component of the assembly was at least 95% over 50 bp. In addition, sequence traces were evaluated manually and edited for corrections if appropriate. These procedures provide the sequence reported below, which is designated NOV3b. This differs from the previously identified sequence NOV3a in having 2 internal splice regions.

[0101] A disclosed NOV3b nucleic acid of 677 nucleotides (also referred to as CuraGen Accession No. CG55771-02) encoding a novel Apolipoprotein A-I Precursor-like protein is shown in Table 3C. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 1-3 and ending with a TGA codon at nucleotides 634-636. A putative untranslated region downstream from the termination codon is underlined in Table 3C. The start and stop codons are in bold letters.
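The assembly inclusion rule described above (identity of at least 95% over 50 bp with another component) can be illustrated with a short check. This is only a sketch under stated assumptions: the two strings are taken to be an already aligned, gap-free overlap of equal length, the rule is read as "at least 50 aligned bases with at least 95% of them identical", and the function names are hypothetical. The actual assembly software used is not described in the text.

```python
def count_matches(a: str, b: str) -> int:
    """Count identical positions in two pre-aligned, equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("overlap segments must have the same length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


def qualifies_for_assembly(a: str, b: str,
                           min_overlap: int = 50,
                           min_identity_pct: int = 95) -> bool:
    """Assumed reading of the rule: >= 95% identity over an overlap of >= 50 bp."""
    if len(a) < min_overlap:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 95%.
    return count_matches(a, b) * 100 >= min_identity_pct * len(a)


# Hypothetical 60 bp overlap with 3 mismatches (57/60 = 95% identity).
fragment = "ACGT" * 15
est = "TTT" + fragment[3:]
print(qualifies_for_assembly(fragment, est))
```

Under this reading, a 60 bp overlap tolerates at most 3 mismatches, and any overlap shorter than 50 bp is rejected regardless of identity.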

TABLE 3C NOV3b nucleotide sequence (SEQ ID NO: 13).

AGAAAGCTGCGGTGCTGACCTTGGCCGTGCTCTTCCTGACGGGTGGGAGCCAGGCTCGGCATTTCTGGCAG

CAAGATGAACCCCCCCAGAGCCCCTGGGATCGAGTGAAGGACCTGGCCACTGTGTACGTCGATGTGCTCAAA

GACAGCGGCGACAGCGTGACCTCCACCTTCAGCAAGCTGCGCGAACAGCTCGGCCCTGTGACCCAGGAGTTC

TGGGATAACCTGGAAAAGGAGACAGAGGGCCTGAGGCAGGAGATGAGCAAGGATCTCGAGGACGTGAATGCC

AAGGTGCAGCCCTACCTGGACGACTTCCAGAAGAAGTGGCAGGAGGAGATGGAGCTCTACCGCCAGAAGGTG

GAGCCGCTGCGCGCAGAGCTCCAAGAGGGCGCGCGCCAGAAGCTGCACGAGCTGCGCCAGCGCTTGGCCGAG

CGCCTTGAGGCTCTCAAGGAGAACGGCGGCGCCAGACTGGCCGAGTACCACGCCAAGGCCACCGAGCATCTG

AGCACGCTCAGCGAGAAGGCCAAGCCCGCGCTCGAGGACCTCCGCCAAGGCCTGCTGCCCGTGCTGGAGAGC

TTCAAGGTCAGCTTCCTGAGCGCTCTCGAGGAGTACACTAAAAGCTCA ACACCCACTGAGGCGCCCCGCCGC

CGCCCCCCTTCCCGGTGCTCAGAATAAAC

[0102] In a search of public sequence databases, the NOV3b nucleic acid sequence, located on chromosome 11, has 491 of 676 bases (72%) identical to a gb:GENBANK-ID:HSAPOAIT|acc:X07496.1 mRNA from Homo sapiens (Human Tangier apoA-I gene) (E=3.1e). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0103] The disclosed NOV3b polypeptide (SEQ ID NO: 14) encoded by SEQ ID NO: 13 has 211 amino acid residues and is presented in Table 3D using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV3b has a signal peptide and is likely to be localized extracellularly with a certainty of 0.3798. In other embodiments, NOV3b may also be localized to the microbody (peroxisome) with a certainty of 0.1141, in the endoplasmic reticulum (membrane) with a certainty of 0.1000, or in the endoplasmic reticulum (lumen) with a certainty of 0.1000. The most likely cleavage site for NOV3b is between positions 19 and 20, SQA-RH.

[0104] A search of sequence databases reveals that the NOV3b amino acid sequence has 106 of 161 amino acid residues (65%) identical to, and 121 of 161 amino acid residues (75%) similar to, the 267 amino acid residue ptnr:SWISSPROT-ACC:P02647 protein from Homo sapiens (Human) (Apolipoprotein A-I Precursor (APO-AI)) (E=5.6e). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0105] NOV3b is expressed in at least Liver, Spleen, Ovary. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. No. CG55771-02.

[0106] NOV3a also has homology to the amino acid sequences shown in the BLASTP data listed in Table 3E.
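The identity and similarity percentages quoted for NOV3b (491 of 676 bases, 72%; 106 of 161 residues, 65%; 121 of 161 residues, 75%) can be reproduced from the match counts. The sketch below assumes the percentages are truncated to whole numbers rather than rounded; that is an inference from the figures themselves (for example 491/676 is about 72.6%), not something the text states. The function name is hypothetical.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("need 0 <= matches <= aligned and aligned > 0")
    return matches * 100 // aligned


# Figures quoted in the text for NOV3b
print(percent_truncated(491, 676))  # nucleotide identity
print(percent_truncated(106, 161))  # amino acid identity
print(percent_truncated(121, 161))  # amino acid similarity
```

Conventional rounding would instead give 73% and 66% for the first two figures, which is why truncation is assumed here.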

TABLE 3D

Encoded NOV3b protein sequence (SEQ ID NO:14).

MKAAVLTLAWLFLTGGSQARHWQQDEPPQSPWDRVKDLATVYWDVLKDSGDSWTSTFSKLRAEQLGPWTOEF

WDNLEKETEGLRQEMSKDLEEWKAKVOPYLDDFQKKWQEEMELYROKVEPLRAELQEGAROKLHELRORLAE

RLEALKENCGARLAEYHAKATEHLSTLSEKAKPALEDLROGLLPVLESFKVSFLSALEEYTKKLNTQ

TABLE 3E

BLAST results for NOV3a

Gene Index/Identifier                          Protein/Organism                                     Length (aa)  Identity (%)   Positives (%)  Expect
gi|2119390|pir||I55236                         proapo-A-I protein - human                           267          212/267 (79%)  213/267 (79%)  4e-95
gi|4557321|ref|NP_000030.1| (NM_000039)        apolipoprotein A-I precursor [Homo sapiens]          267          213/267 (79%)  213/267 (79%)  4e-95
gi|178775|gb|AAA51747.1| (M29068)              proapolipoprotein [Homo sapiens]                     249          207/249 (83%)  207/249 (83%)  2e-91
gi|399042|sp|P15568|APA1_MACFA                 APOLIPOPROTEIN A-I PRECURSOR (APO-AI)                267          202/267 (75%)  207/267 (76%)  2e-90
gi|86614|pir||A26529                           apolipoprotein A-I precursor - crab-eating macaque   267          202/267 (75%)  207/267 (76%)  2e-90

[0107] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 3F.

[Table 3F: ClustalW alignment of NOV3a and NOV3b with the five sequences listed in Table 3E; the alignment is not legible in this copy.]

[0108] Table 3G lists the domain description from DOMAIN analysis results against NOV3a. This indicates that the NOV3a sequence has properties similar to those of other proteins known to contain this domain.

TABLE 3G

Domain Analysis of NOV3a

gnl|Pfam|pfam01442, Apolipoprotein, Apolipoprotein A1/A4/E family. These proteins contain several 22 residue repeats which form a pair of alpha helices. This family includes: Apolipoprotein A-I, Apolipoprotein A-IV, and Apolipoprotein E. (SEQ ID NO: 79)
CD-Length = 262 residues, 95.0% aligned
Score = 182 bits (461), Expect = 2e-47

Query: 15 GSQARHFWQQDEPPQSPWDRVKDLATWYWDWLKDS------ 49
Sbjct: 14 GCQAR-FWQAD

Query: 50 --WTSTFSKLREQLGPWTOEFWDNLEKETEGLRQEMSKDLEEWKAKVOPYLDDFQKKWOE 107
Sbjct: 72 DELKSYAEELQEQLGPWAQEFWARLSKETQALRAELGKDLEDWRNRLAPYRDELQQMLGQ 131

Query: 108 EMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHVDALRTHLAPYSDEL 167
Sbjct: 132 NIEEYRQKLEPLARELRKRLRRDAEELQKRLAPYAEELRERAERNVDALRTRLGPYVEQL 191

Query: 168 RQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQGLLPVLESFKVSFL 227
Sbjct: 192 RQKLTQRLEELRERAQPYAEEYKEQLEEQLSELREKLAPLREDLQEWLNPVLEQLKTQAE 251

Query: 228 SALEEYTKKLN 238
Sbjct: 252 AFQEELKSWLE 262

[0109] Apolipoprotein A-I is the major apoprotein of HDL and is a relatively abundant plasma protein with a concentration of 1.0-1.5 mg/ml. It is a single polypeptide chain with 243 amino acid residues of known primary amino acid sequence (Brewer et al., 1978). ApoA-I is a cofactor for LCAT (245900), which is responsible for the formation of most cholesteryl esters in plasma. ApoA-I also promotes efflux of cholesterol from cells. The liver and small intestine are the sites of synthesis of apoA-I. The primary translation product of the APOA1 gene contains both a pre and a pro segment, and posttranslational processing of apoA-I may be involved in the formation of the functional plasma apoA-I isoproteins. Dayhoff (1976) pointed to sequence homologies of A-I, A-II, C-I, and C-III.

[0110] Yui et al. (1988) found that apoA-I is identical to serum PGI(2) stabilizing factor (PSF). PGI(2), or prostacyclin, is synthesized by the vascular endothelium and smooth muscle, and functions as a potent vasodilator and inhibitor of platelet aggregation. The stabilization of PGI(2) by HDL and apoA-I may be an important protective action against the accumulation of platelet thrombi at sites of vascular damage. The beneficial effects of HDL in the prevention of coronary artery disease may be partly explained by this effect. A-I(Milano) and A-I(Marburg) give rise to HDL deficiency. Other HDL deficiency states are Tangier disease (HDLDT1; 205400), LCAT deficiency (245900), and fish-eye disease (136120).

[0111] Breslow et al. (1982) isolated and characterized cDNA clones for human apoA-I. Rees et al. (1983) studied the cloned APOA1 gene and a DNA polymorphism 3-prime to it. In a healthy control population, the frequency of heterozygotes was about 5%. Among hypertriglyceridemic subjects, 34% were heterozygotes and about 6% were homozygotes for the variant. The primary gene transcript encodes a preproapoA-I containing 24 amino acids on the amino terminus of the mature plasma apoA-I (Law et al., 1983).

[0112] Law et al. (1984) assigned the APOA1 gene to 11p11-q13 by filter hybridization analysis of human-mouse cell hybrid DNAs. The genes for apoA-I and apoC-III are on chromosome 9 in the mouse. Mouse homologs of other genes on human 11p (insulin, beta-globin, LDHA, HRAS) are situated on mouse chromosome 7. Using a cDNA probe to detect apoA-I structural gene sequences in human-Chinese hamster cell hybrids, Cheung et al. (1984) assigned the gene to the region 11q13-qter. Since other information had suggested 11p11-q13 as the location, the SRO becomes 11q13. It is noteworthy that in the mouse and in man, APOA1 and PBGD (called Ups in the mouse) are syntenic. Both are on chromosome 11 in man and chromosome 9 in the mouse. Bruns et al. (1984) localized the genes for apoA-I and apoC-III (previously shown to be in a 3-kb segment of the genome; Breslow et al., 1982; Shoulders et al., 1983) to chromosome 11 by Southern blot analysis of DNA from human-rodent cell hybrids. Because in the mouse apoA-I is on chromosome 9 and apoA-II is on chromosome 1 (Lusis et al., 1983), the gene for human apoA-II is probably not on chromosome 11. Indeed, APOA2 (107670) is on human chromosome 1. On the basis of data provided by Pearson (1987), the APOA1 locus was assigned to 11q23-qter by HGM9. This would place APOC3 and APOA4 in the same region. Because the XmnI genotype at the APOA1 locus was heterozygous in a boy with partial deletion of the long arm of chromosome 11, del(11)(q23.3-qter), Arinami et al. (1990) localized the gene to 11q23 by excluding the region 11q24-qter.

[0113] Haddad et al. (1986) found that in the rat, as in man, the APOA1, APOC3 and APOA4 genes are closely linked. Indeed, their direction of transcription, size, relative location and intron-exon organization were found to be remarkably similar to those of the corresponding human genes.

[0114] There are 8 well-characterized apolipoproteins: apoA-I, apoA-II, apoA-IV, apoB, apoC-I, apoC-II, apoC-III, and apoE. The APOA1 and APOC3 genes are oriented foot-to-foot, i.e., the 3-prime end of APOA1 is followed after an interval of about 2.5 kb by the 3-prime end of APOC3 (Karathanasis et al., 1983).

[0115] In 4 generations of a Norwegian kindred, Schamaun et al. (1983) found, by 2-D electrophoresis, a variant of apolipoprotein A-I. Codominant inheritance was displayed. One homozygote was identified. There was no obvious cardiovascular disease, even in the homozygote. Karathanasis et al. (1983) found that a group of severely hypertriglyceridemic patients with types IV and V hyperlipoproteinemia had an increased frequency of an RFLP associated with the apoA-I gene. Rees et al. (1985) found a strong correlation between hypertriglyceridemia and a DNA sequence polymorphism located in or near the 3-prime noncoding region of APOC3 and revealed by digestion of human DNA with the restriction enzyme Sst-1 and hybridization with an APOA1 cDNA probe. In 74 hypertriglyceridemic Caucasians, 3 were homozygous and 23 were heterozygous for the polymorphism, giving a gene frequency of 0.19; none of 52 normotriglyceridemics had the polymorphism, although it was frequent in Africans, Chinese, Japanese, and Asian Indians. No differences in high density lipoprotein or in apolipoproteins A-I and C-III phenotypes were found in persons with or without the polymorphism. Ferns et al. (1985) found an uncommon allelic variant (called S2) of the apoA-I/C-III gene cluster in 10 of 48 postmyocardial infarction patients (21%). In 47 control subjects it was present in only 2 and in none of those who were normotriglyceridemic. (The S2 allele, a DNA polymorphism, is characterized by SstI restriction fragments of 5.7 and 3.2 kb length, whereas the common S1 allele produces fragments of 5.7 and 4.2 kb length.) Ferns et al. (1985) found no difference in the distribution of alleles in the highly polymorphic region of 11p near the insulin gene. Kessling et al. (1985) failed to find an association between any allele of several RFLPs studied and hypertriglyceridemia. Buraczynska et al. (1985) found association between an EcoRI polymorphism of the APOA1 gene and noninsulin-dependent diabetes mellitus.

[0116] Familial hypoalphalipoproteinemia, by far the most common of the forms of primary depression of HDL cholesterol, has been thought to be an autosomal dominant. It is associated with premature coronary artery disease and stroke (Vergani and Bettale, 1981; Third et al., 1984; Daniels et al., 1982). Using a PstI polymorphism at the 3-prime end of the APOA1 gene, Ordovas et al. (1986) found the rarer allele (3.3-kb band) in 4.1% of 123 randomly selected control subjects and 3.3% of 30 subjects with no angiographic evidence of coronary artery disease. In contrast, among 88 patients who had severe coronary artery disease before age 60, as documented by angiography, the frequency was 32%. It was also found in 8 of 12 index cases of kindreds with familial hypoalphalipoproteinemia. Among all patients with coronary artery disease, 58% had HDL cholesterol levels below the 10th percentile; however, this frequency increased to 73% when patients with the 3.3-kb band were considered. Borecki et al. (1986) studied 16 kindreds ascertained through probands clinically determined to have primary hypoalphalipoproteinemia characterized by low HDL cholesterol but otherwise normal blood lipids. They concluded that these families provided clear evidence for a major gene. Moll et al. (1986) measured apoA-I levels in families ascertained through cases of hypertension or early coronary artery disease. They concluded that the findings supported a major effect of a single genetic locus on the quantitative variation of plasma apoA-I in a sample of pedigrees enriched for individuals at risk for coronary artery disease. Using a radioimmunoassay, Moll et al. (1989) measured plasma apoA-I levels in 1,880 individuals from 283 pedigrees. Complex segregation analysis suggested heterogeneous etiologies for the individual differences in adjusted apoA-I levels observed. The authors concluded that environmental factors and polygenic loci account for 32 and 65%, respectively, of the adjusted variation in a subset of 126 families. In the other 157 pedigrees, segregation analysis strongly supported the presence of a single locus accounting for 27% of the adjusted variation. In Japanese, Rees et al. (1986) found association of triglyceridemia with a different haplotype of the A-I/C-III region than that found in Caucasians.

[0117] Ferns et al. (1986) found a common allele of the APOA2 locus which showed a weak association with hypertriglyceridemia; in contrast, an uncommon allele of the APOA1-APOC3-APOA4 gene cluster demonstrated a stronger relationship with hypertriglyceridemia. Ferns et al. (1986) found higher levels of serum triglycerides with possession of both disease-related alleles than with either singly. Fager et al. (1981) found an inverse relationship between serum apoA-II and a risk of myocardial infarction. Hayden et al. (1987) found an association between certain RFLPs and familial combined hyperlipidemia (FCH; 144250). APOA1 is linked to THY1 (188230) at a distance of about 1 cM (Gatti, 1987); thus, the more distal location of this apolipoprotein cluster as suggested by other evidence may be true. In certain patients with premature atherosclerosis, Karathanasis et al. (1987) demonstrated a DNA inversion containing portions of the 3-prime ends of the APOA1 and APOC3 genes, including the DNA region between these genes. The breakpoints of this DNA inversion were found to be located between the fourth exon of the APOA1 gene and the first intron of the APOC3 gene; thus, the inversion results in reciprocal fusion of the 2 gene transcriptional units. The absence of transcripts with correct mRNA sequences causes deficiency of both apolipoproteins in the plasma of these patients, leading to atherosclerosis. Bojanovski et al. (1987) found that both proapolipoprotein A-I and the mature protein are metabolized abnormally rapidly in Tangier disease. Thompson et al. (1988) investigated the seeming paradox that 2 RFLPs at the A-I/C-III cluster were in strong linkage disequilibrium while a third variant, located between the 2 other markers, appeared to be in linkage equilibrium with these 2 "outside" markers. Thompson et al. (1988) showed that, for the gene frequencies encountered, very large sample sizes would be required to demonstrate negative (i.e., repulsion-phase) linkage disequilibrium. Such numbers are usually difficult to attain in human studies. Therefore, failure to demonstrate linkage disequilibrium by conventional methods does not necessarily imply its absence.

[0118] Kessling et al. (1988) studied the high density lipoprotein-cholesterol concentrations along with restriction fragment length polymorphisms in the APOA2 and APOA1-APOC3-APOA4 gene cluster in 109 men selected from a random sample of 1,910 men aged 45 to 59 years. They found no significant difference in allelic frequencies at either locus between the groups of individuals with high and low HDL-cholesterol levels. They did find an association between a PstI RFLP associated with apoA-I and genetic variation determining the plasma concentration of apoA-I. No significant association was found between alleles for the apoA-II MspI RFLP and apoA-II or HDL concentrations. ApoA-I has 243 amino acids of known sequence. It is secreted into the bloodstream by the liver and intestine as a protein that is rapidly converted to mature apoA-I. Two major isoforms of mature, normal A-I, which arise by deamidation, can be separated in human serum. Antonarakis et al. (1988) studied DNA polymorphism of a 61-kb segment of 11q that contains the APOA1, APOC3, and APOA4 genes within a 15-kb stretch. Eleven RFLPs located within the 61-kb segment were used for haplotype analysis. Considerable linkage disequilibrium was found. Several haplotypes had arisen by recombination and the rate of recombination within the gene cluster was estimated to be at least 4 times greater than that expected based on uniform recombination. Taken individually, the polymorphism information content (PIC) of each of the 11 polymorphisms ranged from 0.053 to 0.375, while that of their haplotypes ranged between 0.858 and 0.862. (The PIC value, which was introduced by Botstein et al. (1980) in their classic paper on the use of RFLPs as linkage markers, represents the sum of the frequency of each possible mating multiplied by the probability that an offspring will be informative.)

[0119] By genetic linkage analysis using RFLPs in the APOA1/C3/A4 gene cluster, Kastelein et al. (1990) showed that the mutation causing familial hypoalphalipoproteinemia (familial HDL deficiency) in a family of Spanish descent was not located in this cluster.

[0120] Smith et al. (1992) investigated the common G/A polymorphism in the APOA1 gene promoter at a position 76 bp upstream of the transcriptional start site (-76). Of 54 subjects whose apoA-I production rates had been determined by turnover studies, 35 were homozygous for a guanosine at this locus and 19 were heterozygous for a guanosine and adenosine (G/A). The apoA-I production rates were significantly lower (by 11%) in the G/A heterozygotes than in the G homozygotes (P=0.025). However, no effect on HDL cholesterol or apoA-I levels was noted. Differential gene expression of the 2 alleles was tested by linking each of the alleles to the reporter gene chloramphenicol acetyltransferase and determining relative promoter efficiencies after transfection into the human HepG2 hepatoma cell line. The A allele expressed only 68% as well as the G allele.

[0121] In addition to its ability to remove cholesterol from cells, HDL also delivers cholesterol to cells through a poorly defined process in which cholesteryl esters are selectively transferred from HDL particles into the cell without the uptake and degradation of the lipoprotein particle. In steroidogenic cells of rodents, the selective uptake pathway accounts for 90% or more of the cholesterol destined for steroid production or cholesteryl ester accumulation. To test the importance of the 3 major HDL proteins in determining cholesteryl ester accumulation in steroidogenic cells of the adrenal gland, ovary, and testis, Plump et al. (1996) used mice which had been rendered deficient in apoA-I, apoA-II, or apoE by gene targeting in embryonic stem cells. ApoE and apoA-II deficiencies were found to have only modest effects on cholesteryl ester accumulation. In contrast, apoA-I deficiency caused an almost complete failure to accumulate cholesteryl ester in steroidogenic cells. Plump et al. (1996) interpreted these results as indicating that apoA-I is essential for the selective uptake of HDL-cholesteryl esters. They stated that the lack of apoA-I has a major impact on adrenal gland physiology, causing diminished basal corticosteroid production, a blunted steroidogenic response to stress, and increased expression of compensatory pathways to provide cholesterol substrate for steroid production.

[0122] In studies of 3 restriction enzyme polymorphisms in the AI-CIII-AIV gene cluster, Dallinga-Thie et al. (1997) analyzed haplotypes and showed an association with severe hyperlipidemia in subjects with FCH. Furthermore, nonparametric sib pair linkage analysis revealed significant linkage between these markers in the gene cluster and the FCH phenotype. The findings confirmed that the AI-CIII-AIV gene cluster contributes to the FCH phenotype, but this contribution is genetically complex. An epistatic interaction between different haplotypes of the gene cluster was demonstrated. They concluded that 2 different susceptibility loci exist in the gene cluster.

[0123] Naganawa et al. (1997) reported 2 haplotypes due to 5 polymorphisms in the intestinal enhancer region of the APOA1 gene in endoscopic biopsy samples from healthy volunteers. The mutant haplotype had a population frequency of 0.44; frequency of wildtype was 0.53. APOA1 mRNA levels were 49% lower in mutant haplotype homozygotes than in wildtype homozygotes, while APOA1 synthesis was 37% lower than wildtype in individuals homozygous for the mutant allele. Heterozygotes had 28% and 41% reductions of mRNA levels and APOA1 synthesis, respectively, as compared to wildtype homozygotes. Expression studies in Caco-2 cells showed a 46% decrease in transcriptional activity in cells containing the mutant constructs, and binding of Caco-2 nuclear proteins in mutant, but not wildtype, sequences. Naganawa et al. (1997) concluded that intestinal APOA1 transcription and protein synthesis were reduced in the presence of common mutations which induced nuclear protein binding.

[0124] Genschel et al. (1998) counted 4 naturally occurring mutant forms of apoA-I that were known at that time to result in amyloidosis. The most important feature of all variants was the very similar formation of N-terminal fragments found in the amyloid deposits. They summarized the specific features of all known amyloidogenic variants of APOA1 and speculated about the metabolic pathway involved.

[0125] To determine the frequency of de novo hypoalphalipoproteinemia in the general population due to mutation of the APOA1 gene, Yamakawa-Kobayashi et al. (1999) analyzed sequence variations in the APOA1 gene in 67 children with a low high-density lipoprotein (HDL) cholesterol level. These children were selected from 1,254 school children through a school survey. Four different mutations with deleterious potential, 3 frameshifts and 1 splice site mutation, were identified in 4 subjects. The plasma apoA-I levels of the 4 children with these mutations were reduced to approximately half of the normal levels and were below the first percentile of the general population distribution (80 mg/dl). The frequency of hypoalphalipoproteinemia due to a mutant APOA1 gene was estimated at 6% in subjects with low HDL cholesterol levels and 0.3% in the Japanese population generally.

[0126] High density lipoprotein deficiency is also caused by mutations in the ABC1 gene (600046), which lead to reductions in cellular cholesterol efflux. The disorder is clinically and biochemically severe in the case of the recessively inherited Tangier disease, whereas it is milder in the dominantly inherited type 2 familial high density lipoprotein deficiency (604091).

[0127] The disclosed NOV3 nucleic acid of the invention encoding an Apolipoprotein A-I precursor-like protein includes the nucleic acid whose sequence is provided in Table 3A or 3C, or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 3A or 3C while still encoding a protein that maintains its Apolipoprotein A-I precursor-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized.

nostic marker, wherein the presence or amount of the nucleic acid or the protein are to be assessed, as well as potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), and (v) a composition promoting tissue regeneration in vitro and in vivo.

[0130] The NOV3 nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications implicated in various diseases and disorders described below. For example, the compositions of the present invention will have efficacy for treatment of patients suffering from coronary artery disease, stroke, hypertriglyceridemia, hypoalphalipoproteinemia, hyperlipidemia, Tangier disease, LCAT deficiency, fish-eye disease, non-insulin-dependent diabetes mellitus, hypertension, myocardial infarction, atherosclerosis, and/or other pathologies.

[0131] NOV3 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. For example, the disclosed NOV3 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV3 epitope is from about amino acids 20 to 40. In another embodiment, a NOV3 epitope is from about amino acids 50 to 220. In additional embodiments, NOV3 epitopes are from about amino acids 240 to 260. This novel protein also has value in development of a powerful assay system for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.
These modifications 0132) NOV4 are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be 0133) NOV4 includes three novel HSP90 co-chaperone used, for example, as antisense binding nucleic acids in like proteins disclosed below. The disclosed Sequences have therapeutic applications in a Subject. In the mutant or variant been named NOV4a, NOV4b, and NOV4c. nucleic acids, and their complements, up to about 1% 0134) NOV4a percent of the bases may be So changed. 0135) A disclosed NOV4a nucleic acid of 513 nucleotides 0128. The disclosed NOV3 protein of the invention (designated CuraGen Acc. No. CG55700-01) encoding a includes the Apollipoprotein A-I precursor-like protein novel HSP90 co-chaperone-like protein is shown in Table whose sequence is provided in Table 3B, or 3D. The 4A. An open reading frame was identified beginning with an invention also includes a mutant or variant protein any of ATG initiation codon at nucleotides 54-56 and ending with whose residues may be changed from the corresponding a TAA codon at nucleotides 444-446. A putative untranslated residue shown in Table 3B, or 3D while still encoding a region downstream from the termination codon is underlined protein that maintains its Apollipoprotein A-I precursor-like in Table 4A, and the Start and Stop codons are in bold letters. activities and physiological functions, or a functional frag ment thereof. In the mutant or variant protein, up to about 25 TABLE 4A percent of the residues may be So changed. NOV4 a Nucleotide Sequence (SEQ ID NO: 15) 0129. 
The protein similarity information, expression pat tern, and map location for the Apollipoprotein A-I precursor CATTTGCTGTCTCCTCTGCTCACCAGTTCGCCCGTCCCCCTGCCCCGTTC like protein and nucleic acid (NOV3) disclosed herein ACAAGCAGCCTGCTTCTGCAAAGTGGTACGATCGAAGGGACTATGTCTT suggest that NOV3 may have important structural and/or physiological functions characteristic of the citron kinase CATTGAATTTTGTGTTGAAGACAGTAAGGATGTTAATGTAAATTTTGAAA like family. Therefore, the NOV3 nucleic acids and proteins of the invention are useful in potential diagnostic and AATCCAAACTTACATTCAGTTGTCTCGGAGGAAGTGATAATTTTAAGCAT therapeutic applications. These include Serving as a specific TTAAATGAAATTGATCTTTTTCACTGTATTGATCCAAATGATTCCAAGCA or Selective nucleic acid or protein diagnostic and/or prog US 2003/0170630 A1 Sep. 11, 2003

TAAAAGAACGGACAGATCAATTTTATGTTGTTTACGAAAAGGAGAATCTG
GCCAGTCATGGCCAAGGTTAACAAAAGAAAGGGCAAAGATGATGAACAAC
ATGGGTGGTGATGAGGATGTAGATTTACCAGAAGTAGATGGAGCAGATGA
TGATTCACAAGACAGTGATGATGAAAAAATGCCAGATCTGGAGTAAGGAA
TATTGTCATCACCTGGATTTTGAGAAAGAAAAATAACTTCTCTGCAAGAT
TTCATAATTGAGA

[0136] The nucleic acid sequence has 354 of 388 bases (91%) identical to a gb:GENBANK-ID:HUMPRA|acc:L24804.1 mRNA from Homo sapiens (Human (p23) mRNA, complete cds) (E=3.3e).

[0137] A NOV4a polypeptide (SEQ ID NO: 16) encoded by SEQ ID NO: 15 is 130 amino acid residues and is presented using the one letter code in Table 4B. SignalP, Psort and/or Hydropathy results predict that NOV4a has no signal peptide and is likely to be localized at the nucleus with a certainty of 0.4600. In other embodiments, NOV4a may also be localized to the microbody (peroxisome) with a certainty of 0.3000, the mitochondrial membrane space with a certainty of 0.1000, or the lysosome (lumen) with a certainty of 0.1000.

TABLE 4B

NOV4a protein sequence (SEQ ID NO: 16)

MQPASAKWYDRRDYWFIEFCVEDSKDVNVNFEKSKLTFSCLGGSDNFKHL
NEIDLFHCIDPNDSKHKRTDRSILCCLRKGESGQSWPRLTKERAKMMNNM
GGDEDVDLPEVDGADDDSQDSDDEKMPDLE

[0138] The full amino acid sequence of the protein of the invention was found to have 101 of 122 amino acid residues (82%) identical to, and 107 of 122 amino acid residues (87%) similar to, the 160 amino acid residue ptnr:SWISSNEW-ACC:Q15185 protein from Homo sapiens (Human) (HSP90 Co-Chaperone (Progesterone Receptor Complex P23)) (E=7.9e).

[0139] NOV4 is expressed in at least Adrenal Gland/Suprarenal gland, Amnion, Amygdala, Aorta, Appendix, Ascending Colon, Bone, Bone Marrow, Brain, Bronchus, Brown adipose, Cartilage, Cervix, Chorionic Villus, Colon, Cornea, Coronary Artery, Dermis, Duodenum, Epidermis, Foreskin, Gall Bladder, Gastro-intestinal/Digestive System, Hair Follicles, Heart, Hippocampus, Islets of Langerhans, Kidney, Kidney Cortex, Larynx, Left cerebellum, Liver, Lung, Lung Pleura, Lymph node, Lymphoid tissue, Mammary gland/Breast, Muscle, Ovary, Oviduct/Uterine Tube/Fallopian tube, Pancreas, Parathyroid Gland, Parietal Lobe, Parotid Salivary glands, Peripheral Blood, Pharynx, Pituitary Gland, Placenta, Prostate, Retina, Right Cerebellum, Salivary Glands, Skin, Small Intestine, Spinal Cord, Spleen, Stomach, Substantia Nigra, Temporal Lobe, Testis, Thalamus, Thymus, Thyroid, Tonsils, Trachea, Umbilical Vein, Urinary Bladder, Uterus, Vein, Vulva, Whole Organism. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0140] NOV4b

[0141] In the present invention, the target sequence identified previously, NOV4a, was subjected to the exon linking process to confirm the sequence. PCR primers were designed by starting at the most upstream sequence available, for the forward primer, and at the most downstream sequence available for the reverse primer. In each case, the sequence was examined, walking inward from the respective termini toward the coding sequence, until a suitable sequence that is either unique or highly selective was encountered, or, in the case of the reverse primer, until the stop codon was reached. Such primers were designed based on in silico predictions for the full length cDNA, part (one or more exons) of the DNA or protein sequence of the target sequence, or by translated homology of the predicted exons to closely related human sequences or sequences from other species. These primers were then employed in PCR amplification based on the following pool of human cDNAs: adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. Usually the resulting amplicons were gel purified, cloned and sequenced to high redundancy. The resulting sequences from all clones were assembled with themselves, with other fragments in CuraGen Corporation's database and with public ESTs. Fragments and ESTs were included as components for an assembly when the extent of their identity with another component of the assembly was at least 95% over 50 bp. In addition, sequence traces were evaluated manually and edited for corrections if appropriate. These procedures provide the sequences reported below, which are designated NOV4b.

[0142] A disclosed NOV4b nucleic acid of 520 nucleotides (designated CuraGen Acc. No. CG55700-02) encoding a novel HSP90 Co-Chaperone (Progesterone Receptor Complex P23)-like protein is shown in Table 4C. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 1-3 and ending with a TAA codon at nucleotides 481-483. A putative untranslated region downstream from the termination codon is underlined in Table 4C, and the start and stop codons are in bold letters.

TABLE 4C

NOV4b Nucleotide Sequence (SEQ ID NO: 17)

AGCAGCCTGCTTCTGCAAAGTGGTACGATCGAAGGGACTATGTCTTCAT
TGAATTTTGTGTTGAAGACAGTAAGGATGTTAATGTAAATTTTGAAAAAT
CCAAACTTACATTCAGTTGTCTCGGAGGAAGTGATAATTTTAAGCATTTA
AATGAAATTGATCTTTTTCACTGTATTGATCCAAATGATTCCAAGCATAA

AAGAACGGACAGATCAATTTTATGTTGTTTACGAAAAGGAGAATCTGGCC
AGTCATGGCCAAGGTTAACAAAAGAAAGGGCAAAGCTTAATTGGCTTAGT
GTCGACTTCAATAATTGGAAAGACTGGGAAGATGATTCAGATGAAGACAT
GTCTAATTTTGATCGTTTCTCTGAGATGATGAACAACATGGGTGGTGATG
AGGATGTAGATTTACCAGAAGTAGATGGAGCAGATGATGATTCACAAGAC
AGTGATGATGAAAAAATGCCAGATCTGGAGAAGGAATATTGTCATCAC
CTGGATTTTGAGAAAGAAAAA

[0143] A NOV4b polypeptide (SEQ ID NO: 18) encoded by SEQ ID NO: 17 is 160 amino acid residues and is presented using the one letter code in Table 4D.

TABLE 4D

NOV4b protein sequence (SEQ ID NO: 18)

MQPASAKWYDRRDYWFIEFCVEDSKDVNVNFEKSKLTFSCLGGSDNFKHL
NEIDLFHCIDPNDSKHKRTDRSILCCLRKGESGQSWPRLTKERAKLNWLS
WDFNNWKDWEDDSDEDMSNFDRFSEMMNNMGGDEDVDLPEVDGADDDSQD
SDDEKMPDLE

[0144] The human cDNA encodes a protein of 160 amino acids that does not show homology to previously identified proteins. The chicken and human cDNAs are 88% identical at the DNA level and 96.3% identical at the protein level. p23 is a highly acidic phosphoprotein with an aspartic acid-rich carboxy-terminal domain. Bacterially overexpressed human p23 was used to raise several monoclonal antibodies to p23. These antibodies specifically immunoprecipitate p23 in complex with hsp90 in all tissues tested and can be used to immunoaffinity isolate progesterone receptor complexes from chicken oviduct cytosol.

[0145] NOV4c

[0146] In the present invention, the target sequence identified previously, NOV4a, was subjected to the exon linking process to confirm the sequence. PCR primers were designed by starting at the most upstream sequence available, for the forward primer, and at the most downstream sequence available for the reverse primer. In each case, the sequence was examined, walking inward from the respective termini toward the coding sequence, until a suitable sequence that is either unique or highly selective was encountered, or, in the case of the reverse primer, until the stop codon was reached. Such primers were designed based on in silico predictions for the full length cDNA, part (one or more exons) of the DNA or protein sequence of the target sequence, or by translated homology of the predicted exons to closely related human sequences or sequences from other species. These primers were then employed in PCR amplification based on the following pool of human cDNAs: adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. Usually the resulting amplicons were gel purified, cloned and sequenced to high redundancy. The resulting sequences from all clones were assembled with themselves, with other fragments in CuraGen Corporation's database and with public ESTs. Fragments and ESTs were included as components for an assembly when the extent of their identity with another component of the assembly was at least 95% over 50 bp. In addition, sequence traces were evaluated manually and edited for corrections if appropriate. These procedures provide the sequences reported below, which are designated Accession Number NOV4c.

[0147] A disclosed NOV4c nucleic acid of 426 nucleotides (designated CuraGen Acc. No. CG55700-03) encoding a novel HSP90 co-chaperone-like protein is shown in Table 4E. An open reading frame was identified beginning with a CCT initiation codon at nucleotides 1-3 and ending at nucleotides 424-426. The start codon is in bold letters in Table 4E. Because the initiation codon is not a traditional initiation codon, and because there is no termination codon, NOV4c could be a partial reading frame that could be extended in the 5' or 3' directions.

TABLE 4E

NOV4c Nucleotide Sequence (SEQ ID NO: 19)

CCGCTTCTGCAAAGTGGTACGATCGAAGGGACTATGTCTTCATTGAATT
TTGTGTTGAAGACAGTAAGGATGTTAATGTAAATTTTGAAAAATCCAAAC
TTACATTCAGTTGTCTCGGAGGAAGTGATAATTTTAAGCATTTAAATGAA
ATTGATCTTTTTCACTGTATTGATCCAAATGATTCCAAGCATAAAAGAAC
GGACAGATCAATTTTATGTTGTTTACGAAAAGGAGAATCTGGCCAGTCAT
GGCCAAGGTTAACAAAAGAAAGGGCAAAGCTTAATTGGCTTAGTGTCGAC
TTCAATAATTGGAAAGACTGGGAAGATGATTCAGATGAAGACATGTCTAA
TTTTGATCGTTTCTCTGAGAAATGCCAGATCTGGAGTAAGGAATATTGTC
ATCACCTGGATTTGAAGAAAGAAAAA

[0148] The nucleic acid sequence of NOV4, localized to , has 399 of 423 bases (94%) identical to a gb:GENBANK-ID:HUMPRA|acc:L24804.1 mRNA from Homo sapiens (Human (p23) mRNA, complete cds) (E=7.0e-7).

[0149] A NOV4c polypeptide (SEQ ID NO: 20) encoded by SEQ ID NO: 19 is 142 amino acid residues and is presented using the one letter code in Table 4F. SignalP, Psort and/or Hydropathy results predict that NOV4c has no signal peptide and is likely to be localized at the microbody (peroxisome) with a certainty of 0.7015. In other embodiments, NOV4c may also be localized to the nucleus with a certainty of 0.4600, the mitochondrial membrane space with a certainty of 0.1000, or the lysosome (lumen) with a certainty of 0.1000.
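The reading-frame coordinates and polypeptide lengths stated for NOV4a, NOV4b and NOV4c can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function name `coding_residues` and its parameters are hypothetical and not part of the disclosure. It assumes 1-based, inclusive nucleotide positions and that a stop codon, where one is reported, is not translated.

```python
def coding_residues(start, stop_codon_start=None, last_nt=None):
    """Count the amino acids encoded by an open reading frame.

    start            -- first nucleotide of the initiation codon (1-based)
    stop_codon_start -- first nucleotide of the stop codon, if one is reported
    last_nt          -- last coding nucleotide, used when no stop codon is reported
    """
    if stop_codon_start is not None:
        last_coding = stop_codon_start - 1  # the stop codon itself is not translated
    elif last_nt is not None:
        last_coding = last_nt
    else:
        raise ValueError("give either stop_codon_start or last_nt")
    span = last_coding - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("coding span is not a positive whole number of codons")
    return span // 3


# (computed length, length stated in the text) for each disclosed sequence
checks = {
    "NOV4a": (coding_residues(54, stop_codon_start=444), 130),
    "NOV4b": (coding_residues(1, stop_codon_start=481), 160),
    "NOV4c": (coding_residues(1, last_nt=426), 142),  # no stop codon reported
}

for name, (computed, stated) in checks.items():
    print(name, computed, stated, computed == stated)
```

Under these assumptions each computed length equals the stated number of residues, which is consistent with NOV4c being read through to the last nucleotide of the disclosed sequence.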

TABLE 4F

NOV4c protein sequence (SEQ ID NO: 20)

PASAKWYDRRDYWFIEFCWEDSKDWNWNFEKSKILTFSCLGGSDNFKHLNE
IDLFHCIDPNDSKHKRTDRSILCCLRKGESGQSWPRLTKERAKLNWLSVD
FNNWKDWEDDSDEDMSNFDRFSEKCOIWSKEYCHHLDLKKEK

[0150] The full amino acid sequence of the protein of the invention was found to have 123 of 123 amino acid residues (100%) identical to, and 123 of 123 amino acid residues (100%) similar to, the 160 amino acid residue ptnr:SWISSNEW-ACC:Q15185 protein from Homo sapiens (Human) (HSP90 Co-Chaperone (Progesterone Receptor Complex P23)) (E=1.5e7).

[0151] NOV4c is expressed in at least liver, pancreas, lymph node, hepatocellular carcinoma. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. No. CG55700-03.

[0152] NOV4a also has homology to the amino acid sequences shown in the BLASTP data listed in Table 4G.

TABLE 4G

BLAST results for NOV4a

Gene Index/Identifier                 Protein/Organism                       Length (aa)  Identity (%)    Positives (%)   Expect

gi|1362904|pir||A56211                progesterone receptor-related          160          121/160 (75%)   121/160 (75%)   2e-55
                                      protein p23 - human
gi|8928249|sp|Q9R0Q7|P23_MOUSE        TELOMERASE-BINDING PROTEIN P23         160          119/160 (74%)   121/160 (75%)   2e-54
                                      (HSP90 CO-CHAPERONE) (PROGESTERONE
                                      RECEPTOR COMPLEX P23)
gi|5081800|gb|AAD39543.1|AF153479_1   telomerase binding protein p23         160          117/160 (73%)   119/160 (74%)   9e-53
(AF153479)                            Mus musculus
B56211                                progesterone receptor-related          160          116/160 (72%)   120/160 (74%)   2e-52
                                      protein p23 - chicken
                                      Chain A, Crystal Structure Of The      125          95/96 (98%)     96/96 (99%)     4e-47
                                      Human Co-Chaperone P23
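The identity percentages quoted alongside the BLAST comparisons for NOV4a can be recomputed from the match counts. The Python sketch below is a minimal illustration; the function name `percent_identity` is hypothetical. It assumes the reported percentage is the integer part of 100 x matches / aligned positions (truncated, not rounded), which is an inference from figures such as 101 of 122 residues being reported as 82%, not something the text states.

```python
def percent_identity(matches, aligned):
    """Integer percent identity, truncated toward zero (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return (100 * matches) // aligned


# (matches, aligned positions, percentage as stated in the text)
stated = [
    (354, 388, 91),  # NOV4a nucleic acid vs. L24804.1
    (101, 122, 82),  # NOV4a protein, identical residues vs. Q15185
    (107, 122, 87),  # NOV4a protein, similar residues vs. Q15185
    (399, 423, 94),  # NOV4 nucleic acid vs. L24804.1
]

for matches, aligned, pct in stated:
    print(f"{matches}/{aligned}: computed {percent_identity(matches, aligned)}%, stated {pct}%")
```

Simple rounding would give 83% and 88% for the two protein comparisons, so truncation is the reading that agrees with the stated values.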

[0153] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 4H.


[0154] Using immunoprecipitation of unactivated avian progesterone receptor, Johnson et al. (Mol Cell Biol 1994; 14:1956-63) purified hsp90, hsp70, and three additional proteins, p54, p50, and p23. p23 is also present in immunoaffinity-purified hsp90 complexes along with hsp70 and another protein, p60. Antibody and cDNA probes for p23 were prepared in an effort to elucidate the significance and function of this protein. Antibodies to p23 detect similar levels of p23 in all tissues tested and cross-react with a protein of the same size in mice, rabbits, guinea pigs, humans, and Saccharomyces cerevisiae, indicating that p23 is a conserved protein of broad tissue distribution. These antibodies were used to screen a chicken brain cDNA library, resulting in the isolation of a 468-bp partial cDNA clone encoding a sequence containing four sequences corresponding to peptide fragments isolated from chicken p23. This partial clone was subsequently used to isolate a full length human cDNA clone. The human cDNA encodes a protein of 160 amino acids that does not show homology to previously identified proteins. The chicken and human cDNAs are 88% identical at the DNA level and 96.3% identical at the protein level. p23 is a highly acidic phosphoprotein with an aspartic acid-rich carboxy-terminal domain. Bacterially overexpressed human p23 was used to raise several monoclonal antibodies to p23. These antibodies specifically immunoprecipitate p23 in complex with hsp90 in all tissues tested and can be used to immunoaffinity isolate progesterone receptor complexes from chicken oviduct cytosol.

[0155] The disclosed NOV4 nucleic acid of the invention encoding an HSP90 co-chaperone-like protein includes the nucleic acid whose sequence is provided in Table 4A, 4C, 4E or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 4A, 4C, or 4E while still encoding a protein that maintains its HSP90 co-chaperone-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 9 percent of the bases may be so changed.

[0156] The disclosed NOV4 protein of the invention includes the HSP90 co-chaperone-like protein whose sequence is provided in Table 4B, 4D, or 4F. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 4B, 4D, or 4F while still encoding a protein that maintains its HSP90 co-chaperone-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 28 percent of the residues may be so changed.

[0157] The protein similarity information, expression pattern, and map location for the HSP90 co-chaperone-like protein and nucleic acid (NOV4) disclosed herein suggest that this NOV4 protein may have important structural and/or physiological functions characteristic of the HSP90 co-chaperone family. Therefore, the NOV4 nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications. These include serving as a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of the nucleic acid or the protein are to be assessed, as well as potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), and (v) a composition promoting tissue regeneration in vitro and in vivo.

[0158] The NOV4 nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications implicated in various diseases and disorders described below. For example, the compositions of the present invention will have efficacy for treatment of patients suffering from adrenoleukodystrophy, congenital adrenal hyperplasia, hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies, asthma, immunodeficiencies, transplantation, graft versus host disease, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, neuroprotection, arthritis, tendonitis, fertility, atherosclerosis, aneurysm, hypertension, fibromuscular dysplasia, stroke, scleroderma, obesity, myocardial infarction, embolism, cardiovascular disorders, bypass surgery, cirrhosis, inflammatory bowel disease, diverticular disease, Hirschsprung's disease, Crohn's Disease, appendicitis, ulcers, diabetes, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic lupus erythematosus, renal tubular acidosis, IgA nephropathy, laryngitis, emphysema, ARDS, lymphedema, muscular dystrophy, myasthenia gravis, endometriosis, pancreatitis, hyperparathyroidism, hypoparathyroidism, growth and reproductive disorders, xerostomia, psoriasis, actinic keratosis, acne, hair growth/loss, alopecia, pigmentation disorders, endocrine disorders, tonsillitis, cystitis, incontinence, and/or other pathologies. The NOV4 nucleic acids, or fragments thereof, may further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein are to be assessed.

[0159] NOV4 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. For example, the disclosed NOV4 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV4 epitope is from about amino acids 5 to 125. These novel proteins can be used in assay systems for functional analysis of various human disorders, which will help in understanding the pathology of the disease and in the development of new drug targets for various disorders.

[0160] NOV5

[0161] A disclosed NOV5 nucleic acid of 2993 nucleotides (also referred to as CG55706-01) encoding a novel Type III adenylyl cyclase-like protein is shown in Table 5A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 148-150 and ending with a TAG codon at nucleotides 2431-2433. Putative untranslated regions upstream from the initiation codon and downstream from the termination codon are underlined in Table 5A, and the start and stop codons are in bold letters.

TABLE 5A

NOV5 Nucleotide Sequence (SEQ ID NO: 21)

GCTGGAGGTGGCCTCCCCTCCGCCCCAGACAAGAAGAGGCCCTCAGCCCT
CCCCCGGTCTCAGAGAGCCCTGAGAGGAGGCCCAGTCCAGAGCTCTTCCT
CCGTTCCCAGTCCACTTCTCTAGGGCCAGTAGCAGACACCAGCCAGTAG
CCGAGGAACCAGGGCTTCTCCGAGCCCGAATACTCGGCCGAGTACTCAGC
CGAGTACTCCGTCAGCCTGCCCTCGGACCCTGACCGCGGGGTGGGCCGGA
CCCATGAAATCTCGGTCCGGAACTCGGGCTCCTGCCTGTGCCTGCCTCGC
TTCATGCGGCGCGGCTCTGCGGGGAGCAGCCCTCGGGCGCGCCGAGCTCT
CCCGCCCCAGCCCGCGCGGGGACCGTCCCGGAGCACGCGGTGGCCGAGTT
CCCGCACAGTTCTAGCTGATCAGTGCTACCTGTGCTCTGGAAACCCGCTC
TGCGTTCCTGCTGGAGGTGGCCTCCCCTTCGCCCCAGACAAGAAGAGGCC
CTCAGCCCTCCCCCGGTCTCAGAGAGCCCTGAGAGGAGGCCCAGTCCAGA
GCTCTTCCTCAAAGTCCAGCTCCCCTGCCCTCATTGAGACCAAGGAGCCC
AACGGGAGTGCCCACAGCAGTGGGTCCACGTCGGAGAAGCCCGAGGAGCA
GGATGCCCAGGCCGACAACCCCTCATTCCCCAACCCACGCCGGAGGCTGC
GCCTGCAGGACCTGGCTGACCGAGTGGTGGATGCCTCTGAAGATGAGCAC
GAGCTCAACCAGCTGCTCAACGAGGCCCTGCTTGAGCGAGAGTCCGCCCA
AGTAGTAAAGAAGAGAAACACCTTCCTCTTGTCCATGCGGTTCATGGACC
CCGAGATGGAAACCCGCTACTCGGTGGAGAAGGAGAAGCAGAGTGGGGCT
GCCTTCAGCTGCTCCTGCGTCGTCCTGCTCTGCACGGCCCTGGTCGAGAT
ACTCATCGACCCCTGGCTAATGACAAACTATGTGACCTTCATGGTGGGGG
AGATTCTGCTCCTCATCCTGACCATCTGCTCCCTGGCTGCCATCTTTCCC
CGGGCCTTTCCTAAGAAGCTTGTGGCCTTCTCAACTTGGATTGACCGGAC
CCGCTGGGCCAGGAACACCTGGGCCATGCTCGCCATCTTCATCCTGGTGA
TGGCAAATGTCGTGGACATGCTCAGCTGTCTCCAGTACTACACGGGACCC
AGCAATGCAACGGCAGGGATGGAGACGGAGGGCAGCTGCCTGGAGAACCC
CAAGTATTACAACTATGTGGCCGTGCTGTCCCTCATCGCCACCATCATGC
TGGTGCAGGTCAGCCACATGGTGAAGCTCACGCTCATGCTGCTCGTCGCA
GGCGCCGTGGCCACCATCAACCTCTATGCCTGGCGTCCCGTCTTTGATGA
ATACGACCACAAGCGTTTTCGGGAGCACGACTTACCTATGGTGGCCTTAG
AGCAGATGCAAGGATTCAACCCTGGGCTCAATGGCACTGACAGGCTGCCC
CTGGTGCCTTCCAAGTACTCTATGACGGTGATGGTGTTCCTCATGATGCT
CAGCTTCTACTACTTCTCCCGCCACGTAGAAAAACTGGCACGGACACTTT
TCTTGTGGAAGATTGAGGTCCACGACCAGAAGGAACGTGTCTATGAGATG
CGACGCTGGAACGAGGCCTTGGTCACCAACATGTTGCCTGAGCACGTGGC
ACGCCATTTCCTGGGGTCCAAGAAGAGAGATGAGGAGCTGTATAGCCAGA
CGTATGATGAGATTGGAGTCATGTTTGCCTCCCTGCCCAACTTTGCTGAC
TTCTACACAGAGGAGAGCATCAACAATGGTGGTATTGAGTGTCTGCGTTT
CCTCAATGAAATCATCTCAGATTTTGACTCTCTCCTGGACAATCCCAAGT
TCCGGGTGATCACCAAGATCAAAACCATTGGCAGCACGTATATGGCGGCT
TCAGGAGTCACCCCCGATGTCAACACCAATGGCTTTGCCAGCTCCAACAA
GGAAGACAAGTCCGAGAGAGAGCGCTGGCAGCACCTGGCTGACCTGGCCG
ACTTCGCGCTGGCCATGAAGGATACGCTCACCAACATCAACAACCAGTCC
TTCAATAACTTCATGCTGCGCATAGGCATGAACAAAGGCGGGGTTCTGGC
TGGGGTCATCGGAGCCCGGAAACCACACTACGACATCTGGGGCAATACAG
TCAATGTAGCCAGCAGGATGGAGTCCACGGGGGTCATGGGCAACATTCAG
GTGGTAGAAGAAACCCAAGTCATCCTCCGAGAGTACGGCTTCCGCTTTGT
GAGGCGAGGCCCCATCTTTGTGAAGGGGAAGGGGGAGCTGCTGACCTTCT
TCTTGAAGGGGCGGGATAAGCTAGCCACCTTCCCCAATGGCCCCTCTGTC
ACACTGCCCCACCAGGTGGTGGACAACTCCGAATGGCCTCGAGCCTGAA
ACAGTCCAAACCGGAAGGGAGAATTTATTTTTTGAAACTGAAGGAAGTC
CCGACCTTCCTGGATTGAAGTGCACACTCATGGACTTTAGGTTTAGAAAC
CTCCTCAGCCTTCATTTGTTCGTGGATGTGTGAGCTCTGAGGGTGGCCCT
GCTATTCCTCTGCGTGCCTGTAGTGTCCCCAGCATAGGGGTCTTAGGCAT
AGGGCTGAACAGTCCTTCCAGAGCCCTCGTTCCAATCCCTGCCGTCCTTG
CCCCTGAGGGGCCCTGACCACTGTGAGCAGGAGGGTGGCAGAGCTGGGAC
AAAGCTGCCTTTGCCGCTGGGCTTTCCGGGACTGTGGAGGGAGCACAGGC
GGGGAAGCTCCACTTCAGACAGGGCTTGGTGGGGCAGGACATGGCTCCCA
TTTTGAAGGGAGGTCTCCATGTGGTCCGAGTGAGGTGAGACGGCCCTCGT
CCTGGTGTTCCTGATCATCTTGAAAGGTTCTTCTGGAACTCCTGTCCCCT
TAGTCATGAGAACAGAAAGTGCAATATTTCCTTTCACCTGGCCC

[0162] The NOV5 nucleic acid was identified on the p22-p24 region of and has 2489 of 2526 bases (98%) identical to a gb:GENBANK-ID:AF033861|acc:AF033861.1 mRNA from Homo sapiens (Homo sapiens type III adenylyl cyclase (AC-III) mRNA, complete cds) (E=0.0).

[0163] A disclosed NOV5 polypeptide (SEQ ID NO: 22) encoded by SEQ ID NO: 21 is 761 amino acid residues and is presented using the one-letter code in Table 5B. SignalP, Psort and/or Hydropathy results predict that NOV5 has no signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.6000. In other embodiments, NOV5 may also be localized to the Golgi body with a certainty of 0.4000, the endoplasmic reticulum with a certainty of 0.3000, or the mitochondrial inner membrane with a certainty of 0.0300.

TABLE 5B

Encoded NOV5 protein sequence (SEQ ID NO: 22)

MPRNOGFSEPEYSAEYSAEYSWSLPSDPDRGWGRTHEISWRNSGSCLCLP
RFMRRGSAGSSPRARRALPPQPARGPSRSTRWPSSRTVLADQCYLCSGNP
LCVPAGGGLPFAPDKKRPSALPRSQRALRGGPWQSSSSKSSSPALIETKE
PNGSAHSSGSTSEKPEEQDAQADNPSFPNPRRRLRLQDLADRVVDASEDE
HELNOLLNEALLERESAQWWKKRNTFLLSMRFMDPEMETRYSWEKEKQSG
AAFSCSCWWLLCTALWEILIDPWLMTNYWTFMWGEILLLILTICSLAAIF
PRAFPKKLWAFSTWIDRTRWARNTWAMLAIFILVMANVVDMLSCLQYYTG
PSNATAGMETEGSCLENPKYYNYWAVLSLIATIMLWOWSHMVKLTLMLLV
AGAVATINLYAWRPWFDEYDHKRFREHDLPMVALEOMOGFNPGLNGTDRL
PLVPSKYSMTVMVFLMMLSFYYFSRHWEKLARTLFLWKIEWHDQKERVYE
MRRWNEALWTNMLPEHWARHFLGSKKRDEELYSQTYDEIGVMFASLPNFA
DFYTEESINNGGIECLRFLNEIISDFDSLLDNPKFRWITKIKTIGSTYMA
ASGWTPDWNTNGFASSNKEDKSERERWOHLADLADFALAMKDTLTNINNO
SFNNFMLRIGMNKGGWLAGWIGARKPHYDIWGNTWNWASRMESTGWMGNI
QWWEETOWILREYGFRFVRRGPIFWKGKGELLTFFLKGRDKLATFPNGPS
VTLPHQVVDNS

[0164] The disclosed NOV5 amino acid sequence has 628 of 641 amino acid residues (97%) identical to, and 632 of 641 amino acid residues (98%) similar to, the 1144 amino acid residue ptnr:SPTREMBL-ACC:O60266 protein from Homo sapiens (Human) (Type III Adenylyl Cyclase (KIAA0511 Protein)) (E=0.0).

[0165] NOV5 is expressed in at least adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, and/or RACE sources.

[0166] In addition, the sequence is predicted to be expressed in human islet, brain, liver, and lung because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:AF033861|acc:AF033861.1), a closely related Homo sapiens type III adenylyl cyclase (AC-III) mRNA, complete cds homolog.

[0167] NOV5 also has homology to the amino acid sequences shown in the BLASTP data listed in Table 5C.

TABLE 5C

BLAST results for NOV5

Gene Index/Identifier | Protein/Organism | Length (aa) | Identity (%) | Positives (%) | Expect
CYA3_RAT | ADENYLATE CYCLASE TYPE III (ADENYLATE CYCLASE, OLFACTIVE TYPE) (ATP PYROPHOSPHATE-LYASE) (ADENYLYL CYCLASE) (AC-III) (AC3) | 1144 | 549/648 (84%) | 574/648 (87%) | 0.0
gi|4757724|ref|NP_004027.1| (NM_004036) | adenylate cyclase 3; adenylyl cyclase, type III; ATP pyrophosphate-lyase [Homo sapiens] | 1144 | 588/619 (94%) | 588/619 (94%) | 0.0
gi|7437177|pir||T13927 | adenylate cyclase (EC 4.6.1.1) isoform 39E - fruit fly (Drosophila melanogaster) | 1167 | 216/574 (37%) | 324/574 (55%) | 4e-99

TABLE 5C-continued

BLAST results for NOV5

Gene Index/Identifier | Protein/Organism | Length (aa) | Identity (%) | Positives (%) | Expect
gi|7302124|gb|AAF57223.1| (AE003781) | Ac3 gene product [Drosophila melanogaster] | 1167 | 216/574 (37%) | 324/574 (55%) | 5e-99
gi|6752978|ref|NP_033753.1| (NM_009623) | adenylate cyclase 8 [Mus musculus] | 1249 | 199/536 (37%) | 307/536 (57%) | 3e-91
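The identity and similarity figures quoted in this section pair a match count with a whole-number percentage, for example 628 of 641 residues (97%) in paragraph [0164]. The sketch below shows one way to reproduce such a figure. It assumes the percentage is truncated rather than rounded, an assumption that is consistent with 588/619 being reported as 94% in Table 5C but is not stated in the text. The function name `percent_truncated` is hypothetical.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole percentage, truncated (assumed, not rounded)."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("need aligned > 0 and 0 <= matches <= aligned")
    # Integer arithmetic avoids floating-point edge cases near a whole percent.
    return (100 * matches) // aligned


# Figures quoted in paragraph [0164] for NOV5.
print(percent_truncated(628, 641))
print(percent_truncated(632, 641))
# 588/619 is 94.99...%; truncation gives 94, rounding would give 95.
print(percent_truncated(588, 619))
```

Where a printed percentage does not match the value computed this way, the fraction or the percentage may have been damaged in reproduction, and the original publication should be consulted.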

[0168] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 5D.

TABLE 5D

ClustalW analysis of NOV5 against the sequences listed in Table 5C (the alignment itself is not recoverable from this text).

[0169] Tables 5E-F list the domain description from DOMAIN analysis results against NOV5. This indicates that the NOV5 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 5E

Domain Analysis of NOV5

gnl|Pfam|pfam00211, guanylate_cyc, Adenylate and Guanylate cyclase catalytic domain. (SEQ ID NO: 90)
CD-Length = 185 residues, 100.0% aligned
Score = 204 bits (518), Expect = 2e-53

Query: 531 LYSQTYDEIGVMFASLPNFADFYTEESINNGGIECLRFLNEIISDFDSLLDNPKFRVITK 590
Sbjct:   1 WYAERYDEWTILFADIWGFTALSERHSP----EEWWRLLNELFTRFDELWDAHG---GYK 53

Query: 591 IKTIGSTYMAASGVTPDVNTNGFASSNKEDKSERERWQHLADLADFALAMKDTLTNINNQ 650
Sbjct:  54 WKTIGDAYMAASGLPPA------SAAHAAKLADFATAMWEALEEWNWG 95

Query: 651 SFNNFMLRIGMNKGGVLAGVIGARKPHYDIWGNTVNVASRMESTGVMGNIQVVEETQVIL 710
Sbjct:  96 HTEPLRLRIGIHTGPWWAGWIGAKRPRYDWWGDTWNWASRMESLGPGKIHWSESTYRLL 155

Query: 711 -REYGFRF-VRRGPIFVKGKGE-LLTFFLK 737
Sbjct: 156 NGLESFQFRFPRGEVSVKGKGKPMKTYFLH 185

[0170]

TABLE 5F

Domain Analysis of NOV5

gnl|Smart|smart00044, CYCC, Adenylyl-/guanylyl cyclase, catalytic domain; Present in two copies in mammalian adenylyl cyclases. Eubacterial homologues are known. Two residues (Asn, Arg) are thought to be involved in catalysis. These cyclases have important roles in a diverse range of cellular processes. (SEQ ID NO: 91)
CD-Length = 194 residues, 99.5% aligned
Score = 174 bits (442), Expect = 1e-44

Query: 500 EMRRWNEALVTNMLPEHVARHFLGSKKRDEELYSQTYDEIGVMFASLPNFADFYTEESIN 559
Sbjct:   1 EEKRKNDRILLDQLLPASWAESLKRGG---EPWPAPSYDEWTILFTDIWGFTALSSA---- 53

Query: 560 NGGIECLRFLNEIISDFDSLLDNPKFRVITKIKTIGSTYMAASGVTPDVNTNGFASSNKE 619
Sbjct:  54 ATPEQVVTLLNDLYSRFDRIIDRHG---GYKWKTIGDAYMVVSGLPTAAL------ 100

Query: 620 DKSERERWQHLADLADFALAMKDTLTNINNQ-SFNNFMLRIGMNKGGVLAGVIGARKPHY 678
Sbjct: 101 ------WQHAELAALEALDMVESLKTWLWOHRGNGLRVRIGIHTGPWVAGVWGITMPRY 153

Query: 679 DIWGNTVNVASRMESTGVMGNIQVVEETQVILREYGFRFV 718
Sbjct: 154 CLFGDTVNLASRMESVGDPGQIQVSEETYSLLRRRSGQFE 193

[0171] Adenylyl cyclase (AC) is an enzyme that synthesizes cyclic adenosine monophosphate (cyclic AMP), an important player in some intracellular signaling pathways, from adenosine triphosphate (ATP). Adenylyl cyclases are integral membrane proteins that consist of two bundles of six transmembrane segments and two catalytic domains extending as loops into the cytoplasm. There are at least nine isoforms of adenylyl cyclase, based on cloning of full-length cDNAs. These enzymes differ considerably in regulatory properties and are differentially expressed among tissues. Recently, type 3 adenylyl cyclase (AC-III) overexpression has been implicated in reversing the defect of spontaneous diabetics in Goto-Kakizaki (GK) rat. More recently, cDNA of the human AC-III homologue has been cloned with an open reading frame encoding 1144 amino acids containing 12 transmembrane-spanning domains. Human AC-III gene shows 95% homology with the rat sequence and is widely expressed in different tissues (Busfield et al., 2000, Genomics vol. 66: 213-216; Yang et al., 1999, Biochem Biophys Res Commun, vol. 254: 548-551).

[0172] The disclosed NOV5 nucleic acid of the invention encoding a Type III adenylyl cyclase-like protein includes the nucleic acid whose sequence is provided in Table 5A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 5A while still encoding a protein that maintains its Type III adenylyl cyclase-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 2% of the bases may be so changed.

[0173] The disclosed NOV5 protein of the invention includes the Type III adenylyl cyclase-like protein whose sequence is provided in Table 5B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 5B while still encoding a protein that maintains its Type III adenylyl cyclase-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 63% of the residues may be so changed.

[0174] The NOV5 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in diabetes, heart failure, neurological diseases such as epilepsy, sleep disorder, parkinsonism, Huntington's disease, Alzheimer's disease, depression, schizophrenia, and/or other diseases, disorders and conditions of the like. The NOV5 nucleic acid, or fragments thereof, may further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein are to be assessed.

[0175] NOV5 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the “Anti-NOVX Antibodies” section below. For example, the disclosed NOV5 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV5 epitope is from about amino acids 5 to 270. In other embodiments, a NOV5 epitope is from about amino acids 400 to 450, and from about amino acids 470 to 770. This novel protein also has value in the development of a powerful assay system for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.

[0176] NOV6

[0177] NOV6 includes three novel Airway Trypsin-Like Protease-like proteins disclosed below. The disclosed sequences have been named NOV6a, NOV6b, and NOV6c.

[0178] NOV6a

[0179] A disclosed NOV6a nucleic acid of 1769 nucleotides (also referred to as CG50389-02) encoding a novel Interleukin 1 receptor related protein-like protein is shown in Table 6A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 386-388 and ending with a TAG codon at nucleotides 1619-1621. A putative untranslated region upstream from the initiation codon and downstream from the termination codon is underlined in Table 6A, and the start and stop codons are in bold letters.

TABLE 6A

NOV6a Nucleotide Sequence (SEQ ID NO: 23)

CGCCCGCCCACGGCGGCGGGGAAATACCTAGGCATGGAAGTGGCATGACA
GGGCTCGTGTCCCTGTCATATTTTCCACTCTCCACGAGGTCCTGCGCGCT
TCAATCCTGCAGGCAGCCCGGTTTGGGGATGTGGTCCTTGCTGCTCTGCG
GGTTGTCCATCGCCCTTCCACTGTCTGTCACAGCAGATGGATGCAAGGAC
ATTTTTATGAAAAATGAGATACTTTCAGCAAGCCAGCCTTTTGCTTTTAA
TTGTACATTCCCTCCCATAACATCTGGGGAAGTCAGTGTAACATGGTATA
AAAATTCTAGCAAAATCCCAGTGTCCAAAATCATACAGTCTAGAATTCAC
CAGGACGAGACTTGGATTTTGTTTCTCCCCATGGAAGGGGGGACTCAGG
AGTCTACCAATGTGTTATAAAGACTGTAACGAGATTAAAGGGGAGCGGTT
CACTGTTTTGGAAACCAGGCTTTTGGTGAGCAATGTCTCGGCAGAGGACA
GAGGGAACTACGCGTGTCAAGCCATACTGACACACTCAGGGAAGCAGTAC
GAGGTTTTAAATGGCATCACTGTGAGCATTACAGAAAGAGCTGGATATGG
AGGAAGTGTCCCTAAAATCATTTATCCAAAAAATCATTCAATTGAAGTAC
AGCTTGGTACCACTCTGATTGTGGACTGCAATGTAACAGACACCAAGGAT
AATACAAATCTACGATGCTGGAGAGTCAATAACACTTTGGTGGATGATTA
CTATGATGAATCCAAACGAATCAGAGAAGGGGTGGAAACCCATGTCTCTT
TTCGGGAACATAATTTGTACACAGTAAACATCACCTTCTTGGAAGTGAAA
ATGGAAGATTATGGCCTTCCTTTCATGTGCCACGCTGGAGTGTCCACAGC
ATACATTATATTACAGCTCCCAGCTCCGGATTTTCGAGCTTACTTGATAG
GAGGGCTTATCGCCTTGGTGGCTGTGGCTGTGTCTGTTGTGTACATATAC
AACATTTTTAAGATCGACATTGTTCTTTGGTATCGAAGTGCCTTCCATTC
TACAGAGACCATAGTAGATGGGAAGCTGTATGACGCCTATGTCTTATACC
CCAAGCCCCACAAGGAAAGCCAGAGGCATGCCGTGGATGCCCTGGTGTTG
AATATCCTGCCCGAGGTGTTGGAGAGACAATGTGGATATAAGTTGTTTAT
ATTCGGCAGAGATGAATTCCCTGGACAAGCCGTGGCCAATGTCATCGATG
AAAACGTTAAGCTGTGCAGGAGGCTGATTGTCATTGTGGTCCCCGAATCG
CTGGGCTTTGGCCTGTTGAAGAACCTGTCAGAAGAACAAATCGCGGTCTA
CAGTGCCCTGATCCAGGACGGGATGAAGGTTATTCTCATTGAGCTGGAGA
AAATCGAGGACTACACAGTCATGCCAGAGTCAATTCAGTACATCAAACAG
AAGCATGGTGCCATCCGGTGGCATGGGGACTTCACGGAGCAGTCACAGTG

TABLE 6A-continued

NOV6a Nucleotide Sequence (SEQ ID NO: 23)

TATGAAGACCAAGTTTTGGAAGACAGTGAGATACCACATGCCGCCCAGAA
GGTGTCGGCCGTTTCTCCGGTCCACGTGCCGCAGCACACACCTCTGTACC
GCACCGCAGGCCCAGAACTAGGCTCAAGAAGAAAGAAGTGTACTCTCACG
ACTGGCTAAGACTTGCTGGACTGACACCTATGGCTGGAAGATGACTTGTT
TTGCTCCATGTCTCCTCATTCCTACACCTATTTTCTGCTGCAGGATGAGG
CTAGGGTTAGCATTCTAGA

[0180] The disclosed NOV6a nucleic acid sequence, located on the q12 region of chromosome 2, has 1363 of 1370 bases (99%) identical to a gb:GENBANK-ID:HSU49065|acc:U49065.1 mRNA from Homo sapiens (Human interleukin-1 receptor-related protein mRNA, complete cds) (E=7.0e').

[0181] A disclosed NOV6a polypeptide (SEQ ID NO: 24) encoded by SEQ ID NO: 23 is 411 amino acid residues and is presented using the one-letter amino acid code in Table 6B. SignalP, Psort and/or Hydropathy results predict that NOV6a contains no signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.7300. In other embodiments, NOV6a is also likely to be localized to the endoplasmic reticulum (membrane) with a certainty of 0.2000, or to the mitochondrial inner membrane with a certainty of 0.1000.

TABLE 6B

Encoded NOV6a protein sequence (SEQ ID NO: 24).

MGGLRSLPMCYKDCNEIKGERFTWLETRLLWSNWSAEDRGNYACQAILTH
SGKQYEWLNGITVSITERAGYGGSWPKIIYPKNHSIEVOLGTTLIVDCNW
TDTKDNTNLRCWRWNNTLWDDYYDESKRIREGWETHWSFREHNLYTWNT
FLEVKMEDYGLPFMCHAGVSTAYIILQLPAPDFRAYLIGGLIALWAVAVS
WWYIYNIFKIDIWLWYRSAFHSTETIVDGKLYDAYWLYPKPHKESQRHAV
DALWLNILPEWLERQCGYKLFIFGRDEFPGQAVANVIDENVKLCRRLIVI
WWPESLGFGLLKNLSEEQIAVYSALIQDGMKVILIELEKIEDYTVMPESI
QYIKOKHGAIRWHGDFTEQSQCMKTKFWKTVRYHMPPRRCRPFLRSTCRS
THLCTAPQAQN

[0182] The disclosed NOV6a amino acid sequence has 401 of 401 amino acid residues (100%) identical to, and 401 of 401 amino acid residues (100%) similar to, the 562 amino acid residue ptnr:SPTREMBL-ACC:Q13525 protein from Homo sapiens (Human) (Interleukin-1 Receptor-Related Protein) (E=3.8e').

[0183] NOV6a is expressed in at least adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, and/or RACE sources.

[0184] NOV6b

[0185] A disclosed NOV6b nucleic acid of 1827 nucleotides (also referred to as CG50389-03) encoding a novel Interleukin 1 receptor related protein-like protein is shown in Table 6C. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 65-67 and ending with a TAA codon at nucleotides 1715-1717. A putative untranslated region upstream from the initiation codon and downstream from the termination codon is underlined in Table 6C, and the start and stop codons are in bold letters.

TABLE 6C

NOV6b Nucleotide Sequence (SEQ ID NO: 25)

GTCATATTTTCCACTCTCCACGAGGTCCTGCGCGCTTCAATCCTGCAGGC
AGCCCGGTTTGGGGATGGTCCTTGCTGCTCTGCGGGTTGTCCATCGCC
CTTCCACTGTCTGTCACAGCAGATGGATGCAAGGACATTTTTATGAAAAA
TGAGATACTTTCAGCAAGCCAGCCTTTTGCTTTTAATTGTACATTCCCTC
CCATAACATCTGGGGAAGTCAGTGTAACATGGTATAAAAATTCTAGCAAA
ATCCCAGTGTCCAAAATCATACAGTCTAGAATTCACCAGGACGAGACTTG
GATTTTGTTTCTCCCCATGGAATGGGGGGACTCAGGAGTCTACCAATGTG
TTATAAAGGGTAGAGACAGCTGTCATAGAATACATGTAAACCTAACTGTT
TTTGAAAAACATTGGTGTGACACTTCCATAGGTGGTTTACCAAATTTATC
AGATGAGTACAAGCAAATATTACATCTTGGAAAAGATGATAGTCTCACAT
GTCATCTGCACTTCCCGAAGAGTTGTGTTTTGGGTCCAATAAAGTGGTAT
AAAGACTGTAACGAGATTAAAGGGGAGCGGTTCACTGTTTTGGAAACCAG
GCTTTTGGTGAGCAATGTCTCGGCAGAGGACAGAGGGAACTACGCGTGTC
AAGCCATACTGACACACTCAGGGAAGCAGTACGAGGTTTTAAATGGCATC
ACTGTGAGCATTAGTACCACTCTGATTGTGGACTGCAATGTAACAGACAC
CAAGGATAATACAAATCTACGATGCTGGAGAGTCAATAACACTTTGGTGG
ATGATTACTATGATGAATCCAAACGAATCAGAGAAGGGGTGGAAACCCAT
GTCTCTTTTCGGGAACATAATTTGTACACAGTAAACATCACCTTCTTGGA
AGTGAAAATGGAAGATTATGGCCTTCCTTTCATGTGCCACGCTGGAGTGT
CAACAGCATACATTATATTACAGCTCCCAGCTCCGGATTTTCGAGCTTAC
TTGATAGGAGGGCTTATCGCCTTGGTGGCTGTGGCTGTGTCTGTTGTGTA
CATATACAACATTTTTAAGATCGACATTGTTCTTTGGTATCGAAGTGCCT
TCCATTCTACAGAGACCATAGTAGATGGGAAGCTGTATGACGCCTATGTC
TTATACCCCAAGCCCCACAAGGAAAGCCAGAGGCATGCCGTGGATGCCCT
GGTGTTGAATATCCTGCCCGAGGTGTTGGAGAGACAATGTGGATATAAGT
TGTTTATATTCGGCAGAGATGAATTCCCTGGACAAGCCGTGGCCAATGTC

TABLE 6C-continued

NOV6b Nucleotide Sequence (SEQ ID NO: 25)

ATCGATGAAAACGTTAAGCTGTGCAGGAGGCTGATTGTCATTGTGGTCCC
CGAATCGCTGGGCTTTGGCCTGTTGAAGAACCTGTCAGAAGAACAAATCG
CGGTCTACAGTGCCCTGATCCAGGACGGGATGAAGGTTATTCTCGTTGAG
CTGGAGAAAATCGAGGACTACACAGTCATGCCAGAGTCAATTCAGTACAT
CAAACAGAAGCATGGTGCCATCCGGTGGCATGGGGACTTCACGGAGCAGT
CACAGTGTATGAAGACCAAGTTTTGGAAGACAGTGAGATACCACATGCCA
CCCAGAAGGTGTCGGCCGTTTCCTCCGGTCCAGCTGCTGCAGCACACACC
TTGCTGCCGCACCGCAGGCCCAGAACTAGGCTCAAGAAGAAAGAAGTGTA
CTCTCACGACTGGCAAGACTTGCTGGACTGACACCTATGGCTGGAAGAT
GACTTGTTTTGCTCCATGTCTCCTCATTCCTACACCTATTTTCTGCTGCA
GGATGAGGCTAGGGTTAGCATTCTAGA

[0186] The disclosed NOV6b nucleic acid sequence, located on the p12 region of chromosome 2, has 1118 of 1121 bases (99%) identical to a gb:GENBANK-ID:AF284434|acc:AF284434.1 mRNA from Homo sapiens (Homo sapiens IL-1Rrp2 mRNA, complete cds) (E=0.0).

[0187] A disclosed NOV6b polypeptide (SEQ ID NO: 26) encoded by SEQ ID NO: 25 is 550 amino acid residues and is presented using the one-letter amino acid code in Table 6D. SignalP, Psort and/or Hydropathy results predict that NOV6b contains a signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.4600. In other embodiments, NOV6b is also likely to be localized to the endoplasmic reticulum (membrane) with a certainty of 0.1000, the endoplasmic reticulum (lumen) with a certainty of 0.1000, or extracellularly with a certainty of 0.1000. The most likely cleavage site for NOV6b is between positions 19 and 20: VTA-DG.

TABLE 6D

Encoded NOV6b protein sequence (SEQ ID NO: 26).

MWSLLLCGLSTALPLSWTADGCKDIFMKNEILSASQPFAFNCTFPPITSG
EWSWTWYKNSSKIPWSKIIQSRIHQDETWILFLPMEWGDSGVYQCVIKGR
DSCHRIHVNLTVFEKHWCDTSIGGLPNLSDEYKQILHLGKDDSLTCHLHF
PKSCWLGPIKWYKDCNEIKGERFTWLETRLLWSNVSAEDRGNYACQAILT
HSGKQYEVLNGITVSISTTLIVDCNVTDTKDNTNLRCWRVNNTLVDDYYD
ESKRIREGWETHWSFREHNLYTWNITFLEWKMEDYGLPFMCHAGWSTAYI
ILOLPAPDFRAYLIGGLIALWAVAWSWWYIYNIFKIDIWLWYRSAFHSTE
TIVDGKLYDAYWLYPKPHKESQRHAWDALVLNILPEWLERQCGYKLFIFG
RDEFPGQAVANVIDENVKLCRRLIVIVWPESLGFGLLKNLSEEQIAVYSA
LIQDGMKVILVELEKIEDYTVMPESIQYIKOKHGAIRWHGDFTEQSQCMK
TKFWKTVRYHMPPRRCRPFPPVOLLOHTPCCRTAGPELGSRRKKCTLTTG

[0188] The disclosed NOV6b amino acid sequence has 336 of 345 amino acid residues (97%) identical to, and 338 of 345 amino acid residues (97%) similar to, the 575 amino acid residue ptnr:TREMBLNEW-ACC:AAG21368 protein from Homo sapiens (Human) (IL-1RRP2) (E=1.7e).

[0189] NOV6b is expressed in at least the following tissues: amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of NOV6b.

[0190] NOV6c

[0191] A disclosed NOV6c nucleic acid of 1897 nucleotides (also referred to as CG50389-04) encoding a novel Interleukin 1 receptor related protein-like protein is shown in Table 6E. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 51-53 and ending with a TAA codon at nucleotides 1785-1787. A putative untranslated region upstream from the initiation codon and downstream from the termination codon is underlined in Table 6E, and the start and stop codons are in bold letters.

TABLE 6E

NOV6c Nucleotide Sequence (SEQ ID NO: 27)

GAATTCCGCCCGCCCACGGCGGCGGGGAAATACCTAGGCATGGAAGTGGC
AGACAGGGCTCGTGTCCCTGTCATATTTTCCACTCTCCACGAGGTCCTG
CGCGCTTCAATCCTGCAGGCAGCCCGGTTTGGGGATGTGGTCCTTGCTGC
TCTGCGGGTTGTCCATCGCCCTTCCACTGTCTGTCACAGCAGATGGATGC
AAGGACATTTTTATGAAAAATGAGATACTTTCAGCAAGCCAGCCTTTTGC
TTTTAATTGTACATTCCCTCCCATAACATCTGGGGAAGTCAGTGTAACAT
GGTATAAAAATTCTAGCAAAATCCCAGTGTCCAAAATCATACAGTCTAGA
ATTCACCAGGACGAGACTTGGATTTTGTTTCTCCCCATGGAATGGGGGGA
CTCAGGAGTCTACCAATGTGTTATAAAGGGTAGAGACAGCTGTCATAGAA
TACATGTAAACCTAACTGTTTTTGAAAAACATTGGTGTGACACTTCCATA
GGTGGTTTACCAAATTTATCAGATGAGTACAAGCAAATATTACATCTTGG
AAAAGATGATAGTCTCACATGTCATCTGCACTTCCCGAAGAGTTGTGTTT
TGGGTCCAATAAAGTGGTATAAAGACTGTAACGAGATTAAAGGGGAGCGG
TTCACTGTTTTGGAAACCAGGCTTTTGGTGAGCAATGTCTCGGCAGAGGA
CAGAGGGAACTACGCGTGTCAAGCCATACTGACACACTCAGGGAAGCAGT
ACGAGGTTTTAAATGGCATCACTGTGAGCATTAGTACCACTCTGATTGTG
GACTGCAATGTAACAGACACCAAGGATAATACAAATCTACGATGCTGGAG
AGTCAATAACACTTTGGTGGATGATTACTATGATGAATCCAAACGAATCA
GAGAAGGGGTGGAAACCCATGTCTCTTTTCGGGAACATAATTTGTACACA
GTAAACATCACCTTCTTGGAAGTGAAAATGGAAGATTATGGCCTTCCTTT

TABLE 6E-continued

NOV6c Nucleotide Sequence (SEQ ID NO: 27)

CATGTGCCACGCTGGAGTGTCAACAGCATACATTATATTACAGCTCCCAG
CTCCGGATTTTCGAGCTTACTTGATAGGAGGGCTTATCGCCTTGGTGGCT
GTGGCTGTGTCTGTTGTGTACATATACAACATTTTTAAGATCGACATTGT
TCTTTGGTATCGAAGTGCCTTCCATTCTACAGAGACCATAGTAGATGGGA
AGCTGTATGACGCCTATGTCTTATACCCCAAGCCCCACAAGGAAAGCCAG
AGGCATGCCGTGGATGCCCTGGTGTTGAATATCCTGCCCGAGGTGTTGGA
GAGACAATGTGGATATAAGTTGTTTATATTCGGCAGAGATGAATTCCCTG
GACAAGCCGTGGCCAATGTCATCGATGAAAACGTTAAGCTGTGCAGGAGG
CTGATTGTCATTGTGGTCCCCGAATCGCTGGGCTTTGGCCTGTTGAAGAA
CCTGTCAGAAGAACAAATCGCGGTCTACAGTGCCCTGATCCAGGACGGGA
TGAAGGTTATTCTCGTTGAGCTGGAGAAAATCGAGGACTACACAGTCATG
CCAGAGTCAATTCAGTACATCAAACAGAAGCATGGTGCCATCCGGTGGCA
TGGGGACTTCACGGAGCAGTCACAGTGTATGAAGACCAAGTTTTGGAAGA
CAGTGAGATACCACATGCCACCCAGAAGGTGTCGGCCGTTTCCTCCGGTC
CAGCTGCTGCAGCACACACCTTGCTGCCGCACCGCAGGCCCAGAACTAGG
CTCAAGAAGAAAGAAGTGTACTCTCACGACTGGCAAGACTTGCTGGACT
GACACCTATGGCTGGAAGATGACTTGTTTTGCTCCATGTCTCCTCATTCC
TACACCTATTTTCTGCTGCAGGATGAGGCTAGGGTTAGCATTCTAGA

[0192] The disclosed NOV6c nucleic acid sequence, located on the p12 region of chromosome 2, has 1118 of 1121 bases (99%) identical to a gb:GENBANK-ID:AF284434|acc:AF284434.1 mRNA from Homo sapiens (Homo sapiens IL-1Rrp2 mRNA, complete cds) (E=0.0).

[0193] A disclosed NOV6c polypeptide (SEQ ID NO: 28) encoded by SEQ ID NO: 27 is 578 amino acid residues and is presented using the one-letter amino acid code in Table 6F. SignalP, Psort and/or Hydropathy results predict that NOV6c contains a signal peptide and is likely to be localized in the mitochondrial inner membrane with a certainty of 0.8546. In other embodiments, NOV6c is also likely to be localized to the plasma membrane with a certainty of 0.6000, the Golgi body with a certainty of 0.4000, or in the mitochondrial inner membrane space with a certainty of 0.3386. The most likely cleavage site for NOV6c is between positions 47 and 48: VTA-DG.

TABLE 6F

Encoded NOV6c protein sequence (SEQ ID NO: 28).

MTGLWSLSYFPLSTRSCALQSCRQPGLGMWSLLLCGLSTALPLSWTADGC
KDIFMKNEILSASQPFAFNCTFPPITSGEWSWTWYKNSSKIPWSKIIQSR
IHODETWILFLPMEWGDSGVYQCVIKGRDSCHRIHVNLTWFEKHWCDTSI
GGLPNLSDEYKQILHLGKDDSLTCHLHFPKSCWLGPIKWYKDCNEIKGER
FTWLETRLLWSNWSAEDRGNYACQAILTHSGKQYEWLNGITVSISTTLIV
DCNWTDTKDNTNLRCWRWNNTLWDDYYDESKRIREGWETHWSFREHNLYT
WNITFLEVKMEDYGLPFMCHAGVSTAYIILQLPAPDFRAYLIGGLIALWA
VAWSWWYIYNIFKIDIVLWYRSAFHSTETIVDGKLYDAYWLYPKPHKESQ
RHAWDALVLNILPEVLERQCGYKLFIFGRDEFPGQAVANVIDENVKLCRR
LIVIVWPESLGFGLLKNLSEEQIAVYSALIQDGMKVILVELEKIEDYTVM
PESIQYIKOKHGAIRWHGDFTEQSQCMKTKFWKTVRYHMPPRRCRPFPPV
QLLQHTPCCRTAGPELGSRRKKCTLTTG

[0194] The disclosed NOV6c amino acid sequence has 336 of 345 amino acid residues (97%) identical to, and 338 of 345 amino acid residues (97%) similar to, the 575 amino acid residue ptnr:TREMBLNEW-ACC:AAG21368 protein from Homo sapiens (Human) (IL-1RRP2) (E=1.7e).

[0195] NOV6c is expressed in at least the following tissues: adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of NOV6c.

[0196] NOV6a also has homology to the amino acid sequences shown in the BLASTP data listed in Table 6G.

TABLE 6G

BLAST results for NOV6a

Gene Index/Identifier | Protein/Organism | Length (aa) | Identity (%) | Positives (%) | Expect
gi|4504663|ref|NP_003845.1| (NM_003854) | interleukin 1 receptor-like 2 [Homo sapiens] | 562 | 382/401 (95%) | 382/401 (95%) | 0.0
gi|13637728|ref|XP_002685.3| (XM_002685) | similar to IL-1Rrp2 (H. sapiens) [Homo sapiens] | 603 | 356/375 (94%) | 356/375 (94%) | 0.0

TABLE 6G-continued

BLAST results for NOV6a

Gene Index/Identifier | Protein/Organism | Length (aa) | Identity (%) | Positives (%) | Expect
gi|10644686|gb|AAG21368.1|AF284434_1 (AF284434) | IL-1Rrp2 [Homo sapiens] | 575 | 355/375 (94%) | 356/375 (94%) | (not legible)
gi|1236081|gb|AAB53238.1| (U49066) | interleukin-1 receptor-related protein [Rattus norvegicus] | 561 | 262/380 (68%) | (not legible) | e-155
gi|10644684|gb|AAG21367.1|AF284433_1 (AF284433) | IL-1Rrp2 [Mus musculus] | 574 | 262/380 (68%) | (not legible) | e-153

[0197] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 6H.

TABLE 6H

Information for the ClustalW proteins

1) NOV6a (SEQ ID NO: 24)
2) NOV6b (SEQ ID NO: 26)
3) NOV6c (SEQ ID NO: 28)
4) gi|4504663|ref|NP_003845.1| (NM_003854) interleukin 1 receptor-like 2 (Homo sapiens) (SEQ ID NO: 92)
5) gi|13637728|ref|XP_002685.3| (XM_002685) similar to IL-1Rrp2 (H. sapiens) (Homo sapiens) (SEQ ID NO: 93)

[0198] Tables 6I-J list the domain description from DOMAIN analysis results against NOV6. This indicates that the NOV6 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 6I

Domain Analysis of NOV6

gnl|Pfam|pfam01582, TIR, TIR domain. The TIR domain is an intracellular signaling domain found in MyD88, interleukin 1 receptor and the Toll receptor. Called TIR (by SMART2) for Toll - Interleukin - Resistance. (SEQ ID NO: 97)
CD-Length = 141 residues, 100.0% aligned
Score = 128 bits (322), Expect = 6e-31

Query: 234 AYVLYPKPHKESQRHAVDALVLNILPEVLERQCGYKLFIFGRDEFPGQAVANVIDENVKL 293
Sbjct:   1 AFSFSGKDDR------DTFWSHILLKE-LEEKPGIKLFIDDRDELPGESILENLFEAIEK 53

Query: 294 CRRLIVIVVPESLGFGLLKNLSEEQIAVYSALIQDGMKVILIELEKIEDYTVMPESIQYI 353
Sbjct:  54 SRRAIVILSSNYASSSW--CLDELVEAWKLALEQGNKKVILPEFYKVDPSDWRKQSGKFG 111

Query: 354 KQKHGAIRWHGDFTEQSQCMRTKFWKTVRYHMPP 387
Sbjct: 112 KAFLKTLKWFGDKTSQ----RIRFWKKALYAMPV 141

[0199]

TABLE 6J

Domain Analysis of NOV6

gnl|Smart|smart00255, TIR, Toll - interleukin 1 - resistance (SEQ ID NO: 98)
CD-Length = 140 residues, 99.3% aligned
Score = 102 bits (254), Expect = 4e-23

Query: 232 YDAYVLYPKPHKESQRHAVDALVLNILPEVLERQCGYKLFIFGRDEFPGQAVANVIDENV 291
Sbjct:   2 YDWFISYSG------DEDWRNEFLSHLLEQLRGYKLCVFIDDFEPGGGDLENIDEAI 52

Query: 292 KLCRRLIVIVVPESLGFGLLKNLSEEQIAVYSALIQDGMKVILIELEKI-EDYTVMPESI 350
Sbjct:  53 EKSRIAIVVLSPNYAESEWCLD--ELVAALENALEQGGLRVIPIFYEVIPSDVRKQPGSF 110

Query: 351 QYIKQKHGAIRWHGDFTEQSQCMKTKFWKTVRYHMPPR 388
Sbjct: 111 RKVFKKN-YLKWTEDEKDR------FWKKALYAWPSK 140

[0200] Interleukin-1 (IL-1) is a central regulator of the immune and inflammatory responses. Recently, a family of proteins has been described that share significant homology in their signaling domains with the Type I IL-1 receptor (IL-1RI), which includes the IL-1 receptor-related protein. The members of IL-1RI are clustered within 450 kb on human chromosome 2q and all of them are important in host responses to injury and infection. The remarkable conservation between diverse species indicates that the IL-1 system represents an ancient signaling machine critical for responses to environmental stresses and attack by pathogens (O'Neill L. A., Greene, C., 1998, J. Leukoc Biol., vol. 63: 650-657, Busfield et al., 2000, Genomics vol. 66: 213-216).

[0201] The disclosed NOV6 nucleic acid of the invention encoding an Interleukin 1 receptor related protein-like protein includes the nucleic acid whose sequence is provided in Table 6A, 6C, 6E or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 6A, 6C, or 6E while still encoding a protein that maintains its Interleukin 1 receptor related protein-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 1% of the bases may be so changed.

[0202] The disclosed NOV6 protein of the invention includes the Interleukin 1 receptor related protein-like protein [whose sequence is provided in Table 6B, 6D, or 6F. The] invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 6B, 6D, or 6F while still encoding a protein that maintains its Interleukin 1 receptor related protein-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 32% of the residues may be so changed.

[0203] The above defined information for this invention suggests that these Interleukin 1 receptor related protein-like proteins (NOV6) may function as a member of an “Interleukin 1 receptor related protein family”. Therefore, the NOV6 nucleic acids and proteins identified here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and disorders as indicated below. The potential therapeutic applications for this invention include, but are not limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) those defined here.

[0204] The nucleic acids and proteins of NOV6 are useful in any inflammatory diseases such as uveitis and corneal fibroblast proliferation, allergic encephalomyelitis, amyotrophic lateral sclerosis, acute pancreatitis, cerebral cryptococcosis, autoimmune disease including Type 1 diabetes mellitus (DM), experimental allergic encephalomyelitis (EAE), systemic lupus erythematosus (SLE), colitis, thyroiditis and various forms of arthritis, cancer such as AML, bacterial infections, and/or other pathologies and disorders.

[0205] NOV6 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the “Anti-NOVX Antibodies” section below. For example, the disclosed NOV6 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV6 epitope is from about amino acids 80 to 150. In other embodiments, a NOV6 epitope is from about amino acids 200 to 250, or from about amino acids 330 to 420. This novel protein also has value in the development of a powerful assay system for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.

[0206] NOV7

[0207] A disclosed NOV7 nucleic acid of 1769 nucleotides (also referred to as CG50389-01) encoding a novel Interleukin 1 receptor related protein-like protein is shown in Table 7A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 45-47 and ending with a TGA codon at nucleotides 477-479. In Table 7A, the 5' and 3' untranslated regions are underlined and the start and stop codons are in bold letters.

TABLE 7A

NOV7 Nucleotide Sequence

(SEQ ID NO: 29) CGCCCGCCCACGGCGGCGGGGAAATACCTAGGCATGGAAGTGGCATGACAGGGCTCGTGTCCCTGTCATAT

TTTCCACTCTCCACGAGGTCCTGCGCGCTTCAATCCTGCAGGCAGCCCGGTTTGGGGATGTGGTCCTTGCT

GCTCTGCGGGTTGTCCACGCCCTTCCACTGTCTGTCACAGCAGATGGATGCAAGGACATTTTTATGAAAAA

ATGAGATACTTTCAGCAAGCCAGCCTTTTGCTTTTAATTGTACATTCCCTCCCATAACATCTGGGGAAGTC

AGTGTAACATGGTATAAAAATTCTAGCAAAATCCCAGTGTCCAAAATCATACAGTCTAGAATTCACCAGGA

CGAGACTTGGATTTTGTTTCTCCCCATGGAATGGGGGGACTCAGGAGTCTACCAATGTGTTATAAAGACTG

TAACGAGATTAAAGGGGAGCGGTTCACTGTTTTGGAAACCAGGCTTTTGGGAGCAATGTCTCGGCAGAGG

ACAGAGGGAACTACGCGTGTCAAGCCATACTGACACACTCAGGGAAGCAGTACGAGGTTTTAAATGGCATC

ACTGTGAGCATTACAGAAAGAGCTGGATATGGAGGAAGTGTCCCTAAAATCATTTATCCAAAAAATCATTC

AATTGAAGTACAGCTTGGTACCACTCTGATTGTGGACTGCAATGTAACAGACACCAAGGATAATACAAATC

TACGATGCTGGAGAGTCAATAACACTTTGGTGGATGATTACTATGATGAATCCAAACGAATCAGAGAAGGG

GTGGAAACCCATGTCTCTTTTCGGGAACATAATTTGTACACAGTAAACATCACCTTCTTGGAAGTGAAAAT

GGAAGATTATGGCCTTCCTTTCATGTGCCACGCTGGAGTGTCCACAGCATACATTATATTACAGCTCCCAG

CTCCGGATTTTCGAGCTTACTTGATAGGAGGGCTTATCGCCTTGGTGGCTGTGGCTGTGTCTGTTGTGTAC

ATATACAACATTTTTAAGATCGACATTGTTCTTTGGTATCGAAGTGCCTTCCATTCTACAGAGACCATAGT

AGATGGGAAGCTGTATGACGCCTATGTCTTATACCCCAAGCCCCACAAGGAAAGCCAGAGGCATGCCGTGG

ATGCCCTGGTGTTGAATATCCTGCCCGAGGTGTTGGAGAGACAATGTGGATATAAGTTGTTTATATTCGGC

AGAGATGAATTCCCTGGACAAGCCGTGGCCAATGTCATCGATGAAAACGTTAAGCTGTGCAGGAGGCTGAT

TGTCATTGTGGTCCCCGAATCGCTGGGCTTTGGCCTGTTGAAGAACCTGTCAGAAGAACAAATCGCGGTCT

ACAGTGCCCTGATCCAGGACGGGATGAAGGTTATTCTCATTGAGCTGGAGAAAATCGAGGACTACACAGTC

ATGCCAGAGTCAATTCAGTACATCAAACAGAAGCATGGTGCCATCCGGTGGCATGGGGACTTCACGGAGCA

GTCACAGTGTATGAAGACCAAGTTTTGGAAGACAGTGAGATACCACATGCCGCCCAGAAGGTGTCGGCCGT

TTCTCCGGTCCACGTGCCGCAGCACACACCTCTGTACCGCACCGCAGGCCCAGAACTAGGCTCAAGAAGAA

AGAAGTGTACTCTCACGACTGGCTAAGACTTGCTGGACTGACACCTATGGCTGGAAGATGACCTGTTTTGC

TCCATGTCTCCTCATTCCTACACCTATTTTCTGCTGCAGGATGAGGCTAGGGTTAGCATTCTAGA
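As a check on the reading frame described for Table 7A, the first line of the NOV7 nucleotide sequence can be translated starting at the stated initiation codon. This is a minimal sketch assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate dna from the 1-based position `start` until a stop codon
    or until fewer than three bases remain."""
    residues = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First printed line of Table 7A (SEQ ID NO: 29); the ATG is at nucleotides 45-47.
first_line = "CGCCCGCCCACGGCGGCGGGGAAATACCTAGGCATGGAAGTGGCATGACAGGGCTCGTGTCCCTGTCATAT"
print(translate(first_line, 45))
```

The translated residues can be compared with the start of the encoded NOV7 protein sequence in Table 7B.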

[0208] The disclosed NOV7 nucleic acid sequence, localized to the q12 region of chromosome 2, has 1363 of 1370 bases (99%) identical to a gb:GENBANK-ID:HSU49065|acc:U49065.1 mRNA from Homo sapiens (Human interleukin-1 receptor-related protein mRNA, complete cds) (E=7.0e').

[0209] A disclosed NOV7 polypeptide (SEQ ID NO:30) encoded by SEQ ID NO:29 is 144 amino acid residues and is presented using the one-letter amino acid code in Table 7B. SignalP, Psort and/or Hydropathy results predict that NOV7 has a signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.6500. In other embodiments, NOV7 is also likely to be localized to the microbody (peroxisome) with a certainty of 0.6400, to the mitochondrial inner membrane with a certainty of 0.5762, or the mitochondrial intermembrane space with a certainty of 0.3386. The most likely cleavage site for a NOV7 peptide is between amino acids 47 and 48, at: VTA-DG.

TABLE 7B

Encoded NOV7 protein sequence.

(SEQ ID NO:30) MTGLVSLSYFPLSTRSCALQSCRQPGLGMWSLLLCGLSIALPLSVTADGCKDIFMKNEILSASQPFAFNCT

FPPITSGEVSVTWYKNSSKIPVSKIIQSRIHQDETWILFLPMEWGDSGVYQCVIKTVTRLKGSGSLFWKPG

[0210] The disclosed NOV7 amino acid sequence has 129 of 144 amino acid residues (99%) identical to 129 of 563 amino acid residues of the gb:GENBANK-ID:HSU49065|acc:U49065.1 protein from Homo sapiens (Human interleukin-1 receptor-related protein mRNA, complete cds).

[0211] NOV7 also has homology to the amino acid sequence shown in the BLASTP data listed in Table 7C.

TABLE 7C

BLAST results for NOV7

Gene Index/Identifier                            Protein/Organism                                            Length (aa)  Identity (%)    Positives (%)   Expect
gi|13637728|ref|XP_002685.3| (XM_002685)         similar to IL-1Rrp2 (H. sapiens) [Homo sapiens]             603          126/126 (100%)  126/126 (100%)
gi|4504663|ref|NP_003845.1| (NM_003854)          interleukin 1 receptor-like 2 [Homo sapiens]                562          98/98 (100%)    98/98 (100%)    5e-55
gi|10644684|gb|AAG21367.1|AF284433_1 (AF284433)  IL-1Rrp2 [Mus musculus]                                     574          59/100 (59%)    73/100 (73%)    3e-30
gi|1236081|gb|AAB53238.1| (U49066)               interleukin-1 receptor-related protein [Rattus norvegicus]  561          54/100 (54%)    73/100 (73%)
gi|400047|sp|Q02955|IL1R_RAT                     INTERLEUKIN-1 RECEPTOR, TYPE I PRECURSOR (IL-1R-1) (P80)    576          35/102 (34%)    55/102 (53%)
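The Identity and Positives columns pair a match count with a percentage. The percentages printed in Table 7C appear to be truncated rather than rounded (55/102 is listed as 53%). A minimal sketch under that assumption follows; the function name `blast_percent` is hypothetical and is not taken from any BLAST tool.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matches over aligned positions, truncated
    toward zero (an assumption inferred from the values in Table 7C)."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("need 0 <= matches <= aligned and aligned > 0")
    return (100 * matches) // aligned


# (matches, aligned) pairs as printed in Table 7C.
for matches, aligned in [(126, 126), (59, 100), (35, 102), (55, 102)]:
    print(f"{matches}/{aligned} -> {blast_percent(matches, aligned)}%")
```

Recomputing the percentage from the printed fraction is a quick way to spot OCR damage in either number.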

[0212] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 7D.

Table 7D. Information for the ClustalW proteins

1) NOV7 (SEQ ID NO:30)
2) gi|13637728|ref|XP_002685.3| (XM_002685) similar to IL-1Rrp2 (H. sapiens) (Homo sapiens) (SEQ ID NO:93)
3) gi|4504663|ref|NP_003845.1| (NM_003854) interleukin 1 receptor-like 2 (Homo sapiens) (SEQ ID NO:92)
4) gi|10644684|gb|AAG21367.1|AF284433_1 (AF284433) IL-1Rrp2 (Mus musculus) (SEQ ID NO:94)
5) gi|1236081|gb|AAB53238.1| (U49066) interleukin-1 receptor-related protein (Rattus norvegicus) (SEQ ID NO:95)
6) gi|400047|sp|Q02955|IL1R_RAT INTERLEUKIN-1 RECEPTOR, TYPE I PRECURSOR (IL-1R-1) (P80) (SEQ ID NO:99)

[ClustalW multiple sequence alignment of NOV7 with sequences 2) through 6) above (graphic).]

[0213] Tables 7E-F list the domain description from DOMAIN analysis results against NOV7. This indicates that the NOV7 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 7E

Domain Analysis of NOV7

gnl|Smart|smart00408, IGc2, Immunoglobulin C-2 Type (SEQ ID NO:100)
CD-Length = 63 residues, 85.7% aligned
Score = 40.0 bits (92), Expect =

Query:  64 QPFAFNCTFPPITSGEVSVTWYKNSSKIPVSKIIQSRIHQDETWILFLPMEWGDSGVYQC 123
Sbjct:   4 ESWTLTC--PASGDPWPNITWLKDGKPLP-----ESRWWASGSTILTIKNWSLEDSGLYTC 56

Query: 124 V 124
Sbjct:  57 W 57

[0214]

TABLE 7F

Domain Analysis of NOV7

gnl|Pfam|pfam00047, ig, Immunoglobulin domain. Members of the immunoglobulin superfamily are found in hundreds of proteins of different functions. Examples include antibodies, the giant muscle kinase titin and receptor tyrosine kinases. Immunoglobulin-like domains may be involved in protein-protein and protein-ligand interactions. The Pfam alignments do not include the first and last strand of the immunoglobulin-like domain. (SEQ ID NO:101)
CD-Length = 68 residues, 97.1% aligned
Score = 36.6 bits (83), Expect = 0.001

Query:  64 QPFAFNCTFPPITSGEVSVTWYKNSSKIPVSKIIQSRIHQDETW------ILFLPMEWGD 117
Sbjct:   2 ESWTILTCSWSG-YPPDPTWTWLRDGKEIELLGSSESRWSSGGRFSISSILSLTISSWTPED 60

Query: 118 SGVYQCV 124
Sbjct:  61 SGTYTCV 67

[0215] Interleukin-1 (IL-1) is a central regulator of the immune and inflammatory responses. Recently, a family of proteins has been described that share significant homology in their signaling domains with the Type I IL-1 receptor (IL-1RI), which includes the IL-1 receptor-related protein. The members of IL-1RI are clustered within 450 kb on human chromosome 2q and all of them are important in host responses to injury and infection. The remarkable conservation between diverse species indicates that the IL-1 system represents an ancient signaling machine critical for responses to environmental stresses and attack by pathogens (O'Neill L. A., Greene, C., 1998, J. Leukoc Biol., vol. 63:650-657; Busfield et al., 2000, Genomics vol. 66:213-216).

[0216] The disclosed NOV7 nucleic acid of the invention encoding an Interleukin 1 receptor related protein-like protein includes the nucleic acid whose sequence is provided in Table 7A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 7A while still encoding a protein that maintains its Interleukin 1 receptor related protein-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 1% of the bases may be so changed.

[0217] The disclosed NOV7 protein of the invention includes the Interleukin 1 receptor related protein-like protein whose sequence is provided in Table 7B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 7B while still encoding a protein that maintains its Interleukin 1 receptor related protein-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 66% of the residues may be so changed.

[0218] The protein similarity information, expression pattern, and map location for the Interleukin 1 receptor related protein-like protein and nucleic acid (NOV7) disclosed herein suggest that NOV7 may have important structural and/or physiological functions characteristic of the Interleukin 1 receptor related protein-like family. Therefore, the NOV7 nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications. These include serving as a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of the nucleic acid or the protein are to be assessed, as well as potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), and (v) a composition promoting tissue regeneration in vitro and in vivo.

[0219] The NOV7 nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications implicated in various diseases and disorders described below and/or other pathologies. For example, the compositions of the present invention will have efficacy for treatment of patients suffering from uveitis and corneal fibroblast proliferation, allergic encephalomyelitis, amyotrophic lateral sclerosis, acute pancreatitis, cerebral cryptococcosis, autoimmune disease including Type 1 diabetes mellitus (DM), experimental allergic encephalomyelitis (EAE), systemic lupus erythematosus (SLE), colitis, thyroiditis and various forms of arthritis, cancer such as AML, bacterial infections, and/or other pathologies/disorders. The NOV7 nucleic acid, or fragments thereof, may further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein are to be assessed.

[0220] NOV7 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. For example, the disclosed NOV7 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV7 epitope is from about amino acids 15 to 30. In another embodiment, a contemplated NOV7 epitope is from about amino acids 70 to 135. This novel protein also has value in the development of a powerful assay system for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.

[0221] NOV8

[0222] A disclosed NOV8 nucleic acid of 954 nucleotides (also referred to as CG50387-02) encoding a novel Connexin GJA3-like protein is shown in Table 8A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 1-3 and ending with a TGA codon at nucleotides 952-954. A putative untranslated region upstream from the initiation codon is underlined in Table 8A. The start and stop codons are in bold letters.

TABLE 8A

NOV8 nucleotide sequence.

(SEQ ID NO: 31) ATGGGCGACTGGAGCTTTCTGGGAAGACTCTTAGAAAATGCACAGGAGCACTCCACGGTCATCGGCAAGGTT

TGGCTGACCGTGCTGTTCATCTTCCGCATTTTGGTGCTGGGGGCCGCGGCCGAGGACGTGTGCGGCGATGAG

CAGTCAGACTTCACCTGCAACACCCAGCAGCCGGGCTGCGAGAACGTCTGCTACGACAGGGCCTTCCCCATC

TCCCACATCCGCTTCTGGGCGCTGCAGATCATCTTCGTGTCCACGCCCACCCTCATCTACCTGGGCCACGTG

CTGCACATCGTGCGCATGGAGGAGAAGAAGAAAGAGAGGGACGAGGAGGAGCAGCTGAAGAGAGAGAGCCCC

AGCCCCAAGGAGCCACCGCAGGACAATCCCTCGTCGCGGGACGACCGCCGCAGGGTGCGCATCGCCGGCGCG

CTCCTCCCCACCTACCTCTTCAACATCATCTTCAGAGGGTCTTCCACCTCCCCTTCATCCCCCCCCCACTAC

TTTCTGTACGGCTTCGAGCTGAAGCCGCTCTACCGCTGCGACCGCTGGCCCTGCCCCAACACGGTGGACTGC

TTCATCTCCAGGCCCACGGAGAAGACCATCTTCATCATCTTCATGCTGGCGGTGGCCTGCGCGTCACTGCTG

CTCAACATGCTGGAGATATACCACCTGGGCTGGAAGAAGCTCAAGCAGGGCGTGACCAGCCGCCTCGGCCCG

GACGCCTCCGAGGCCCCGCTGGGGACAGCCGATCCCCCGCCCCTGCTGCTGGATGGGAGCGGCAGCAGTCTG

GAGGGGAGCGCCCTGGCAGGGACCCCCGAGGAGGAGGAGCAGGCCGTCACCACCGCCGCCCAGATGCACCAG

CCGCCCTTGCCCCTCGGAGACCCAGGTCGGGCCAGCAAGGCCAGCAGGGCCAGCAGCGGGCGGGCCAGACCG

GAGGACTTGGCCATCAG

[0223] The NOV8 nucleic acid sequence is located on chromosome 13 and has 766 of 766 bases (100%) identical to a gb:GENBANK-ID:AF075290|acc:AF075290.1 mRNA from Homo sapiens (Homo sapiens gap-junction protein alpha 3 (GJA3) gene, complete cds) (E=1.7e').

[0224] The disclosed NOV8 polypeptide (SEQ ID NO:32) encoded by SEQ ID NO:31 has 317 amino acid residues and is presented in Table 8B using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV8 has a signal peptide and is likely to be localized to the plasma membrane with a certainty of 0.6000. In other embodiments, NOV8 may also be localized to the Golgi body with a certainty of 0.4000, the endoplasmic reticulum (membrane) with a certainty of 0.3000, or the microbody (peroxisome) with a certainty of 0.3000. The most likely cleavage site for NOV8 is between positions 41 and 42, AAA-ED.

TABLE 8B

Encoded NOV8 protein sequence.

(SEQ ID NO:32) MGDWSFLGRLLENAQEHSTVIGKVWLTVLFIFRILVLGAAAEDVWGDEQSDFTCNTQQPGCENVCYDRAFPI

SHIRFWALQIIFVSTPTLIYLGHVLHIVRMEEKKKEREEEEQLKRESPSPKEPPQDNPSSRDDRGRVRMAGA

LLRTYVFNIIFKTLFEVGFIAGQYFLYGFELKPLYRCDRWPCPNTVDCFISRPTEKTIFIIFMLAVACASLL

LNMLEIYHLGWKKLKQGVTSRLGPDASEAPLGTADPPPLLLDGSGSSLEGSALAGTPEEEEQAVTTAAQMHQ

PPLPLGDPGRASKASRASSGRARPEDLAI

[0225] A search of sequence databases reveals that the NOV8 amino acid sequence has 255 of 255 amino acid residues (100%) identical to, and 255 of 255 amino acid residues (100%) similar to, the 435 amino acid residue ptnr:TREMBLNEW-ACC:CAC16957 protein from Homo sapiens (Human) (BA264J4.3 (Novel Connexin (Gap Junction Protein)) (E=5.8e7).

[0226] NOV8 is expressed in at least adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus, lung. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0227] In addition, the sequence is predicted to be expressed in lens fiber cells because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:AF075290|acc:AF075290.1) a closely related Homo sapiens gap-junction protein alpha 3 (GJA3) gene, complete cds homolog in species Homo sapiens.

[0228] NOV8 also has homology to the amino acid sequence shown in the BLASTP data listed in Table 8C.
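The localization and epitope statements for NOV8 refer to hydropathy results and hydrophobicity charts. As a rough illustration of how such a profile can be computed, the sketch below averages Kyte-Doolittle hydropathy values over a sliding window. This is an assumption made for illustration: the disclosure does not say which scale or window was used, and the names `KYTE_DOOLITTLE` and `hydropathy_profile` are hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean Kyte-Doolittle score for every full window along the sequence.
    Positive values indicate hydrophobic stretches, negative values
    hydrophilic ones."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Two NOV8 segments as printed in the Query lines of Table 8E:
# residues 21-40 (hydrophobic) and residues 103-118 (highly charged).
hydrophobic_segment = "IGKVWLTVLFIFRILVLGAA"
charged_segment = "EEKKKEREEEEQLKRE"
print(max(hydropathy_profile(hydrophobic_segment)))
print(max(hydropathy_profile(charged_segment)))
```

Windows with strongly negative means mark hydrophilic stretches of the kind proposed as immunogens; windows with strongly positive means are candidates for membrane-spanning segments.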

TABLE 8C

BLAST results for NOV8

Gene Index/Identifier                      Protein/Organism                                                                                   Length (aa)  Identity (%)   Positives (%)  Expect
gi|13489110|ref|NP_068773.1| (NM_021954)   gap junction protein, alpha 3, 46kD (connexin46) [Homo sapiens]                                    435          233/249 (93%)  233/249 (93%)  e-134
gi|14753411|ref|XP_051651.1| (XM_051651)   gap junction protein, alpha 3, 46kD (connexin46) [Homo sapiens]                                    435          233/249 (93%)  233/249 (93%)  e-134
gi|8393440|ref|NP_058671.1| (NM_016975)    gap junction membrane channel protein alpha 3; connexin 46; alpha 3 connexin [Mus musculus]        417          208/256 (81%)  219/256 (85%)  e-116
gi|13242279|ref|NP_077352.1| (NM_024376)   connexin 46 [Rattus norvegicus]                                                                    416          207/255 (81%)  218/255 (85%)  e-116
gi|5919130|gb|AAD56220.1| (AF177912)       connexin 44 protein [Ovis aries]                                                                   413          202/249 (81%)  214/249 (85%)  e-113

[0229] The homology of these sequences is shown graphically in the ClustalW analysis shown in Table 8D.

Table 8D. Information for the ClustalW proteins

1) NOV8 (SEQ ID NO:32)
2) gi|13489110|ref|NP_068773.1| (NM_021954) gap junction protein, alpha 3, 46kD (connexin 46) (Homo sapiens) (SEQ ID NO:102)
3) gi|14753411|ref|XP_051651.1| (XM_051651) gap junction protein, alpha 3, 46kD (connexin 46) (Homo sapiens) (SEQ ID NO:103)
4) gi|8393440|ref|NP_058671.1| (NM_016975) gap junction membrane channel protein alpha 3; connexin 46; alpha 3 connexin (Mus musculus) (SEQ ID NO:104)
5) gi|13242279|ref|NP_077352.1| (NM_024376) connexin 46 (Rattus norvegicus) (SEQ ID NO:105)
6) gi|5919130|gb|AAD56220.1| (AF177912) connexin 44 protein (Ovis aries) (SEQ ID NO:106)

[ClustalW multiple sequence alignment of NOV8 with sequences 2) through 6) above (graphic).]

[0230] Tables 8E-F list the domain description from DOMAIN analysis results against NOV8. This indicates that the NOV8 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 8E

Domain Analysis of NOV8

gnl|Pfam|pfam00029, connexin, Connexin. (SEQ ID NO:107)
CD-Length = 218 residues, 99.5% aligned
Score = 355 bits (912), Expect = 2e-99

Query:   3 DWSFLGRLLENAQEHSTVIGKVWLTVLFIFRILVLGAAAEDVWGDEQSDFTCNTQQPGCE 62
Sbjct:   2 DWSFLGRLLEGVNKHSTAIGKIWLSVLFIFRILVLGVAAESVWGDEQSDFVCNTQQPGCE 61

Query:  63 NVCYDRAFPISHIRFWALQIIFVSTPTLIYLGHVLHIVRMEEKKKEREEEEQLKRESPSP 122
Sbjct:  62 NVCYDQFFPISHVRLWVLQLIFVSTPSLLYLGHVAYRVRREEKLREKEEEHSKGLYSEEA 121

Query: 123 KEPPQDNPSSRDDRGRVRMAGALLRTYVFNIIFKTLFEVGFIAGQYFLYGFELKPLYRCD 182
Sbjct: 122 KK------RCGSEDGKVRIRGGLWWTYWFSIIFKSIFEWGFLYGQYLLYGFTMSPLWWCS 175

Query: 183 RWPCPNTVDCFISRPTEKTIFIIFMLAVACASLLLNMLEIYHL 225
Sbjct: 176 RAPCPHTVDCFVSRPTEKTIFIVFMLVVSAICLLLNLAELFYL 218

[0231]

TABLE 8F

Domain Analysis of NOV8

gnl|Smart|smart00037, CNX, Connexin homologues; Connexin channels participate in the regulation of signaling between developing and differentiated cell types. (SEQ ID NO:108)
CD-Length = 34 residues, 97.1% aligned
Score = 83.2 bits (204), Expect = 2e-17

Query: 44 VWGDEQSDFTCNTQQPGCENVCYDRAFPISHIR 76
Sbjct:  2 WWGDEQSDFTCNTQQPGCENVCYDQFFPISHVR 34

[0232] The connexins are a family of integral membrane proteins that oligomerise to form intercellular channels that are clustered at gap junctions. These channels are specialised sites of cell-cell contact that allow the passage of ions, intracellular metabolites and messenger molecules from the cytoplasm of one cell to its apposing neighbours. They are found in almost all vertebrate cell types, and somewhat similar proteins have been cloned from plant species. Invertebrates utilise a different family of molecules, innexins, that share a similar predicted secondary structure to the vertebrate connexins, but have no sequence identity to them.

[0233] Vertebrate gap junction channels are thought to participate in diverse biological functions. For instance, in the heart they permit the rapid cell-cell transfer of action potentials, ensuring coordinated contraction of the cardiomyocytes. They are also responsible for neurotransmission at specialised electrical synapses. In non-excitable tissues, such as the liver, they may allow metabolic cooperation between cells. In the brain, glial cells are extensively coupled by gap junctions; this allows waves of intracellular Ca2+ to propagate through nervous tissue, and may contribute to their ability to spatially-buffer local changes in extracellular K+ concentration.

[0234] The connexin protein family is encoded by at least 13 genes in rodents, with many homologues cloned from other species. They show overlapping tissue expression patterns, most tissues expressing more than one connexin type. Their conductances, permeability to different molecules, phosphorylation and voltage-dependence of their gating have been found to vary. Possible communication diversity is increased further by the fact that gap junctions may be formed by the association of different connexin isoforms from apposing cells. However, in vitro studies have shown that not all possible combinations of connexins produce active channels.

[0235] Hydropathy analysis predicts that all cloned connexins share a common transmembrane (TM) topology. Each connexin is thought to contain 4 TM domains, with two extracellular and three cytoplasmic regions. This model has been validated for several of the family members by in vitro biochemical analysis. Both N- and C-termini are thought to face the cytoplasm, and the third TM domain has an amphipathic character, suggesting that it contributes to the lining of the formed-channel. Amino acid sequence identity between the isoforms is ~50-80%, with the TM domains being well conserved. Both extracellular loops contain characteristically conserved cysteine residues, which likely form intramolecular disulphide bonds. By contrast, the single putative intracellular loop (between TM domains 2 and 3) and the cytoplasmic C-terminus are highly variable among the family members. Six connexins are thought to associate to form a hemi-channel, or connexon. Two connexons then interact (likely via the extracellular loops of their connexins) to form the complete gap junction channel.

[0236] Two sets of nomenclature have been used to identify the connexins. The first, and most commonly used, classifies the connexin molecules according to molecular weight, such as connexin43 (abbreviated to Cx43), indicating a connexin of molecular weight close to 43 kD. However, studies have revealed cases where clear functional homologues exist across species that have quite different molecular masses; therefore, an alternative nomenclature was proposed based on evolutionary considerations, which divides the family into two major subclasses, alpha and beta, each with a number of members. Due to their ubiquity and overlapping tissue distributions, it has proved difficult to elucidate the functions of individual connexin isoforms. To circumvent this problem, particular connexin-encoding genes have been subjected to targeted-disruption in mice, and the phenotype of the resulting animals investigated. Around half the connexin isoforms have been investigated in this manner. Further insight into the functional roles of connexins has come from the discovery that a number of human diseases are caused by mutations in connexin genes. For instance, mutations in Cx32 give rise to a form of inherited peripheral neuropathy called X-linked dominant Charcot-Marie-Tooth disease. Similarly, mutations in Cx26 are responsible for both autosomal recessive and dominant forms of nonsyndromic deafness, a disorder characterised by hearing loss, with no apparent effects on other organ systems.

[0237] Gap junction alpha-3 (GJA3) protein (also called connexin46, or Cx46) is a connexin of ~435 amino acid residues. The bovine form is slightly shorter (401 residues) and is hence known as Cx44, having a molecular mass of ~44 kD. Cx46 (together with Cx50) is a connexin isoform expressed in the lens fibres of the eye. Here, gap junctions join the cells into a functional syncytium, and also couple the fibres to the epithelial cells on the anterior surface of the lens. The lens fibres depend on this epithelium for their metabolic support, since they lose their intra-cellular organelles, and accumulate high concentrations of crystallins, in order to produce their optical transparency. Genetically-engineered mice deficient in Cx46 demonstrate the importance of Cx46 in forming lens fibre gap junctions; these mice develop normal lenses, but subsequently develop early onset senile-type cataracts that resemble human nuclear cataracts. Aberrant proteolysis of crystallin proteins has been observed in the lenses of Cx46-null mice.

[0238] The disclosed NOV8 nucleic acid of the invention encoding a Connexin GJA3-like protein includes the nucleic acid whose sequence is provided in Table 8A, or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 8A while still encoding a protein that maintains its Connexin GJA3-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 10% of the bases may be so changed.

[0239] The disclosed NOV8 protein of the invention includes the Connexin GJA3-like protein whose sequence is provided in Table 8B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 8B while still encoding a protein that maintains its Connexin GJA3-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 66% of the residues may be so changed.

[0240] The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention.

[0241] The above defined information for this invention suggests that this Connexin GJA3-like protein (NOV8) may function as a member of a "Connexin GJA3 family". Therefore, the NOV8 nucleic acids and proteins identified here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and disorders as indicated below. The potential therapeutic applications for this invention include, but are not limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) those defined here.

[0242] The NOV8 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in nonsyndromic deafness, keratinization disorders, gap-junction-related neuropathies and other pathological conditions of the nervous system, where dysfunctions of junctional communication are considered to play a causal role, demyelinating neuropathies (including Charcot-Marie-Tooth disease), erythrokeratodermia variabilis (EKV), atrioventricular (AV) conduction defects such as arrhythmia, lens cataracts and/or other diseases or pathologies.

[0243] NOV8 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel NOV8 substances for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. The disclosed NOV8 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV8 epitope is from about amino acids 40 to 80. In another embodiment, a NOV8 epitope is from about amino acids 90 to 150, from about amino acids 170 to 200, or from about amino acids 220 to 320. These novel proteins can be used in assay systems for functional analysis of various human disorders, which will help in understanding of pathology of the disease and development of new drug targets for various disorders.

[0244] NOV9

[0245] A disclosed NOV9 nucleic acid of 967 nucleotides (also referred to as CG50271-01) encoding a novel Olfactory Receptor-like protein is shown in Table 9A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 12-14 and ending with a TGA codon at nucleotides 948-950. A putative untranslated region upstream from the initiation codon and downstream from the

termination codon is underlined in Table 9A. The start and Stop codons are in bold letters. TABLE 9A-continued

TABLE 9A NOV9 nucleotide sequence. TCAACCCCCTCATATATACACTGAGGAACAAGGATG NOV9 nucleotide sequence. TGAAAGGAGCACTTAAGAAGGTGCTCTGGAAGAACT ACTAACAAAGAATGGATCAGAAAAATGGAAGTTCTT (SEQ ID NO:33) ACGACTCCAGAIGACTTGGAGAGAAAGACAT TCACTGGATTTATCCTACTGGGTTTCTCTGACAGGC

CTCAGCTGGAGCTAGTCCTCTTTGTGGTTCTTTTGA r 0246 The disclosed NOV9 polypeptide (SEQ ID NO: CTTCTATATCTTCACTTTGCTGGGGAACAAAACCA 30) encoded by SEQ ID NO: 29 has 312 amino acid TCATTGTATTATCTCACTTGGACCCACATCTTCACA residues, a molecular weight of 34977.1 and is presented in Table 9B using the one-letter amino acid code. Signal P. ATCCTATGTATTTTTTCTTCTCCAACCTAAGCTTTT Psort and/or Hydropathy results predict that NOV9 has a TGGATCTGTGTTACACAACCGGCATTGTTCCACAGC Signal peptide and is likely to be localized in the plasma membrane with a certainty of 0.6400. I The most likely CCTGGTTAATCTCAGGGGAGCAGACAAATCAATCT ceavage site for NOV9 is between positions 41 and 42, LLG-NK. CCTATGGTGGTTGTGTAGTTCAGCTGTACATCTCTC

TAGGCTTGGGATCTACAGAATGCGTTCTCTTAGGAG TABLE 9B TGATGGCATTTGACCGCTATGCAGCTGTTTGCAGGC Encoded NOV9 protein sequence. CCCTCCACTACACAGTAGTCATGCACCCTTGTCTGT MDQKNGSSFTGFILLGFSDRPQLELVLFVVLLIFYI (SEQ ID NO:34)

ATGTGCTGATGGCTTCTACTTCATGGGTCATTGGTT FTLLGNKTIIWLSHLDPHILHINPMYFFFSNILSFILDLC

TTGCCAACTCCCTATTGCAGACGGTGCTCATCTTGC YTTGIVPOLLVNLRGADKSISYGGCVWQLYISLGLG

TTTTAACACTTTGTGGAAGAAATAAATTAGAACACT STECWLLGWMAFDRYAAWCRPLHYTWWMHPCLYWLM

TTCTTTGTGAGGTTCCTCCATTGCTCAAGCTTGCCT ASTSWWIGFANSLLQTVLILLLTLCGRNKLEHFLCE

GTGTTGACACTACTATGAATGAATCTGAACTCTTCT WPPLLKLACWDTTMNESELFFWSWIILLWPWALIIF

TTGTCAGTGTCATTATTCTTCTTGTACCTGTTGCAT SYSQIVRAVMRIKSATGQRKVFGTCGSHLTVWSLFY

TAATCATATTCTCCTATAGTCAGATTGTCAGGGCAG GTAIYAYLOPGNNYSQDQGKFISLFYTIITPMINPL

TCATGAGGATAAAGTCAGCAACAGGGCAGAGAAAAG

TGTTTGGGACATGTGGCTCCCACCTCACAGTGGTTT 0247 A BLASTX of NOV9 shows a 55% (identities) and CCCTGTTCTACGGCACAGCTATCTATGCTTACCTCC 72% (positives) similarity to a Mouse Odorant Receptor AGCCCGGCAACAACTACTCTCAGGATCAGGGCAAGT MOR18 protein (E=1.2e).

TCATCTCTCTCTTCTACACCATCATTACACCCATGA 0248. The disclosed NOV9 polypeptide has homology to the amino acid sequences shown in the BLASTP data listed in Table 9C.
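The relationship between the open reading frame coordinates given for NOV9 (ATG at nucleotides 12-14) and the encoded protein can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and 1-based nucleotide positions as used in the text; the function name `translate_orf` and the variable names are illustrative only and do not come from the application. The demonstration input is just the first two printed rows of Table 9A, so no stop codon is reached.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start):
    """Translate `seq` from 1-based position `start` to the first stop codon.

    Returns (protein, stop_position), where stop_position is the 1-based
    position of the first base of the stop codon, or None if the sequence
    ends before a stop codon is found. Only A, C, G and T are supported;
    any other symbol raises KeyError.
    """
    seq = seq.upper()
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein), i + 1
        protein.append(amino_acid)
    return "".join(protein), None


# First two rows of Table 9A (72 nucleotides), used here as sample input.
NOV9_FRAGMENT = (
    "ACTAACAAAGAATGGATCAGAAAAATGGAAGTTCTT"
    "TCACTGGATTTATCCTACTGGGTTTCTCTGACAGGC"
)

if __name__ == "__main__":
    peptide, stop = translate_orf(NOV9_FRAGMENT, start=12)
    print(peptide, stop)
```

Applied to a full nucleotide listing, the same routine gives the residue count and the stop codon position, which can be compared with the figures stated in the text.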

TABLE 9C
BLAST results for NOV9

Gene Index/Identifier    Protein/Organism    Length (aa)    Identity (%)    Positives (%)    Expect
gi|17464665|ref|XP_069524.1| (XM_069524)    similar to olfactory receptor, family 2, subfamily W    312    265/312 (84%)    265/312 (84%)    e-143
gi|17455398|ref|XP_069445.1| (XM_069445)    similar to olfactory receptor (H. sapiens) [Homo sapiens]    252    221/249 (88%)    222/249 (88%)    e-119
gi|17445400|ref|XP_060573.1| (XM_060573)    similar to olfactory receptor 15 (H. sapiens) [Homo sapiens]    309    169/301 (56%)    205/301 (67%)    1e-87
gi|14423800|sp|Q9GZK3|O2B2_HUMAN    OLFACTORY RECEPTOR 2B2 (OLFACTORY RECEPTOR 6-1) (OR6-1) (HS6M1-10)    357    170/308 (55%)    207/308 (67%)    2e-87
gi|13624329|ref|NP_112165.1| (NM_030903)    olfactory receptor, family 2, subfamily W, member 1 [Homo sapiens]    320    167/305 (54%)    202/305 (65%)    3e-87
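The Gene Index/Identifier entries in Table 9C appear to follow the legacy pipe-delimited NCBI identifier layout, in which `gi` is followed by one number and every other database tag (`ref`, `sp`, `gb`, `dbj`, and so on) is followed by an accession and an optional name. Assuming that layout, a small parser along the following lines could split an identifier into its parts. The function name `parse_ncbi_id` is hypothetical, and the layout itself is an assumption rather than something the application states.

```python
def parse_ncbi_id(identifier):
    """Split a pipe-delimited identifier such as 'gi|17464665|ref|XP_069524.1|'.

    Assumed layout: 'gi' takes a single field; any other database tag takes
    two fields (accession, name), where the name may be empty.
    Returns a dict mapping 'gi' to a string and other tags to (accession, name).
    """
    tokens = identifier.strip().split("|")
    parsed = {}
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        if not tag:  # skip empty fields left by trailing pipes
            i += 1
            continue
        if tag == "gi":
            parsed["gi"] = tokens[i + 1]
            i += 2
        else:
            accession = tokens[i + 1] if i + 1 < len(tokens) else ""
            name = tokens[i + 2] if i + 2 < len(tokens) else ""
            parsed[tag] = (accession, name)
            i += 3
    return parsed


if __name__ == "__main__":
    print(parse_ncbi_id("gi|17464665|ref|XP_069524.1|"))
    print(parse_ncbi_id("gi|14423800|sp|Q9GZK3|O2B2_HUMAN"))
```

Keeping the accession and the name as separate fields makes it straightforward to look up either one without re-parsing the string.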

[0249] The homology between these and other sequences is shown graphically in the ClustalW analysis shown in Table 9D. In the ClustalW alignment of the NOV9 protein, as well as all other ClustalW analyses herein, the black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve structural or functional properties), whereas non-highlighted amino acid residues are less conserved and can potentially be altered to a much broader extent without altering protein structure or function.

Table 9D. ClustalW Analysis of NOV9

1) Novel NOV9 (SEQ ID NO:34)
2) gi|17464665|ref|XP_069524.1| (XM_069524) similar to olfactory receptor, family 2, subfamily W (SEQ ID NO:109)
3) gi|17455398|ref|XP_069445.1| (XM_069445) similar to olfactory receptor (H. sapiens) [Homo sapiens] (SEQ ID NO:110)
4) gi|17445400|ref|XP_060573.1| (XM_060573) similar to olfactory receptor 15 (H. sapiens) [Homo sapiens] (SEQ ID NO:111)
5) gi|14423800|sp|Q9GZK3|O2B2_HUMAN OLFACTORY RECEPTOR 2B2 (OLFACTORY RECEPTOR 6-1) (OR6-1) (HS6M1-10) (SEQ ID NO:112)
6) gi|13624329|ref|NP_112165.1| (NM_030903) olfactory receptor, family 2, subfamily W, member 1 [Homo sapiens] (SEQ ID NO:113)

[ClustalW multiple sequence alignment of sequences 1-6, residues 1 to 357; the alignment graphic is not recoverable from the source text.]
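The distinction drawn above between conserved (outlined) and less conserved positions of a ClustalW alignment can be illustrated with a short column-wise calculation. This is a minimal sketch only: the three aligned fragments are hypothetical toy sequences, not rows of Table 9D, and the function name `conserved_columns` and the `threshold` parameter are assumptions made for illustration.

```python
from collections import Counter


def conserved_columns(aligned, threshold=1.0):
    """Return 0-based indices of alignment columns regarded as conserved.

    A column counts as conserved when its most common symbol is a residue
    (not the gap character '-') and that residue occurs in at least
    `threshold` (a fraction between 0 and 1) of the aligned sequences.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(aligned) >= threshold:
            conserved.append(col)
    return conserved


# Hypothetical aligned fragments, for illustration only.
TOY_ALIGNMENT = ["PLIYTLRNKD", "PLIYTLRNKE", "PIIYSLRNKD"]

if __name__ == "__main__":
    print(conserved_columns(TOY_ALIGNMENT))        # strictly identical columns
    print(conserved_columns(TOY_ALIGNMENT, 0.6))   # majority-conserved columns
```

Lowering `threshold` relaxes the definition from strictly identical columns to majority-conserved columns, which is closer to how shaded alignment figures are usually read.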

[0250] Table 9E lists the domain description from DOMAIN analysis results against NOV9. This indicates that the NOV9 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 9E
Domain Analysis of NOV9

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin family). (SEQ ID NO:114)
CD-Length = 254 residues, 100.0% aligned
Score = 88.2 bits (217), Expect = 6e-19

Query:  41 GNKTIIVLSHLDPHLHNPMYFFFSNLSFLDLCYTTGIVPQLLVNLRGADKSISYGGCVVQ 100
Sbjct:   1 GNLLVILVILRTKKLRTPTNIFLLNLAVADLLFLLTLPPWALYYLVGGDWVFGDALCKLV 60

Query: 101 LYISLGLGSTECVLLGVMAFDRYAAVCRPLHYTVVMHPCLYVLMASTSWVIGFANSLLQT 160
Sbjct:  61 GALFVVNGYASILLLTAISIDRYLAIVHPLRYRRIRTPRRAKVLILLVWVLALLLSLPPL 120

Query: 161 VLILLLTLCGRNKLEHFLCEVPPLLKLACVDTTMNESELFFVSVIILLVPVALIIFSYSQ 220
Sbjct: 121 LFSWLRTVEEGNTTVCLID--FPEESVKRSYVLLSTLVGFV------LPLLVILVCYTR 171

Query: 221 IVRAVMR------IKSATGQRKVFGTCGSHLTVVSLFYGTAIYAYLQPGNNYS---- 267
Sbjct: 172 ILRTLRKRARSQRSLKRRSSSERKAAKMLLVVVVVFVLCWLPYHIVLLLDSLCLLSIWRV 231

Query: 268 QDQGKFISLFYTIITPMINPLIY 290
Sbjct: 232 LPTALLITLWLAYVNSCLNPIIY 254

[0251] G-Protein Coupled Receptors (GPCRs) have been identified as an extremely large family of protein receptors in a number of species. At the phylogenetic level they can be classified into four major subfamilies. These receptors share a seven transmembrane domain structure with many neurotransmitter and hormone receptors. They are likely to be involved in the recognition and transduction of various signals mediated by G-Proteins, hence their name G-Protein Coupled Receptors. The human GPCR genes are generally intron-less and belong to four gene subfamilies, displaying great sequence variability. These genes are dominantly expressed in olfactory epithelium.

[0252] Olfactory receptors (ORs) have been identified as an extremely large family of GPCRs in a number of species. As members of the GPCR family, these receptors share a seven transmembrane domain structure with many neurotransmitter and hormone receptors, and are likely to underlie the recognition and G-protein-mediated transduction of odorant signals. Like GPCRs, the ORs can be expressed in a variety of tissues where they are thought to be involved in recognition and transmission of a variety of signals. The human OR genes are typically intron-less and belong to four different gene subfamilies, displaying great sequence variability. These genes are dominantly expressed in olfactory epithelium.

[0253] A BLASTX of the Olfactory Receptor-like protein CG50271-01 described in this invention shows a 55% (identities) and 72% (positives) similarity to a Mouse Odorant Receptor MOR18 protein.

[0254] Tsuboi et al. (J Neurosci 1999; 19:8409-18) characterized two separate odorant receptor (OR) gene clusters to examine how olfactory neurons expressing closely linked and homologous OR genes project their axons to the olfactory bulb. Murine OR genes, MOR28, MOR10, and MOR83, share 75-95% similarities in the amino acid sequences and are tightly linked. In situ hybridization has demonstrated that the three genes are expressed in the same zone, at the most dorsolateral and ventromedial portions of the olfactory epithelium, and are rarely expressed simultaneously in individual neurons. Furthermore, they have found that olfactory neurons expressing MOR28, MOR10, or MOR83 project their axons to very close but distinct subsets of glomeruli on the medial and lateral sides of the olfactory bulb. Similar results have been obtained with another murine OR gene cluster for A16 and MOR18 on chromosome 2, sharing 91% similarity in the amino acid sequences. These results may indicate an intriguing possibility that olfactory neurons expressing homologous OR genes within a cluster tend to converge their axons to proximal but distinct subsets of glomeruli. These lines of study will shed light on the molecular basis of topographical projection of olfactory neurons to the olfactory bulb.

[0255] The disclosed NOV9 nucleic acid of the invention encoding an Olfactory Receptor-like protein includes the nucleic acid whose sequence is provided in Table 9A, or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 9A while still encoding a protein that maintains its Olfactory Receptor-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to

enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.

[0256] The disclosed NOV9 protein of the invention includes the Olfactory Receptor-like protein whose sequence is provided in Table 9B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 9B while still encoding a protein that maintains its Olfactory Receptor-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 46% of the residues may be so changed.

[0257] The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention.

[0258] The above defined information for this invention suggests that this Olfactory Receptor-like protein (NOV9) may function as a member of an "Olfactory Receptor family". Therefore, the NOV9 nucleic acids and proteins identified here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and disorders as indicated below. The potential therapeutic applications for this invention include, but are not limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) those defined here.

[0259] The NOV9 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in various diseases and pathologies.

[0260] NOV9 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel NOV9 substances for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. The disclosed NOV9 protein has multiple hydrophilic regions, each of which can be used as an immunogen. These novel proteins can be used in assay systems for functional analysis of various human disorders, which will help in understanding the pathology of the disease and in the development of new drug targets for various disorders.

[0261] NOV10

[0262] A disclosed NOV10 nucleic acid of 1596 nucleotides (also referred to as CG55844-01) encoding a novel P450-like protein is shown in Table 10A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 549-551 and ending with a TGA codon at nucleotides 1594-1596. A putative untranslated region upstream from the initiation codon and downstream from the termination codon is underlined in Table 10A. The start and stop codons are in bold letters.

TABLE 10A
NOV10 nucleotide sequence.

ATGCTGCCCATCACAGACCGCCTGCTGCACCTCCTG (SEQ ID NO:35)
GGGCTGGAGAAGACGGCGTTCCGCATATACGCGGTG
TCCACCCTTCTCCTCTTCCTGCTCTTCTTCCTGTTC
CGCCTGCTGCTGCGGTTCCTGAGGCTCTGCAGGAGC
TTCTACATCACCTGCCGCCGGCTGCGCTGCTTCCCC
CAGCCTCCCCGGCGCAACTGGCTGCTGGGCCACCTG
GGCATGTACCTTCCAAATGAGGCGGGCCTTCAAGAT
GAGAAGAAGGTACTGGACAACATGCACCATGTACTC
TTGGTATGGATGGGACCTGTCCTGCCGCTGTTGGTT
CTGGTGCACCCTGATTACATCAAACCCCTTTTGGGA
GCCTCAGCTGCCATCGCCCCCAAGGATGACCTCTTC
TATGGCTTCCTAAAACCTTGGCTAGGGGATGGGCTG
CTGCTCAGCAAAGGTGACAAGTGGAGCCGGCACCGT
CGCCTGCTGACACCCGCCTTCCACTTTGACATCCTG
AAGCCTTACATGAAGATCTTCAACCAGAGCGCTGAC
ATTATGCATGCTAAATGGCGGCATCTGGCAGAGGGC
TCAGCGGTCTCCCTTGATATGTTTGAGCATATCAGC
CTCATGACCCTGGACAGTCTTCAGAAATGTGTCTTC
AGCTACAACAGCAACTGCCAAGAGAAGATGAGTGAT
TATATCTCCGCTATCATTGAACTGAGCGCTCTGTCT
GTCCGGCGCCAGTATCGCTTGCACCACTACCTCGAC
TTCATTTACTACCGCTCGGCGGATGGGCGGAGGTTC
CGGCAGGCCTGTGACATGGTGCACCACTTCACCACT
GAAGTCATCCAGGAACGGCGGCGGGCACTGCGTCAG
CAGGGGGCCGAGGCCTGGCTTAAGGCCAAGCAGGGG
AAGACCTTGGACTTTATTGATGTGCTGCTCCTGGCC
AGGGATGAAGATGGAAAGGAACTGTCAGACGAGGAT
ATCCGAGCCGAAGCAGACACCTTCATGTTTGAGGGT
CACGACACAACATCCAGTGGGATCTCTTGGATGCTG
TTCAATTTGGCAAAGGATCCGGAATACCAGGAGAAA
TGCCGAGAAGAGATTCAGGAAGTCATGAAAGGCCGG
GAGCTGGAGGAGCTCGAGTGGGACGATCTGACTCAG
CTGCCCTTTACAACTATGTGCATTAAGGAGAGCCTG
CGCCAGTACCCACCTGTCACTCTTGTCTCTCGCCAA
TGCACGGAGGACATCAAGCTCCCAGATGGGCGCATC
ATCCCCAAAGGAATCATCTGCTTGGTCAGCATCTAT
GGAACCCACCACAACCCCACAGTGTGGCCTGACTCC

TABLE 10A-continued
NOV10 nucleotide sequence.

AAGGTGTACAACCCCTACCGCTTTGACCCGGACAAC
CCACAGCAGCGCTCTCCACTGGCCTATGTGCCCTTC
TCTGCAGGACCCAGGAATTGCATCGGACAGAGCTTC
GCCATGGCCGAGTTGCGCGTGGTTGTGGCACTAACA
CTGCTACGTTTCCGCCTGAGCGTGGACCGAACGCGC
AAGGTGCGGCGGAAGCCGGAGCTCATACTGCGCACG
GAGAACGGGCTCTGGCTCAAGGTGGAGCCGCTGCCT
CCGCGGGCCTGA

[0263] In a search of public sequence databases, the NOV10 nucleic acid sequence, localized to chromosome 19, has 1111 of 1578 bases (70%) identical to a gb:GENBANK-ID:HSU02388|acc:U02388.2 mRNA from Homo sapiens (Homo sapiens cytochrome P450 4F2 (CYP4F2) mRNA, complete cds) (E=7.4e). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0264] The disclosed NOV10 polypeptide (SEQ ID NO:36) encoded by SEQ ID NO:35 has 532 amino acid residues and is presented in Table 10B using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV10 has no signal peptide and is likely to be localized in the mitochondrial inner membrane with a certainty of 0.7491. In other embodiments, NOV10 may also be localized to the plasma membrane with a certainty of 0.6000, the Golgi body with a certainty of 0.4000, or in the endoplasmic reticulum (membrane) with a certainty of 0.3000. The most likely cleavage site for NOV10 is between positions 48 and 49: CRS-FY.

TABLE 10B
Encoded NOV10 protein sequence.

MLPITDRLLHLLGLEKTAFRIYAVSTLLLFLLFFLF (SEQ ID NO:36)
RLLLRFLRLCRSFYITCRRLRCFPQPPRRNWLLGHL
GMYLPNEAGLQDEKKVLDNMHHVLLVWMGPVLPLLV
LVHPDYIKPLLGASAAIAPKDDLFYGFLKPWLGDGL
LLSKGDKWSRHRRLLTPAFHFDILKPYMKIFNQSAD
IMHAKWRHLAEGSAVSLDMFEHISLMTLDSLQKCVF
SYNSNCQEKMSDYISAIIELSALSVRRQYRLHHYLD
FIYYRSADGRRFRQACDMVHHFTTEVIQERRRALRQ
QGAEAWLKAKQGKTLDFIDVLLLARDEDGKELSDED
IRAEADTFMFEGHDTTSSGISWMLFNLAKYPEYQEK
CREEIQEVMKGRELEELEWDDLTQLPFTTMCIKESL
RQYPPVTLVSRQCTEDIKLPDGRIIPKGIICLVSIY
GTHHNPTVWPDSKVYNPYRFDPDNPQQRSPLAYVPF
SAGPRNCIGQSFAMAELRVVVALTLLRFRLSVDRTR
KVRRKPELILRTENGLWLKVEPLPPRAX

[0265] A search of sequence databases reveals that the NOV10 amino acid sequence has 339 of 505 amino acid residues (67%) identical to, and 415 of 505 amino acid residues (82%) similar to, the 520 amino acid residue ptnr:SWISSPROT-ACC:P78329 protein from Homo sapiens (Human) (Cytochrome P450 4F2 (EC 1.14.13.30) (CYPIVF2) (Leukotriene-B4 Omega-Hydroxylase) (Leukotriene-B4 20-Monooxygenase) (Cytochrome P450-LTB-Omega)) (E=9.8e). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0266] The novel P450 disclosed in this invention is expressed in at least lung. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0267] In addition, the sequence is predicted to be expressed in colon and liver because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:HSU02388|acc:U02388.2), a closely related Homo sapiens cytochrome P450 4F2 (CYP4F2) mRNA, complete cds homolog.

[0268] The disclosed NOV10 polypeptide has homology to the amino acid sequences shown in the BLASTP data listed in Table 10C.
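The NOV10 description above cites hydropathy results, and the epitope passages elsewhere in this document refer to hydrophobicity charts, without naming the scale or window that was used. As a minimal sketch only, the following assumes the Kyte-Doolittle hydropathy scale and a 9-residue sliding window; both choices, and the function name `hydropathy_profile`, are assumptions made for illustration. Positive window averages indicate hydrophobic stretches and negative averages indicate hydrophilic stretches.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along `protein`.

    Element i of the result is the average score of residues i .. i+window-1
    (0-based). Residues outside the 20 standard amino acids raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First printed row of the NOV10 protein listing, used as sample input.
NOV10_ROW1 = "MLPITDRLLHLLGLEKTAFRIYAVSTLLLFLLFFLF"

if __name__ == "__main__":
    profile = hydropathy_profile(NOV10_ROW1)
    for start, value in enumerate(profile, start=1):
        print(f"{start:3d} {value:6.2f}")
```

Windows with strongly negative averages would be candidate hydrophilic regions of the kind the text proposes as immunogens; the cutoff for "strongly negative" is a further choice that this sketch leaves to the user.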

TABLE 10C
BLAST results for NOV10

Gene Index/Identifier    Protein/Organism    Length (aa)    Identity (%)    Positives (%)    Expect
gi|14767705|ref|XP_029072.1| (XM_029072)    cytochrome P450, subfamily IVF, polypeptide 3 [Homo sapiens]    520    309/481 (64%)    378/481 (78%)    0.0
gi|2997737|gb|AAC08589.1| (AF054821)    cytochrome P-450 [Homo sapiens]    520    305/481 (63%)    379/481 (78%)    0.0
gi|4503241|ref|NP_000887.1| (NM_000896)    cytochrome P450, subfamily IVF, polypeptide 3; leukotriene B4 omega hydroxylase; leukotriene-B4 20-monooxygenase; cytochrome P450 LTB-omega [Homo sapiens]    520    308/481 (64%)    378/481 (78%)
gi|13435391|ref|NP_001073.3| (NM_001082)    cytochrome P450, subfamily IVF, polypeptide 2; leukotriene B4 omega hydroxylase; leukotriene-B4 20-monooxygenase [Homo sapiens]    520    304/481 (63%)    380/481 (78%)
gi|4519535|dbj|BAA75823.1| (AB015306)    Leukotriene B4 omega-hydroxylase [Homo sapiens]    520    303/481 (62%)    380/481 (78%)
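The Identity column of the NOV10 BLAST results pairs a match count with a percentage. The printed identity percentages appear consistent with integer truncation of matches divided by aligned length; this is an observation about the printed numbers, not a rule stated in the application. A minimal sketch under that assumption, with the hypothetical helper names `percent_identity` and `parse_fraction`:

```python
def parse_fraction(text):
    """Split a string such as '309/481' into (matches, aligned_length)."""
    matches, aligned_length = text.strip().split("/")
    return int(matches), int(aligned_length)


def percent_identity(matches, aligned_length):
    """Return matches/aligned_length as a whole percentage, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    for entry in ("309/481", "305/481", "308/481", "304/481", "303/481"):
        print(entry, percent_identity(*parse_fraction(entry)))
```

The same calculation applies to the nucleotide comparison quoted in the text (1111 of 1578 bases), which makes it a convenient check when transcribing such tables.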

[0269] The homology between these and other sequences is shown graphically in the ClustalW analysis shown in Table 10D. In the ClustalW alignment of the NOV10 protein, as well as all other ClustalW analyses herein, the black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve structural or functional properties), whereas non-highlighted amino acid residues are less conserved and can potentially be altered to a much broader extent without altering protein structure or function.


[0270] Tables 10E-10F list the domain description from DOMAIN analysis results against NOV10. This indicates that the NOV10 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 10E
Domain Analysis of NOV10

gnl|Pfam|pfam00067, p450, Cytochrome P450. Cytochrome P450s are involved in the oxidative degradation of various compounds. Particularly well known for their role in the degradation of environmental toxins and mutagens. Structure is mostly alpha and binds a heme cofactor. (SEQ ID NO: 73)
CD-Length = 445 residues, 80.0% aligned
Score = 282 bits (722), Expect = 3e-77

Query: 152 WSRHRRLLTPAFHFDILKPYMKIFNQSADIMHAKWRHLAEGSAVSLDMFEHISLMTLDSL 211
Sbjct:  88 WRQLRRLLTLRF-FGMGKRS-KEERIQEEARDLVERLRKEQGSPIDITELLAPAPLNVI 145

Query: 212 QKCVFSYNSNCQEKMSDYISAIIELSALSVRRQYRLHHYLDFIYYRSADGRRFRQACDMV 271
Sbjct: 146 CSLLFGVRFDYED--PEFLKLIDKLNELFFLVSPW-GQLLDFFRYLPGSHRKAFKAAKDL 202

Query: 272 HHFTTEVIQERRRALRQQGAEAWLKAKQGKTLDFIDVLLL-ARDEDGKELSDEDIRAEAD 330
Sbjct: 203 KDYLDKLIEERRETLEP------GDPRDFLDSLLIEAKREGGSELTDEELKATWIL 251

Query: 331 TFMFEGHDTTSSGISWMLFNLAKYPEYQEKCREEIQEVMKGRELEELEWDDLTQLPFTTM 390
Sbjct: 252 DLLFAGTDTTSSTLSWALYLLAKHPEVQAKLREEIDEVI--GRDRSPTYDDRANMPYLDA 309

Query: 391 CIKESLRQYPPV-TLVSRQCTEDIKLPDGRIIPKGIICLVSIYGTHHNPTVWPDSKVYNP 449
Sbjct: 310 VIKETLRLHPVVPLLLPRVATEDTEI-DGYLIPKGTLVIVNLYSLHRDPKVFPNPEEFDP 368

Query: 450 YRFDPDNPQQRSPLAYVPFSAGPRNCIGQSFAMAELRVVVALTLLRFRLSVDRTRKVRRK 509
Sbjct: 369 ERFLDENGKFKKSYAFLPFGAGPRNCLGERLARMELFLFLATLLQRFELELVPPGDIPLT 428

Query: 510 PELILRTENGLWLKV 524
Sbjct: 429 PKPLGLPSKPPLYQL 443

[0271] The P450 gene superfamily is a biologically diverse class of oxidase enzymes; members of the class are found in all organisms. P450 proteins are clinically and toxicologically important in humans; they are the principal enzymes in the metabolism of drugs and xenobiotic compounds, as well as in the synthesis of cholesterol, steroids and other lipids. Induction of some P450 genes can also be a risk factor for several types of cancer. This diversity of function is mirrored in the diversity of nucleotide and protein sequences; there are currently over 100 human P450 forms described. Allelic forms of many cytochrome P450 genes have been identified as causing quantitatively different rates of drug metabolism, and hence are important to consider in the development of safe and effective human pharmaceutical therapies (reviewed in E. Tanaka, J Clinical Pharmacy & Therapeutics 24:323-329, 1999).

[0272] The disclosed NOV10 nucleic acid of the invention encoding a P450-like protein includes the nucleic acid whose sequence is provided in Table 10A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Table 10A while still encoding a protein that maintains its P450-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 30% of the bases may be so changed.

[0273] The disclosed NOV10 protein of the invention includes the P450-like protein whose sequence is provided in Table 10B. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Table 10B while still encoding a protein that maintains its P450-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 33% of the residues may be so changed.

[0274] The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention.

[0275] The above defined information for this invention suggests that this P450-like protein (NOV10) may function

as a member of a "P450 family". Therefore, the NOV10 nucleic acids and proteins identified here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and disorders as indicated below. The potential therapeutic applications for this invention include, but are not limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) those defined here.

[0276] The NOV10 nucleic acids and proteins of the invention are useful in potential therapeutic applications implicated in cancer including but not limited to various pathologies and disorders.

[0277] NOV10 nucleic acids and polypeptides are further useful in the generation of antibodies that bind immunospecifically to the novel NOV10 substances for use in therapeutic or diagnostic methods. These antibodies may be generated according to methods known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" section below. The disclosed NOV10 protein has multiple hydrophilic regions, each of which can be used as an immunogen. In one embodiment, a contemplated NOV10 epitope is from about amino acids 50 to 100. In another embodiment, a NOV10 epitope is from about amino acids 120 to 180. In further embodiments, a NOV10 epitope is from about amino acids 200 to 420, from about amino acids 450 to 480, or from about amino acids 490 to 510. These novel proteins can be used in assay systems for functional analysis of various human disorders, which will help in understanding the pathology of the disease and in the development of new drug targets for various disorders.

[0278] NOV11

[0279] NOV11 includes three novel Integrin-like FG-GAP domain containing novel protein-like proteins disclosed below. The disclosed sequences have been named NOV11a and NOV11b.

[0280] NOV11a

[0281] A disclosed NOV11 nucleic acid of 3025 nucleotides (also referred to as CG55752-01) encoding a novel Alpha Glucosidase 2, Alpha Neutral Subunit-like protein is shown in Table 11A. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 28-30 and ending with a TGA codon at nucleotides 2929-2931. A putative untranslated region upstream from the initiation codon is underlined in Table 11A. The start and stop codons are in bold letters.

TABLE 11A
NOV11a nucleotide sequence.

ACAGGTGCCTGGGGGTCAGGCTTCCGCATGCGGGCT (SEQ ID NO:37)
GCAGTTGCTGGCATTGCCTTCCGCAGGAGGCGTCAG
AAACAGTGGCTTTCCAAGAAGTCCACCTATCAGGCA
TTATTGGATTCAGTCACAACAGATGAAGACAGCACC
AGGTTCCAAATCATCAATGAAGCAAGTAAGGTGAGG
CGTCAGAAACAGTGGCTTTCCAAGAAGTCCACCTAT
CAGGCATTATTGGATTCAGTCACAACAGATGAAGAC
AGCACCAGGTTCCAAATCATCAATGAAGCAAGTAAG
GTGCCTCTCCTGGCTGAAATTTATGGTATAGAAGGA
AACATTTTCAGGCTTAAAATTAATGAAGAGACTCCT
CTAAAACCCAGATTTGAAGTTCCGGATGTCCTCACA
AGCAAGCCAAGCACTGTAAGGATTTCATGCTCTGGG
GACACAGGCAGTCTGATATTGGCAGATGGAAAAGGA
GACCTGAAGTGCCATATCACAGCAAACCCATTCAAG
GTAGACTTGGTGTCTGAAGAAGAGGTTGTGATTAGC
ATAAATTCCCTGGGCCAATTATACTTTGAGCATGGC
AGGGCCCCTAGGGTCTCTTTCTCGGATAAGGTTAAT
CTCACGCTTGGTAGCATATGGGATAAGATCAAGAAC
CTTTTCTCTAGGCAAGGATCAAAAGACCCAGCTGAG
GGCGATGGGGCCCAGCCTGAGGAAACACCCAGGGAT
GGCGACAAGCCAGAGGAGACTCAGGGGAAGGCAGAG
AAAGATGAGCCAGGAGCCTGGGAGGAGACATTCAAA
ACTCACTCTGACAGCAAGCCGTATGGCCCTTCTTCT
ATTGGTTTGGATTTCTCCTTGCATGGATTTGAGCAT
CTTTATGGGATCCCACAACATGCAGAATCACACCAA
CTTAAAAATACTGGGGATGGAGATGCTTACCGTCTT
TATAACCTGGATGTCTATGGATACCAAATATATGAT
AAAATGGGCATTTATGGTTCAGTACCTTATCTCCTG
GCCCACAAACTGGGCAGAACTATAGGTATTTTCTGG
CTGAATGCCTCGGAAACACTGGTGGAGATCAATACA
GAGCCTGCAGGGATAGTCATCTTTGGTCCTGTCTCT
TTGATTTATCAAAGCCAGGGAGATACACCTCTAACA
ACTCATGTGCACTGGATGTCAGAGAGTGGCATCATT
GATGTTTTTCTGCTGACAGGACCTACACCTTCTGAT
GTCTTCAAACAGTACTCACACCTTACAGGTACACAA
GCCATGCCCCCTCTTTTCTCTTTGGGATACCACCAG
TGCCGCTGGAACTATGAAGATGAGCAGGATGTAAAA
GCAGTGGATGCAGGGTTTGATGAGCATGACATTCCT
TATGATGCCATGTGGCTGGACATAGAGCACACTGAG
GGCAAGAGGTACTTCACCTGGGACAAAAACAGATTC
CCAAACCCCAAGAGGATGCAAGAGCTGCTCAGGAGC
AAAAAGCGTAAGCTTGTGGTCATCAGTGATCCCCAC

TABLE 11A-continued
NOV11a nucleotide sequence.

ATCAAGATTGAACCTGACTACTCAGTATATGTGAAG
GCCAAAGATCAGGGCTTCTTTGTGAAGAATCAGGAA
GGGGAAGACTTTGAAGGGGTGTGTTGGCCAGGTATG
AAATCATACCTGGATTTCACCAATCCCAAGGTCAGA
GAGTGGTATTCAAGTATGTTCAGTTCCAATTGTGAT
GGATCTACGGACATCCTCTTCCTTTGGAATGACATG
AATGAGCCTTCTGTCTTTAGAGGGCCAGAGCAAACC
ATGCAGAAGAATGCCATTCATCATGGCAATTGGGAG
CACAGAGAGCTCCACAACATCTACGGTTTTTATATG
GCTACTGCAGAAGGACTGATAAAACGATCTAAAGGG
AAGGAGAGACCCTTTGTTCTTACACGTTCTTTCTTT
GCTGGATCACAAAAGTATGGTGCCGTGTGGACAGGC
GACAACACAGCAGAATGGAGCAACTTGAAAATTTCT
ATCCCAATGTTACTCACTCTCAGCATTACTGGGATC
TCTTTTTGCGGAGCTGACATAGGCGGGTTCATTGGG
AATCCAGAGACAGAGCTGCTAGTGCGTTGGTACCAG
GCTGGAGCCTACCAGCCCTTCTTCCGTGGCCATGCC
ACCATGAACACCAAGCGACGAGAGCCCTGGCTCTTT
GGGGAGGAACACACCCGACTCATCCGAGAAGCCATC
AGAGAGCGCTATGGCCTCCTGCCATATTGGTATTCT
CTGTTCTACCATGCACACGTGGCTTCCCAACCTGTC
ATGAGGCCTCTGTGGGTAGAGTTCCCTGATGAACTA
AAGACTTTTGATATGGAAGATGAATACATGTTAGGG
AGTGCATTATTGGTTCATCCAGTCACAGAACCAAAA
GCCACCACAGTTGATGTGTTTCTTCCAGGATCAAAT
GAGGTAGTCTGGTATGACTATAAGACATTTGCTCAT
TGGGAAGGAGGGTGTACTGTAAAGATCCCAGTACTG
TTACAGATTCCAGTGTTTCAGCGAGGTGGAAGTGTG
ATACCAATAAAGACAACTGTAGGAAAATCCACAGGC
TGGATGACTGAATCCTCCTATGGACTCCGGGTTGCT
CTAAGCACTCTCCAGGGTTCTTCAGTGGGTGAGTTA
TATCTTGATGATGGCCATTCATTCCAATACCTCCAC
CAGAAGCAATTTTTGCACAGGAAGTTTTCATTCTGT
TCCAGTGTTCTGGTGGCCTCCTCTCCAGTATCTCAA
GGACACTTACATACCCCACTCAGCATGACAAAAGCC
CTGCTTTTCACTGTATCGTCTCCAGCCAGCGTGAAA
ATGCGGCTTCACTACAGCCCAGAGAAAAGGGCCAGG
TTTAGTCATTGTGCCAAAACATCCATCCTGAGCCTG
GAGAAGCTCTCACTCAACATTGCCACTGACTGGGAG
GTCCGCATCATATGACAAAGAACTGCCCCTGGTGAT
GTGAGCAGGGACCTGCCTGCCCCTTTCAACCTTTCC
CCTCACCTTTTTTGAGATTTTTGCTGCAATCTGTTT
G

[0282] In a search of public sequence databases, the NOV11a nucleic acid sequence, located on chromosome 15, has 1839 of 2742 bases (67%) identical to a gb:GENBANK-ID:AF144074|acc:AF144074.1 mRNA from Homo sapiens (Homo sapiens glucosidase II alpha subunit mRNA, complete cds) (E=2.7e). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0283] The disclosed NOV11a polypeptide (SEQ ID NO:38) encoded by SEQ ID NO:37 has 967 amino acid residues and is presented in Table 11B using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV11a has no signal peptide and is likely to be localized in the microbody (peroxisome) with a certainty of 0.7480. In other embodiments, NOV11a may also be localized to the mitochondrial inner membrane with a certainty of 0.7070, the mitochondrial intermembrane space with a certainty of 0.6143, or in the mitochondrial matrix space with a certainty of 0.5762.

TABLE 11B
Encoded NOV11a protein sequence.

MRAAVAGIAFRRRRQKQWLSKKSTYQALLDSVTTDE (SEQ ID NO:38)
DSTRFQIINEASKVRRQKQWLSKKSTYQALLDSVTT
DEDSTRFQIINEASKVPLLAEIYGIEGNIFRLKINE
ETPLKPRFEVPDVLTSKPSTVRISCSGDTGSLILAD
GKGDLKCHITANPFKVDLVSEEEVVISINSLGQLYF
EHGRAPRVSFSDKVNLTLGSIWDKIKNLFSRQGSKD
PAEGDGAQPEETPRDGDKPEETQGKAEKDEPGAWEE
TFKTHSDSKPYGPSSIGLDFSLHGFEHLYGIPQHAE
SHQLKNTGDGDAYRLYNLDVYGYQIYDKMGIYGSVP
YLLAHKLGRTIGIFWLNASETLVEINTEPAGIVIFG
PVSLIYQSQGDTPLTTHVHWMSESGIIDVFLLTGPT
PSDVFKQYSHLTGTQAMPPLFSLGYHQCRWNYEDEQ
DVKAVDAGFDEHDIPYDAMWLDIEHTEGKRYFTWDK
NRFPNPKRMQELLRSKKRKLVVISDPHIKIEPDYSV
YVKAKDQGFFVKNQEGEDFEGVCWPGMKSYLDFTNP
KVREWYSSMFSSNCDGSTDILFLWNDMNEPSVFRGP

TABLE 11B-continued TABLE 11C Encoded NOV11a protein sequence. NOV11b nucleotide sequence. AACGCTAGTTTGGGCCTGAAAAATTCCAGGAGCAAG (SEQ ID NO:39)

EQTMQKNAIHHGNWEHRELHNIYGFYMATAEGLIKR AGTCAAGATTTGTCACTCCATGAGAATCTGGAGGGG

SKGKERPFWLTRSFFAGSOKYCAVWTGDNTAEWSNL ACTCCCTTCCCAGAAACTTGACGATGAAGTACTGGT

KISIPMLLTLSITGISFCGADIGGFIGNPETELLWR TGTAATTTTAGAAAGACACCCAATCGGCTTTTTTAA

WYOAGAYOPFFRGHATMNTKRREPWLFGEEHTRLIR AAGATCGCCCAGGGCCCTTGTCCTGAGAGCTGGGAG CTGGTCGGAGTGACAGAGAAGCCAGGAAGCAGCAG EAIRERYGLLPYWYSLFYHAHVASQPWMRPLWWEFP TGAAAGAGGAAATAAGTGTTGAAGATGAAGCTGTAG DELKTFDMEDEYMLGSALLWHPWTEPKATTWDWFLP ATAAAAACATTTTCAGAGACTGTAACAAGATCGCAT GSNEWWWYDYKTFAHWEGGCTWKIPWLLQIPWFORG TTTACAGGCGTCAGAAACAGTGGCTTTCCAAGAAGT GSWIPIKTTVGKSTGWMTESSYGLRVALSTLQGSSW CCACCTATCAGGCATTATTGGATTCAGTCACAACAG

GELYLDDGHSFOYLHQKQFLHRKFSFCSSWLVASSP ATGAAGACAGCACCAGGTTCCAAATCATCAATGAAG

WSQGHLHTPLSMTKALLFTWSSPASVKMRLHYSPEK CAAGTAAGGTTCCTCTCCTGGCTGAAATTTATGGTA

RARFSHCAKTSILSLEKLSLNIATDWEWRII TAGAAGGAAACATTTTCAGGCTTAAAATTAACGAAG

AGACTCCTCTAAAACCCAGATTTGAAGTTCCGGATG
TCCTCACAAGCAAGCCAAGCACTGTAAGGCTGATTT
CATGCTCTGGGGACACAGGCAGTCTGATATTGGCAG
ATGGAAAAGGAGACCTGAAGTGCCATATCACAGCAA
ACCCATTCAAGGTAGACTTGGTGTCTGAAGAAGAGG
TTGTGATTAGCATAAATTCCCTGGGCCAATTATACT
TTGAGCATCTACAGATTCTTCACAAACAAAGAGCTG
CTAAAGAAAATGAGGAGGAGACATCAGTGGACACCT
CTCAGGAAAATCAAGAAGATCTGGGCCTGTGGGAAG
AGAAATTTGGAAAATTTGTGGATATCAAAGCTAATG
GCCCTTCTTCTATTGGTTTGGATTTCTCCTTGCATG
GATTTGAGCATCTTTATGGGATCCCACAACATGCAG
AATCACACCAACTTAAAAATACTGGTGATGGAGATG
CTTACCGTCTTTATAACCTGGATGTCTATGGATACC
AAATATATGATAAAATGGGCATTTATGGTTCAGTAC
CTTATCTCCTGGCCCACAAACTGGGCAGAACTATAG
GTATTTTCTGGCTGAATGCCTCGGAAACACTGGTGG
AGATCAATACAGAGCCTGCAGTAGAGTACACACTGA
CCCAGATGGGCCCAGTTGCTGCTAAACAAAAGGTCA
GATCTCGCACTCATGTGCACTGGATGTCAGAGAGTG
GCATCATTGATGTTTTTCTGCTGACAGGACCTACAC
CTTCTGATGTCTTCAAACAGTACTCACACCTTACAG
GCACACAAGCCATGCCCCCTCTTTTCTCTTTGGGAT
ACCACCAGTGCCGCTGGAACTATGAAGATGAGCAGG

[0284] A search of sequence databases reveals that the NOV11a amino acid sequence has 551 of 964 amino acid residues (57%) identical to, and 709 of 964 amino acid residues (73%) similar to, the 966 amino acid residue ptnr:SPTREMBL-ACC:Q9POXO protein from Homo sapiens (Human) (Glucosidase II Alpha Subunit) (E=9.7e). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0285] NOV11a is expressed in at least Adrenal Gland/Suprarenal gland, Aorta, Brain, Hippocampus, Kidney, Lung, Lymph node, Ovary, Parathyroid Gland, Prostate, Salivary Glands, Thyroid, Tonsils, Trachea, Uterus, Whole Organism. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources.

[0286] In addition, the sequence is predicted to be expressed in Brain, Hippocampus, Kidney, Lung because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:AF144074|acc:AF144074.1) a closely related Homo sapiens glucosidase II alpha subunit mRNA, complete cds homolog.

[0287] NOV11b

[0288] A disclosed NOV11b nucleic acid of 4483 nucleotides (also referred to as CG55752-02) encoding a novel Alpha Glucosidase 2-like protein is shown in Table 11C. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 204-206 and ending with a TGA codon at nucleotides 2946-2948. A putative untranslated region upstream from the initiation codon is underlined in Table 11C. The start and stop codons are in bold letters.
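By way of illustration only, the open reading frame coordinates stated for NOV11b can be checked arithmetically. The Python sketch below is not part of the original disclosure; the function name `orf_residue_count` is hypothetical, and it assumes 1-based inclusive nucleotide positions with the stop codon counted in the span but not translated. Under those assumptions the NOV11b coordinates give 914 residues, which agrees with the length reported for the NOV11b polypeptide.

```python
def orf_residue_count(start_first_nt: int, stop_last_nt: int) -> int:
    """Residues encoded by an ORF given 1-based inclusive coordinates.

    start_first_nt: position of the A of the ATG initiation codon.
    stop_last_nt:   position of the last nucleotide of the stop codon.
    (Hypothetical helper for illustration; not from the disclosure.)
    """
    span = stop_last_nt - start_first_nt + 1  # nucleotides, stop codon included
    if span <= 3 or span % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    return span // 3 - 1  # the stop codon is not translated


# Coordinates as stated for NOV11b: ATG at 204-206, TGA at 2946-2948.
print(orf_residue_count(204, 2948))  # 914
```

The same calculation may be applied to the coordinates stated for the other NOV11 variants as a quick consistency check between a nucleotide table and its encoded protein table.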

TABLE 11C-continued TABLE 11C-continued NOV11b nucleotide sequence. NOV11b nucleotide sequence.

ATGTAAAAGCAGTGGATGCAGGGTTTGATGAGCATG TCCAATACCTCCACCAGAAGCAATTTTTGCACAGGA

ACATTCCTTATGATGCCATGTGGCTGGACATAGAGC AGTTTTCATTCTGTTCCAGTGTTCTGATCAATAGTT

ACACTGAGGGCAAGAGGTACTTCACCTGGGACAAAA TTGCTGACCAGAGGGGTCATTATCCCAGCAAGTGTG

ACAGATTCCCAAACCCCAAGAGGATGCAAGAGCTGC TGGTGGAGAAGATCTTGGTCTTAGGCTTCAGGAAGG

TCAGGAGCAAAAAGCGTAAGCTTGTGGTCATCAGTG AGCCATCTTCTGTGACTACCCACTCATCTGATGGTA

ATCCCCACATCAAGATTGATCCTGACTACTCAGTAT AAGATCAGCCTGTGGCTTTTACGTATTGTGCCAAAA

ATGTGAAGGCCAAAGATCAGGGCTTCTTTGTGAAGA CATCCATCCTGAGCCTGGAGAAGCTCTCACTCAACA

ATCAGGAAGGGGAAGACTTTGAAGGGGTGTGTTGGC TTGCCACTGACTGGGAGGTCCGCATCATAGACAAA.

CAGGTCTCTCCTCTTACCTGGATTTCACCAATCCCA GAACTGCCCCTGGTGATGTGAGCAGGGACCTGCCTG

AGGTCAGAGAGTGGTATTCAAGTCTTTTTGCTTTCC CCCCTTTCAACCTTTCCCCTCACCTTTTTTGAGATT

CTGTTTATCAGGGATCTACGGACATCCTCTTCCTTT TTTGCTGCAATCTGTTTGCCTTCCCTGAATCAAAAT

GGAATGACATGAATGAGCCTTCTGTCTTTAGAGGGC AATCTTTCATTCGTCACCATTATACTAATGAACAAT

CAGAGCAAACCATGCAGAAGAATGCCATTCATCATG AGATTTCATGTTTCAAAATTTCAGATTTTACATGTT

GCAATTGGGAGCACAGAGAGCTCCACAACATCTACG AAGATGTACTAACAATATTCCTTGTATCAAACATCT

GTTTTTATCATCAAATGGCTACTGCAGAAGGACTGA CCTTTTCTCCCTGATACATAGCCCTGAGACATTTAT

TAAAACGATCTAAAGGGAAGGAGAGACCCTTTGTTC AGCGTTCAGGAGTCTTCTATTGCTTCCATTCCTTCA

TTACACGTTCTTTCTTTGCTGGATCACAAAAGTATG GCAGGGCTGCGTGGGTCTGTTTTAACGTGGGCCAAG

GTGCCGTGTGGACAGGCGACAACACAGCAGAATGGA CCTACCTGGGCAGCCCATTTGCCAGGGCTTGCCTCA

GCAACTTGAAAATTTCTATCCCAATGTTACT CACTC GGCCATGCAGCATTGGCGCTCTGGCTGCAGCAGCTG

TCAGCATTACTGGGATCTCTTTTTGCGGAGCTGACA AGTTGCTCAAGGCCAGTGTCCAAGTGGACAGCAGCC

TAGGCGGGTTCATTGGGAATCCAGAGACAGAGCTGC TCTGGTACTCCCCCCAGTTATCTTCCACCCACATGG

TAGTGCGTTGGTACCAGGCTGGAGCCTACCAGCCCT ACTGGGCAGAGCAGCCCTCTTCTGTGTGCACTGCAT

TCTTCCGTGGCCATGCCACCATGAACACCAAGCGAC ACGCTGCAGCCGTGGGAGTTATTCTCCCCTAGAGAT

GAGAGCCCTGGCTCTTTGGGGAGGAACACACCCGAC CGACTTGGCAGCACGAAGGATTCTTTTCTCTTTCAT

TCATCCGAGAAGCCATCAGAGAGCGCTATGGCCTCC GCTTCTCAGGCTCAATAGTTTCTAATTAATCTTAAA

TGCCATATTGGTATTCTCTGTTCTACCATGCACACG ATCCATGTCTTTTACATTGTTTTTTTAATTAAGTGC

TGGCTTCCCAACCTGTCATGAGGCCTCTGTGGGTAG TGTTTACTAACCAAATAATATTTATAACATGAGTAA

AGTTCCCTGATGAACTAAAGACTTTTGATATGGAAG GCTATAATTAATAACAATGAAATAAATACCCATGTA

ATGAATACATGCTGGGGAGTGCATTATTGGTTCATC CCCACCACTGGACTTCAGAAGTAGAACTCATGACTG

CAGTCACAGAACCAAAAGCCACCACAGTTGATGTGT GGACTAGGATGAGGCAAGGGAGACCCTGGCCTTGGG

TTCTTCCAGGATCAAATGAGGTCTGGTATGACTATA CACAAAATGTAAGGGATGCCAAAAAAATACAGTAAT

AGACATTTGCTCATTGGGAAGGAGGGTGTACTGTAA CAAAGTAAGTAATATTTCAATCCAATATTTTTAAAA

AGATCCCAGTAGCCTTGGACACTATTCCAGTGTTTC ATCAGAATTAATGCAAAAAAAACCATGATGAACAAA.

AGCGAGGTGGAAGTGTGATACCAATAAAGACAACTG ATATTAAAATTTAAAATAAAGACAGGATTAGTATTA.

TAGGAAAATCCACAGGCTGGATGACTGAATCCTCCT CTGAGTTTTCCTTTTGTCCCAGGCTTTAATATGGCT

ATGGACTCCGGGTTGCTCTAAGCACTAAGGGTTCTT TGGCATGGGGCAGAACATTACAACATACCAGTCGTG

CAGTGGGTGAGTTATATCTTGATGATGGCCATTCAT TCATGGTGCCCAAGGCTCCACAGACCTCAGTGGCTC US 2003/0170630 A1 Sep. 11, 2003 79

TABLE 11C-continued TABLE 11D-continued NOV11b nucleotide sequence. Encoded NOV11b protein sequence. CCTGCTGCCTGCCACAGCATCTGTTTTAGCAGCCTC WYGYQIYDKMGIYGSVPYLLAHKLGRTIGIFWLNAS

GACTCCTCAGCACTCCTCAGCACACACCTCTTCTTA ETLVEINTEPAVEYTLTQMGPWAAKQKVRSRTHWHW

TCAGGCTTCCTCCACTTAGCAACTTGCTAACGGCCA MSESGIIDWFLLTGPTPSDWFKOYSHLTGTQAMPPL

CCTCTGTGCCTTCTGATCCCTGGGCGCCAATATCCT FSLGYHQCRWNYEDEQDWKAVDAGFDEHDIPYDAMW

CCTGCCCTTACCATCCTTCCAGGCCCAACTTAAATC LDIEHTEGKRYFTWDKNRFPNPKRMOELLRSKKRKL

CCACTTTCCCATGAAGCCTAACTGCGTGAACACCCC WWISDPHIKIDPDYSVYWKAKDQGFFWKNQEGEDFE

TACCCCCATACCCATTAGCAGTGATTTTGCCCTTCC GWCWPGLSSYLDFTNPKVREWYSSLFAFPWYOGSTD

CCGTAATGCTGTCCCACTTATAACTGTGCTCTACTT ILFLWNDMNEPSWFRGPEQTMQKNAIHHGNWEHREL

AGCATTCTCAGGGATCATACCTTAATGTTTTCAGTA HNIYGFYHQMATAEGLIKRSKGKERPFWLTRSFFAG

TGTCTGCGTTCTCCTACTAGATTGTATGTCCCTCAA SQKYGAVWTGDNTAEWSNLKISIPMLLTLSITGISF

GAGCATGTTCTGTTTCTCTTCTGTCTGACAGAGCAC CGADIGGFIGNPETELLWRWYOAGAYOPFFRGHATM

TATTATACCTGACTTTCAGTAACTGTTAGCTGTGAT NTKRREPWLFGEEHTRLIREAIRERYGLLPYWYSILF

TAGTTAGCTGGTGGATTTAATTGATTAAAAAATTAC YHAHVASQPWMRPLWWEFPDELKTFDMEDEYMLGSA

GATTGAATGTAAAAAAAAA LLWHPWTEPKATTWDWFLPGSNEWWYDYKTFAHWEG

GCTVKIPVALDTIPVFORGGSVIPIKTTVGKSTGWM
TESSYGLRVALSTKGSSVGELYLDDGHSFQYLHQKQ
FLHRKFSFCSSVLINSFADQRGHYPSKCVWEKILVL
GFRKEPSSWTTHSSDGKDQPWAFTYCAKTSILSLEK
LSLNIATDWEWRII

[0289] In a search of public sequence databases, the NOV11b nucleic acid sequence, located on chromosome 15, has 1459 of 2258 bases (64%) identical to a gb:GENBANK-ID:MMU92793|acc:U92793.1 mRNA from Mus musculus (Mus musculus alpha glucosidase II alpha subunit mRNA, complete cds) (E=7.2e'). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0290] The disclosed NOV11b polypeptide (SEQ ID NO:40) encoded by SEQ ID NO:39 has 914 amino acid residues and is presented in Table 11D using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV11b has no signal peptide and is likely to be localized in the endoplasmic reticulum (membrane) with a certainty of 0.8500. In other embodiments, NOV11b may also be localized to the microbody (peroxisome) with a certainty of 0.7480, the plasma membrane with a certainty of 0.4400, or in the mitochondrial inner membrane with a certainty of 0.1000.

TABLE 11D
Encoded NOV11b protein sequence.

MEAAVKEEISVEDEAVDKNIFRDCNKIAFYRRQKQW (SEQ ID NO:40)
LSKKSTYQALLDSWTTDEDSTRFQIINEASKWPLLA
EIYGIEGNIFRLKINEETPLKPRFEWPIDWLTSKPST
WRLISCSGDTGSLILADGKGDLKCHITANPFKWDLW
SEEEWVISINSLGOLYFEHLOILHKQRAAKENEEET
SWDTSQENQEDLGLWEEKFGKFWDIKANGPSSIGLD
FSLHGFEHLYGIPQHAESHOLKNTGDGDAYRLYNLD

[0291] A search of sequence databases reveals that the NOV11b amino acid sequence has 466 of 912 amino acid residues (51%) identical to, and 640 of 912 amino acid residues (70%) similar to, the 944 amino acid residue ptnr:SPTREMBL-ACC:P79403 protein from Sus scrofa (Pig) (Glucosidase II) (E=7.1e'). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0292] NOV11b is expressed in at least Adrenal Gland/Suprarenal gland, Aorta, Brain, Hippocampus, Kidney, Lung, Lymph node, Ovary, Parathyroid Gland, Prostate, Salivary Glands, Thyroid, Tonsils, Trachea, Uterus. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of NOV11b. The sequence is predicted to be expressed in T cells because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:MMU92793|acc:U92793.1) a closely related Mus musculus alpha glucosidase II alpha subunit mRNA, complete cds.

[0293] NOV11c

[0294] A disclosed NOV11c nucleic acid of 3015 nucleotides (also referred to as CG55752-03) encoding a novel Glucosidase II-like protein is shown in Table 11E. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 204-206 and ending with a TGA codon at nucleotides 2946-2948. A putative untranslated region upstream from the initiation codon is underlined in Table 11E. The start and stop codons are in bold letters.

TABLE 11E-continued

TABLE 11A NOV11c nucleotide sequence. NOV11c nucleotide sequence. ACCACCAGTGCCGCTGGAACTATGAAGATGAGCAGG

AACGCTAGTTTGGGCCTGAAAAATTCCAGGAGCAAG (SEQ ID NO:41) ATGTAAAAGCAGTGGATGCAGGGTTTGATGAGCATG

AGTCAAGATTTGTCACTCCATGAGAATCTGGAGGGG ACATTCCTTATGATGCCATGTGGCTGGACATAGAGC

ACTCCCTTCCCAGAAACTTGACGATGAAGTACTGGT ACACTGAGGGCAAGAGGTACTTCACCTGGGACAAAA

TGTAATTTTAGAAAGACACCCAATCGGCTTTTTTAA ACAGATTCCCAAACCCCAAGAGGATGCAAGAGCTGC

AAGATCGCCCAGGGCCCTTGTCCTGAGAGCTGGGAG TCAGGAGCAAAAAGCGTAAGCTTGTGGTCATCAGTG

CTGGTCGGAGTGACAGAGAAGCCAGGAAGCAGCAG ATCCCCACATCAAGATTGATCCTGACTACTCAGTAT

TGAAAGAGGAAATAAGTGTTGAAGATGAAGCTGTAG ATGTGAAGGCCAAAGATCAGGGCTTCTTTGTGAAGA

ATAAAAACATTTTCAGAGACTGTAACAAGATCGCAT ATCAGGAAGGGGAAGACTTTGAAGGGGTGTGTTGGC

TTTACAGGCGTCAGAAACAGTGGCTTTCCAAGAAGT CAGGTCTCTCCTCTTACCTGGATTTCACCAATCCCA

CCACCTATCAGGCATTATTGGATTCAGTCACAACAG AGGTCAGAGAGTGGTATTCAAGTCTTTTTGCTTTCC

ATGAAGACAGCACCAGGTTCCAAATCATCAATGAAG CTGTTTATCAGGGATCTACGGACATCCTCTTCCTTT

CAAGTAAGGTTCCTCTCCTGGCTGAAATTTATGGTA GGAATGACATGAATGAGCCTTCTGTCTTTAGAGGGC

TAGAAGGAAACATTTTCAGGCTTAAAATTAACGAAG CAGAGCAAACCATGCAGAAGAATGCCATTCATCATG

AGACTCCTCTAAAACCCAGATTTGAAGTTCCGGATG GCAATTGGGAGCACAGAGAGCTCCACAACATCTACG

TCCTCACAAGCAAGCCAAGCACTGTAAGGCTGATTT GTTTTTATCATCAAATGGCTACTGCAGAAGGACTGA

CATGCTCTGGGGACACAGGCAGTCTGATATTGGCAG TAAAACGATCTAAAGGGAAGGAGAGACCCTTTGTTC

ATGGAAAAGGAGACCTGAAGTGCCATATCACAGCAA TTACACGTTCTTTCTTTGCTGGATCACAAAAGTATG

ACCCATTCAAGGTAGACTTGGTGTCTGAAGAAGAGG GTGCCGTGTGGACAGGCGACAACACAGCAGAATGGA

TTGTGATTAGCATAAATTCCCTGGGCCAATTATACT GCAACTTGAAAATTTCTATCCCAATGTTACT CACTC

TTGAGCATCTACAGATTCTTCACAAACAAAGAGCTG TCAGCATTACTGGGGTCTCTTTTTGCGGAGCTGACA

CTAAAGAAAATGAGGAGGAGACATCAGTGGACACCT TAGGCGGGTTCATTGGGAATCCAGAGACAGAGCTGC

CTCAGGAAAATCAAGAAGATCTGGGCCTGTGGGAAG TAGTGCGTTGGTACCAGGCTGGAGCCTACCAGCCCT

AGAAATTTGGAAAATTTGTGGATATCAAAGCTAATG TCTTCCGTGGCCATGCCACCATGAACACCAAGCGAC

GCCCTTCTTCTATTGGTTTGGATTTCTCCTTGCATG GAGAGCCCTGGCTCTTTGGGGAGGAACACACCCGAC

GATTTGAGCATCTTTATGGGATCCCACAACATGCAG TCATCCGAGAAGCCATCAGAGAGCGCTATGGCCTCC

AATCACACCAACTTAAAAATACTGGTGATGGAGATG TGCCATATTGGTATTCTCTGTTCTACCATGCACACG

CTTACCGTCTTTATAACCTGGATGTCTATGGATACC TGGCTTCCCAACCTGTCATGAGGCCTCTGTGGGTAG

AAATATATGATAAAATGGGCATTTATGGTTCAGTAC AGTTCCCTGATGAACTAAAGACTTTTGATATGGAAG

CTTATCTCCTGGCCCACAAACTGGGCAGAACTATAG ATGAATACATGCTGGGGAGTGCATTATTGGTTCATC

GTATTTTCTGGCTGAATGCCTCGGAAACACTGGTGG CAGTCACAGAACCAAAAGCCACCACAGTTGATGTGT

AGATCAATACAGAGCCTGCAGTAGAGTACACACTGA TTCTTCCAGGATCAAATGAGGTCTGGTATGACTATA

CCCAGATGGGCCCAGTTGCTGCTAAACAAAAGGTCG AGACATTTGCTCATTGGGAAGGAGGGTGTACTGTAA

GATCTCGCACTCATGTGCACTGGATGTCAGAGAGTG AGATCCCAGTAGCCTTGGACACTATTCCAGTGTTTC

GCATCATTGATGTTTTTCTGCTGACAGGACCTACAC AGCGAGGTGGAAGTGTGATACCAATAAAGACAACTG

CTTCTGATGTCTTCAAACAGTACT CACACCTTACAG TAGGAAAATCCACAGGCTGGATGACTGAATCCTCCT

GCACACAAGCCATGCCCCCTCTTTTCTCTTTGGGAT ATGGACTCCGGGTTGCTCTAAGCACTAAGGGTTCTT US 2003/0170630 A1 Sep. 11, 2003

TABLE 11A-continued TABLE 1.1F-continued NOV11c nucleotide sequence. Encoded NOV11c protein sequence.

CAGTGGGTGAGTTATATCTTGATGATGGCCATTCAT LDIEHTEGKRYFTWDKNRFPNPKRMOELLRSKKRKL

TCCAATACCTCCACCAGAAGCAATTTTTGCACAGGA WWISDPHIKIDPDYSVYWKAKDQGFFWKNQEGEDFE

AGTTTTCATTCTGTTCCAGTGTTCTGATCAATAGTT GWCWPGLSSYLDFTNPKVREWYSSLFAFPWYOGSTD

TTGCTGACCAGAGGGGTCACTATCCCAGCAAGTGTG ILFLWNDMNEPSWFRGPEQTMQKNAIHHGNWEHREL

TGGTGGAGAAGATCTTGGTCTTAGGCTTCAGGAAGG HNIYGFYHQMATAEGLIKRSKGKERPFWLTRSFFAG

AGCCATCTTCTGTGACTACCCACTCATCTGATGGTA SQKYGAVWTGDNTAEWSNLKISIPMLLTLSITGVSF

AAGATCAGCCTGTGGCTTTTACGTATTGTGCCAAAA CGADIGGFIGNPETELLWRWYOAGAYOPFFRGHATM

CATCCATCCTGAGCCTGGAGAAGCTCTCACTCAACA NTKRREPWLFGEEHTRLIREAIRERYGLLPYWYSILF

TTGCCACTGACTGGGAGGTCCGCATCATATGACAAA YHAHVASQPWMRPLWWEFPDELKTFDMEDEYMLGSA

GAACTGCCCCTGGTGATGTGAGCAGGGACCTGCCTG LLWHPWTEPKATTWDWFLPGSNEWWYDYKTFAHWEG

CCCCTTTCAACCTTTCCCCTCACCTTT GCTVKIPVALDTIPVFORGGSVIPIKTTVGKSTGWM

TESSYGLRVALSTKGSSVGELYLDDGHSFQYLHQKQ

FLHRKFSFCSSWLINSFADQRGHYPSKCVWEKILVL
GFRKEPSSWTTHSSDGKDQPWAFTYCAKTSILSLEK
LSLNIATDWEWRII

[0295] In a search of public sequence databases, the NOV11c nucleic acid sequence, located on chromosome 15, has 1459 of 2258 bases (64%) identical to a gb:GENBANK-ID:MMU92793|acc:U92793.1 mRNA from Mus musculus (Mus musculus alpha glucosidase II alpha subunit mRNA, complete cds) (E=7.2e'). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0296] The disclosed NOV11c polypeptide (SEQ ID NO:42) encoded by SEQ ID NO:41 has 914 amino acid residues and is presented in Table 11F using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV11c has no signal peptide and is likely to be localized in the microbody (peroxisome) with a certainty of 0.7480. In other embodiments, NOV11c may also be localized to the nucleus with a certainty of 0.3000, the mitochondrial membrane space with a certainty of 0.1000, or in the lysosome (lumen) with a certainty of 0.1000.

TABLE 11F
Encoded NOV11c protein sequence.

MEAAVKEEISVEDEAVDKNIFRDCNKIAFYRRQKQW (SEQ ID NO:42)
LSKKSTYQALLDSWTTDEDSTRFQIINEASKWPLLA
EIYGIEGNIFRLKINEETPLKPRFEWPIDWLTSKPST
WRLISCSGDTGSLILADGKGDLKCHITANPFKWDLW
SEEEWVISINSLGOLYFEHLOILHKQRAAKENEEET
SWDTSQENQEDLGLWEEKFGKFWDIKANGPSSIGLD
FSLHGFEHLYGIPQHAESHOLKNTGDGDAYRLYNLD
WYGYQIYDKMGIYGSVPYLLAHKLGRTIGIFWLNAS
ETLVEINTEPAVEYTLTQMGPWAAKOKWGSRTHWHW
MSESGIIDWFLLTGPTPSDWFKOYSHLTGTQAMPPL
FSLGYHQCRWNYEDEQDWKAVDAGFDEHDIPYDAMW

[0297] A search of sequence databases reveals that the NOV11c amino acid sequence has 467 of 912 amino acid residues (51%) identical to, and 640 of 912 amino acid residues (70%) similar to, the 944 amino acid residue ptnr:SPTREMBL-ACC:P79403 protein from Sus scrofa (Pig) (Glucosidase II) (E=7.3e). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0298] NOV11c is expressed in at least Adrenal Gland/Suprarenal gland, Aorta, Brain, Hippocampus, Kidney, Lung, Lymph node, Ovary, Parathyroid Gland, Prostate, Salivary Glands, Thyroid, Tonsils, Trachea, Uterus. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of NOV11c.

[0299] NOV11d

[0300] A disclosed NOV11d nucleic acid of 3102 nucleotides (also referred to as CG55752-04) encoding a novel Glucosidase II-like protein is shown in Table 11G. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 103-105 and ending with a TGA codon at nucleotides 2839-2841. A putative untranslated region upstream from the initiation codon is underlined in Table 11G. The start and stop codons are in bold letters.

TABLE 11G
NOV11d nucleotide sequence.

TACTGGTTGTAATTTTAGAAAGACACCCAATCGGCT (SEQ ID NO:43)
TTTTTAAAAGATCGCCCAGGGCCCTTGTCCTGAGAG

TABLE 11G-continued TABLE 11G-continued NOV11d nucleotide sequence. NOV11d nucleotide sequence.

CTGGGAGCTGGTCGGAGTGACAGAGAAGCCAGGAA GATCCCCACATCAAGATTGAACCTGACTACTCAGTA

GCAGCAGTGAAAGAGGAAATAAGTGTTGAAGATGAA TATGTGAAGGCCAAAGATCAGGGCTTCTTTGTGAAG

GCTGTAGATAAAAACATTTTCAGAGACTGTAACAAG AATCAGGAAGGGGAAGACTTTGAAGGGGTGTGTTGG

ATCGCATTTTACAGGCGTCAGAAACAGTGGCTTTCC CCAGGTCTCTCCTCTTACCTGGATTTCACCAATCCC

AAGAAGTCCACCTATCGGGCATTATTGGATTCAGTC AAGGTCAGAGAGTGGTATTCAAGTCTTTTTGCTTTC

ACAACAGATGAAGACAGCACCAGGTTCCAAATCATC CCTGTTTATCAGGGATCTACGGACATCCTCTTCCTT

AATGAAGCAAGTAAGGTTCCTCTCCTGGCTGAAATT TGGAATGACATGAATGAGCCTTCTGTCTTTAGAGGG

TATGGTATAGAAGGAAACATTTTCAGGCTTAAAATT CCAGAGCAAACCATGCAGAAGAATGCCATTCATCAT

AACGAAGAGACTCCTCTAAAACCCAGATTTGAAGTT GGCAATTGGGAGCACAGAGAGCTCCACAACATCTAC

CCGGATGTCCTCACAAGCAAGCCAAGCACTGTAAGG GGTTTTTATCATCAAATGGCTACTGCAGAAGGACTG

CTGATTTCATGCTCTGGGGACACAGGCAGTCTGATA ATAAAACGATCTAAAGGGAAGGAGAGACCCTTTGTT

TTGGCACATGGAAAAGGAGACCTGAAGTGCCATATC CTTACACGTTCTTTCTTTGCTGGATCACAAAAGTAT

ACAGCAAACCCATTCAAGGTAGACTTGGTGTCTGAA GGTGCCGTGTGGACAGGCGACAACACAGCAGAATGG

GAAGAGGTTGTGATTAGCATAAATTCCCTGGGCCAA AGCAACTTGAAAATTTCTATCCCAATGTTACT CACT

TTATACTTTGAGCATCTACAGATTCTTCACAAACAA CTCAGCATTACTGGGATCTCTTTTTGCGGAGCTGAC

AGAGCTGCTAAAGAAAATGAGGAGGAGACATCAGTG ATAGGCGGGTTCATTGGGAATCCAGAGACAGAGCTG

GACACCTCTCAGGAAAATCAAGAAGATCTGGGCCTG CTAGTGCGTTGGTACCAGGCTGGAGCCTACCAGCCC

TGGGAAGAGAAATTTGGAAAATTTGTGGATATCAAA. TTCTTCCGTGGCCATGCCACCATGAACACCAAGCGA

GCTAATGGCCCTTCTTCTATTGGTTTGGATTTCTCC CGAGAGCCCTGGCTCTTTGGGGAGGAACACACCCGA

TTGCATGGATTTGAGCATCTTTATGGGATCCCACAA CTCATCCGAGAAGCCATCAGAGAGCGCTATGGCCTC.

CATGCAGAATCACACCAACTTAAAAATACTGGAGAT CTGCCATATTGGTATTCTCTGTTCTACCATGCACAC

GCTTACCGTCTTTATAACCTGGATGTCTATGGATAC GTGGCTTCCCAACCTGTCATGAGGCCTCTGTGGGTA

CAAATATATGATAAAATGGGCATTTATGGTTCAGTA GAGTTCCCTGATGAACTAAAGACTTTTGATATGGAA

CCTTATCTCCTGGCCCACAAACTGGGCAGAACTATA GATGAATACATGTTAGGGAGTGCATTATTGGTTCAT

GCTATTTTCTGGCTGAATGCCTCGGAAACACTGGTG CCAGTCACAGAACCAAAAGCCACCACAGTTGATGTG

GAGATCAATACAGAGCCTGCAGTAGAGTACACACTG TTTCTTCCAGGATCAAATGAGGTATGGTATGACTAT

ACCCAGATGGGCCCAGTTGCTGCTAAACAAAAGGTC AAGACATTTGCTCATTGGGAAGGAGGGTGTACTGTA

AGATCTCGCACTCATGTGCACTGGATGTCAGAGAGT AAGATCCCAGTAGCCTTGGACACTATTCCAGTGTTT

GGCATCATTGATGTTTTTCTGCTGACAGGACCTACA CAGCGAGGTGGAAGTGTGATACCAATAAAGACAACT

CCTTCTGATGTCTTCAAACAGTACT CACACCTTACA GTAGGAAAATCCACAGGCTGGATGACTGAATCCTCC

GGTACGCAAGCCATGCCCCCTCTTTTCTCTTTGGGA TATGGACTCCGGGTTGCTCTAAGCACTCAGGGTTCT

TACCACCAGTGCCGCTGGAACTATGAAGATGAGCAG TCAGTGGGTGAGTTATATCTTGATGATGGCCATTCA

GATGTAAAAGCAGTGGATGCAGGGTTTGATGAGCAT TTCCAATACCTCCACCAGAAGCAATTTTTGCACAGG

GACATTCCTTATGATGCCATGTGGCTGGACATAGAG AAGTTTTCATTCTGTTCCAGTGTTCTGATCAATAGT

CACACTGAGGGCAAGAGGTACTTCACCTGGGACAAA. TTTGCTGACCAGAGGGGTCATTATCCCAGCAAGTGT

AACAGATTCCCAAACCCCAAGAGGATGCAAGAGCTG GTGGTGGAGAAGATCTTGGTCTTAGGCTTCAGGAAG

CTCAGGAGCAAAAAGCGTAAGCTTGTGGTCATCAGT GAGCCATCTTCTGTGACTACCCACTCATCTGATGGT US 2003/0170630 A1 Sep. 11, 2003

TABLE 11G-continued TABLE 11H-continued

NOV11d nucleotide sequence. Encoded NOV11d protein sequence. AAAGATCAGCCTGTGGCTTTTACGTATTGTGCCAAA LGYHQCRWNYEDEQDVKAWDAGFDEHDIPYDAMWLD ACATCCATCCTGAGCCTGGAGAAGCTCTCACTCAAC IEHTEGKRYFTWDKNRFPNPKRMOELLRSKKRKLWW ATTGCCACTGACTGGGAGGTCCGCATCATAGACAA

AGAACTGCCCCTGGTGATGTGAGCAGGGACCTGCCT ISDPHIKIEPDYSVYWKAKDQGFFWKNQEGEDFEGV

GCCCCTTTCAACCTTTCCCCTCACCTTTTTTGAGAT CWPGLSSYLDFTNPKVREWYSSLFAFPWYOGSTDIL

TTTTGCTGCAATCTGTTTGTCTTCCCTGAATCAAAA FLWNDMNEPSWFRGPEQTMQKNAIHHGNWEHRELHN TAATCTTTCATTCGTCACCATTATACTAATGAACAA IYGFYHQMATAEGLIKRSKGKERPFWLTRSFFAGSQ TAGATTTCATGTTTCAAAATTTCAGATTTTACATGT KYGAWWTGDNTAEWSNLKISIPMLLTLSITGISFCG TAAGATGTACTAACAATATTCCTTGTATCAAACATC

TCCTTTTCTCCCTGATACATAGCCCTGAGACATTAT ADIGGFIGNPETELLWRWYOAGAYOPFFRGHATMNT

AGCGTC KRREPWLFGEEHTRLIREAIRERYGLLPYWYSLFYH

AHVASQPWMRPLWWEFPDELKTFDMEDEYMLGSALL

WHPWTEPKATTWDWFLPGSNEWWYDYKTFAHWEGGC
TVKIPVALDTIPVFORGGSVIPIKTTVGKSTGWMTE
SSYGLRVALSTOGSSWGELYLDDGHSFOYLHQKQFL
HRKFSFCSSWLINSFADQRGHYPSKCVWEKILVLGF
RKEPSSWTTHSSDGKDQPWAFTYCAKTSILSLEKLS
LNIATDWEWRII

[0301] In a search of public sequence databases, the NOV11d nucleic acid sequence, located on chromosome 15, has 1427 of 2214 bases (64%) identical to a gb:GENBANK-ID:MMU92793|acc:U92793.1 mRNA from Mus musculus (Mus musculus alpha glucosidase II alpha subunit mRNA, complete cds) (E=5.9e'"). Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

[0302] The disclosed NOV11d polypeptide (SEQ ID NO:44) encoded by SEQ ID NO:43 has 912 amino acid residues and is presented in Table 11H using the one-letter amino acid code. SignalP, Psort and/or Hydropathy results predict that NOV11d has no signal peptide and is likely to be localized in the endoplasmic reticulum (membrane) with a certainty of 0.8500. In other embodiments, NOV11d may also be localized to the microbody (peroxisome) with a certainty of 0.7480, the plasma membrane with a certainty of 0.4400, or in the mitochondrial inner membrane with a certainty of 0.1000.

TABLE 11H
Encoded NOV11d protein sequence.

MEAAVKEEISVEDEAVDKNIFRDCNKIAFYRRQKQW (SEQ ID NO:44)
LSKKSTYRALLDSWTTDEDSTRFQIINEASKWPLLA
EIYGIEGNIFRLKINEETPLKPRFEWPIDWLTSKPST
WRLISCSGDTGSLILADGKGDLKCHITANPFKWDLW
SEEEWVISINSLGOLYFEHLOILHKQRAAKENEEET
SWDTSQENQEDLGLWEEKFGKFWDIKANGPSSIGLD
FSLHGFEHLYGIPQHAESHOLKNTGDAYRLYNLDVY
GYQIYDKMGIYGSWPYLLAHKLGRTIGIFWLNASET
LVEINTEPAVEYTLTQMGPWAAKQKVRSRTHVHWMS
ESGIIDWFLLTGPTPSDWFKOYSHLTGTQAMPPLFS

[0303] A search of sequence databases reveals that the NOV11d amino acid sequence has 636 of 653 amino acid residues (97%) identical to, and 644 of 653 amino acid residues (98%) similar to, the 653 amino acid residue ptnr:TREMBLNEW-ACC:BAB39324 protein from Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey) (Hypothetical 74.7 KDA Protein) (E=0.0). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0304] NOV11d is expressed in at least the adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of NOV11d.

[0305] The disclosed NOV11 polypeptide has homology to the amino acid sequences shown in the BLASTP data listed in Table 11I.

TABLE 11I
BLAST results for NOV11

Gene Index/                   Protein/Organism                    Length  Identity   Positives  Expect
Identifier                                                        (aa)    (%)        (%)

gi|7672977|gb|AAF66685.1|     glucosidase II alpha subunit        966     547/969    706/969    0.0
(AF144074)                    (Homo sapiens)                              (56%)      (72%)

gi|6679891|ref|NP_032086.1|   alpha glucosidase 2, alpha neutral  966     538/969    707/969    0.0
(NM_008060)                   subunit (Mus musculus)                      (55%)      (72%)

gi|7661898|ref|NP_055425.1|   KIAA0088 protein; likely ortholog   944     524/969    684/969    0.0
(NM_014610)                   of mouse G2an alpha glucosidase 2,          (54%)      (70%)
                              alpha neutral subunit
                              (Homo sapiens)

gi|577295|dbj|BAA07642.1|     The ha1225 gene product is          943     524/969    684/969    0.0
(D42041)                      related to human alpha                      (54%)      (70%)
                              glucosidase. (Homo sapiens)

gi|1890664|gb|AAB49757.1|     glucosidase II (Sus scrofa)         944     525/969    684/969    0.0
(U71273)                                                                  (54%)      (70%)
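For illustration, the percentage columns of the BLAST results for NOV11 can be reproduced from the match counts. The values shown (for example 706/969, which is about 72.9% but is listed as 72%) suggest that the percentages are truncated to whole numbers rather than rounded; that convention is an inference from the listed numbers, not something stated in the disclosure. The Python sketch below uses a hypothetical helper name, `truncated_percent`.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero.

    Truncation (not rounding) is an assumed convention inferred from the
    listed values; the helper name is hypothetical.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return (100 * matches) // aligned_length


# Identity and positives counts from the first row of the BLAST results.
print(truncated_percent(547, 969))  # 56
print(truncated_percent(706, 969))  # 72
```

Integer arithmetic is used so that the result does not depend on floating-point rounding.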

[0306] The homology between these and other sequences is shown graphically in the ClustalW analysis shown in Table 11J. In the ClustalW alignment of the NOV11 protein, as well as all other ClustalW analyses herein, the black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve structural or functional properties), whereas non-highlighted amino acid residues are less conserved and can potentially be altered to a much broader extent without altering protein structure or function.
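As a simplified illustration of how conserved positions in a multiple alignment might be flagged, the Python sketch below marks alignment columns in which every sequence carries the same residue. This is an assumption-laden sketch: it uses strict identity only, whereas ClustalW shading can also reflect residue similarity, and the function name `conserved_columns` and the toy sequences are hypothetical and are not taken from Table 11J.

```python
def conserved_columns(aligned: list[str]) -> str:
    """Return a mask with '*' where all aligned sequences share one residue.

    Gap characters ('-') never count as conserved. Strict identity is a
    simplification of the shading used in ClustalW output.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    mask = []
    for column in zip(*aligned):
        identical = len(set(column)) == 1 and column[0] != "-"
        mask.append("*" if identical else " ")
    return "".join(mask)


# Hypothetical toy alignment for demonstration only.
toy_alignment = ["TGDNTAEW", "TGDNTASW", "TGDNSA-W"]
print(repr(conserved_columns(toy_alignment)))
```

Positions left unmarked by such a mask correspond to the less conserved residues that the text describes as more tolerant of alteration.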


Table 11J. ClustalW Analysis of NOV11

1) Novel NOV11a (SEQ ID NO:38)
2) Novel NOV11b (SEQ ID NO:40)
3) Novel NOV11c (SEQ ID NO:42)
4) Novel NOV11d (SEQ ID NO:44)
5) gi|7672977|gb|AAF66685.1| (AF144074) glucosidase II alpha subunit (Homo sapiens) (SEQ ID NO:120)
6) gi|6679891|ref|NP_032086.1| (NM_008060) alpha glucosidase 2, alpha neutral subunit (Mus musculus) (SEQ ID NO:121)
7) gi|7661898|ref|NP_055425.1| (NM_014610) KIAA0088 protein; likely ortholog of mouse G2an alpha glucosidase 2, alpha neutral subunit (Homo sapiens) (SEQ ID NO:122)
8) gi|577295|dbj|BAA07642.1| (D42041) The ha1225 gene product is related to human alpha-glucosidase. (Homo sapiens) (SEQ ID NO:123)


[0307] Table 11K lists the domain description from DOMAIN analysis results against NOV11. This indicates that the NOV11 sequence has properties similar to those of other proteins known to contain this domain.

TABLE 11K
Domain Analysis of NOV11

gnl|Pfam|pfam01055, Glyco_hydro_31, Glycosyl hydrolases family 31. Glycosyl hydrolases are key enzymes of carbohydrate metabolism. Family 31 comprises of enzymes that are or similar to alpha galactosidases. (SEQ ID NO:125)
CD-Length = 707 residues, 91.9% aligned
Score = 642 bits (1657), Expect = 0.0

Query: 244 KDEPGAWEETFKTHSDSKPYGPSSIGLDFSLHGFEHLYGIPQHAESHQLKNTGDGDAYRL 303
Sbjct:  33 STGDWLFDTTFGP----LVFSDQFLQLSTSLPSEYI-YGLGEHAHKLFRRDTNE--TYTL 85

Query: 304 YNLDVYGYQIYDKMGIYGSVPYLLAHK-LGRTIGIFWLNASETLVEINTEPAGIVIFGPV 362
Sbjct:  86 WNRDWGPYSGDNNL--YGSHPFYMSLEDSGNAHGWFLLNSNAMEWDIGPGPA-------- 135

Query: 363 SLIYQSQGDTPLTTHVHWMSESGIIDVFLLTGPTPSDVFKQYSHLTGTQAMPPLFSLGYH 422
Sbjct: 136 ---------------LTYRVIGGILDFYFFLGPTPEDVLQQYTELIGRPALPPYWSLGFH 180

Query: 423 QCRWNYEDEQDVKAVDAGFDEHDIPYDAMWLDIEHTEGKRYFTWDKNRFPNPKRMQELLR 482
Sbjct: 181 LCRWGYTNVSEWKTVVDGMRKANIPLDVQWLDIDYMDGYKDFTWDPVRFPGPEDFVKKLH 240

Query: 483 SKKRKLVVISDPHIKIEPD-YSVYVKAKDQGFFVKNQEGEDFEGVCWPGMKSYLDFTNPK 541
Sbjct: 241 AKGQKYVVILDPAISVDSASYYPYERGKEKGVFVKNPNGSDYIGEVWPGYTAFPDFTNPE 300

Query: 542 VREWYSSMFSSNCDGSTDILFLWNDMNEPSVFRGP----------------------EQT 579
Sbjct: 301 ARKWWADEIKDFHD-SLPFDGIWIDMNEPSSFSEPGPNDSNLNYPPYAPNDGDGPLSSKT 359

Query: 580 MQKNAIHHGNWEHRELHNIYGFYM--ATAEGLIKRSKGKERPFVLTRSFFAGSQKYGAVW 637
Sbjct: 360 MCMDAVHYGGVEHYDVHNLYGLSEAKATYEALKKVTGGK-RPFVLSRSTFAGSGRYAGHW 418

Query: 638 TGDNTAEWSNLKISIPMLLTLSITGISFCGADIGGFIGNPETELLVRWYQAGAYQPFFRG 697
Sbjct: 419 TGDNTASWDDLKYSIPGVLSFNLFGIPFVGADICGFNGNTTEELCVRWMQLGAFYPFSRN 478

Query: 698 HATMNTKRREPWLFGEEHTRLIREAIRERYGLLPYWYSLFYHAHVASQPVMRPLWVEFPD 757
Sbjct: 479 HNHLGTIPQEPWLFDSVAAEASRKALNLRYTLLPYLYTLFHEAHVSGLPVMRPLFFEFPD 538

Query: 758 ELKTFDMEDEYMLGSALLVHPVTEPKATTVDVFLPGSNEVVWYDYKTFA--HWEGGCTVK 815
Sbjct: 539 DAETYDIDRQFLWGSALLVAPVLEPGATSVKAYLPGGR---WYDLYTGAGEASRGGNVTL 595

Query: 816 IPVLLQIPVFQRGGSVIPIKTTVGKSTGWMTESSYGLRVALSTLQGSSVGELYLDDGHSF 875
Sbjct: 596 SAPLDKIPVHVRGGSIIPTQEP-ALTTTESRDNPFHLLVALDD-NGTASGELYLDDGESI 653

Query: 876 QYLHQKQFLHRKFSFCSSVLVASSPVSQGH 905
Sbjct: 654 DTQ-RGDYLLVQFSANNNTLTGTEVVTGYY 682
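As an illustrative check of the domain analysis summary, the "91.9% aligned" figure is consistent with the subject (domain) coordinates shown in the alignment, which run from position 33 to position 682 of a 707-residue domain model. The Python sketch below assumes, without confirmation from the disclosure, that the percentage is the fraction of domain positions spanned by the alignment; the function name `percent_of_domain_aligned` is hypothetical.

```python
def percent_of_domain_aligned(sbjct_start: int, sbjct_end: int,
                              domain_length: int) -> float:
    """Percentage of a domain model spanned by an alignment.

    Coordinates are 1-based and inclusive. The definition of
    "% aligned" used here is an assumption made for illustration.
    """
    if not 1 <= sbjct_start <= sbjct_end <= domain_length:
        raise ValueError("coordinates must satisfy 1 <= start <= end <= length")
    covered = sbjct_end - sbjct_start + 1
    return round(100.0 * covered / domain_length, 1)


# Subject coordinates 33-682 against a CD-Length of 707 residues.
print(percent_of_domain_aligned(33, 682, 707))  # 91.9
```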

[0308] The gene sequence of invention described herein encodes for a novel member of the glucosidase family of enzymes. Specifically, the sequence encodes a novel alpha glucosidase 2 neutral subunit-like protein. Processing glycosidases also play a role in the folding of newly formed glycoproteins and in endoplasmic reticulum quality control. Glucosidases are also useful for the treatment of diabetes. By inhibiting the glucosidase enzymes of the Golgi, the requirement for insulin decreases. Therefore the novel Alpha-Glucosidase 2, Alpha Neutral Subunit-like protein could be useful for the treatment of metabolic and endocrine disorders such as diabetes type I and II.

[0309] Alpha-glucosidase which is active at neutral pH appears as a doublet of enzyme activity on native gel electrophoresis and was termed neutral alpha-glucosidase AB. Neutral alpha-glucosidase AB is synonymous with the glycoprotein processing enzyme glucosidase II. A mutant mouse lymphoma line which is deficient in glucosidase II is also deficient in neutral alpha-glucosidase AB, as defined electrophoretically and quantitatively (less than 0.5% of parental). In contrast, both mutant and parental cell lines exhibited several lysosomal hydrolases which are processed by glucosidase II. Both glucosidase II and neutral alpha-glucosidase AB are high-molecular mass (greater than 200,000 dalton) anionic glycoproteins which bind to concanavalin A, have a broad pH optima (5.5-8.5), and have a similar Km for maltose (4.8 versus 2.1 mM) and the artificial substrate 4-methylumbelliferyl-alpha-D-glucopyranoside (35 versus 19 microM). Similar to human neutral alpha-glucosidase AB, purified rat glucosidase II migrates as a doublet of enzyme activity on native gel electrophoresis. Although rat glucosidase II has been reported to have a subunit size of 67 kDa, pig glucosidase II has been found to have a subunit size of 100 kDa, like the 98-kDa major protein in purified human neutral alpha-glucosidase A. Glucosidase II is localized to the long arm of human chromosome 11. PMID: 3881423, UI: 85104919

[0310] Processing glycosidases play an important role in N-glycan biosynthesis in mammalian cells by trimming Glc(3)Man(9)GlcNAc(2) and thus providing the substrates for the formation of complex and hybrid structures by Golgi glycosyltransferases. Membrane-bound alpha-glucosidase I and soluble alpha-glucosidase II of the endoplasmic reticulum remove the alpha1,2-glucose and alpha1,3-glucose residues, respectively, beginning immediately following transfer of Glc(3)Man(9)GlcNAc(2) to nascent polypeptides. The alpha-glucosidases participate in glycoprotein folding mediated by calnexin and calreticulin by forming the monoglucosylated high mannose oligosaccharides required for the interaction with the chaperones. In some mammalian cells, Golgi endo alpha-mannosidase provides an alternative pathway for removal of glucose residues. Removal of alpha1,2-linked mannose residues begins in the endoplasmic reticulum where trimming of mannose residues in the endoplasmic reticulum has been implicated in the targeting of malfolded glycoproteins for degradation. Removal of mannose residues continues in the Golgi with the action of alpha1,2-mannosidases IA and IB that can form Man(5)GlcNAc(2) and of alpha-mannosidase II that removes the alpha1,3- and alpha1,6-linked mannose from GlcNAcMan(5)GlcNAc(2) to form GlcNAcMan(3)GlcNAc(2). These membrane-bound Golgi enzymes have been cloned and shown to have very distinct patterns of tissue-specific expression. There are also broad specificity alpha-mannosidases that can trim Man(4-9)GlcNAc(2) to Man(3)GlcNAc(2), and provide an alternative pathway toward complex oligosaccharide formation. Cloning of the remaining alpha-mannosidases will be required to evaluate their specific functions in glycoprotein maturation. PMID: 10580131, UI: 20047733

[0311] Several new pharmacological agents have recently been developed to optimize the management of type 2 (non-insulin-dependent) diabetes mellitus. There are three general therapeutic modalities relevant to diabetes care.

agents such as biguanides and thiazolidinediones which enhance insulin sensitivity, or agents that decrease insulin requirements like the alpha-glucosidase inhibitors. Type 2 diabetes mellitus is a heterogeneous disease with multiple underlying pathophysiological processes. Therapy should be individualised based on the degree of hyperglycaemia, hyperinsulinaemia or insulin deficiency. In addition, several factors have to be considered when prescribing a specific therapeutic agent. These factors include efficacy, safety, affordability and ease of administration. PMID: 10929931, UI: 20383756

[0312] The prevalence of Type 2 diabetes rises steeply with age and involves beta-cell dysfunction and diminished sensitivity to insulin. Beta-cell dysfunction is important in the development of hyperglycaemia while insulin resistance seems to play a major role in the atherogenic process resulting in cardiovascular disease. Current therapeutic options include lifestyle adjustments (exercise and diet), oral hypoglycaemic agents (sulphonylureas, newer beta-cell mediated insulin releasing drugs, alpha-glucosidase inhibitors, biguanides and thiazolidinediones) and insulin treatment. Oral hypoglycaemic agents are effective only temporarily in maintaining good glycaemic control, their efficacy should be determined from changes in fasting and postprandial glucose levels. Recent studies have shown that the early initiation of insulin therapy can establish good glycaemic control. PMID: 10383606, UI: 99315525

[0313] Genetic deficiency of lysosomal acid alpha-glucosidase (acid maltase) results in the autosomal recessive disorder glycogen storage disease type II (GSDII) in which intralysosomal accumulation of glycogen primarily affects function of skeletal and cardiac muscle. This report identifies 2 of 35 GSDII patients with co-occurrence of cleft lip, considerably greater than the estimated frequency of non-syndromic cleft lip with or without cleft palate of 1 in 700 to 1,000. Because several lines of evidence support a minor cleft lip/palate (Cl/P) locus on chromosome 17q close to the locus for GSDII. Patient I (of Dutch descent) was homozygous and the parents heterozygous for an intragenic deletion of exon 18 (deltaex 18), common in Dutch patients. Patient II was heterozygous for delta525T, a mutation also common in Dutch patients and a novel nonsense mutation (172 C-->T, Gln58Stop) in exon 2, the first coding exon. The mother was heterozygous for the delta525T and the father for the 172 C-->T, Gln58Stop. The finding that both patients carried intragenic mutations eliminates a contiguous gene syndrome. Whereas the presence of cleft lip/cleft palate in a patient with GSDII could be coincidental,
The these co-occurences could represent a modifying action of first modality is lifestyle adjustments aimed at improving acid alpha-glucosidase deficiency on unlinked or linked endogenous insulin Sensitivity or insulin effect. This can be genes that result in increased Susceptibility for cleft lip. achieved by increased physical activity and bodyweight PMID: 10377006, UI: 99303499 reduction with diet and behavioral modification, and the use 0314 Diabetes mellitus is the most common endocrine of pharmacological agents or Surgery. This first modality is disease, accounting for over 200 million people affected not discussed in depth in this article. The Second modality Worldwide. It is characterized by a lack of insulin Secretion involves increasing insulin availability by the administration and/or increased cellular resistance to insulin, resulting in of exogenous insulin, insulin analogues, Sulphonylureas and hyperglycemia and other metabolic disturbances. People the new insulin Secretagogue, repaglinide. The most fre with diabetes suffer from increased morbidity and premature quently encountered adverse effect of these agents is mortality related to cardiovascular, microvascular and neu hypoglycaemia. Bodyweight gain can also be a concern, ropathic complications. The Diabetes Control and Compli especially in patients who are obese. The association cation Trial (DCCT) has convincingly demonstrated the between hyperinsulinaemia and premature atherosclerosis is relationship of hyperglycemia to the development and pro still a debatable question. The third modality consists of gression of complications and showed that improved glyce US 2003/0170630 A1 Sep. 11, 2003 mic control reduced these complications. Although the disorders as indicated below. 
The potential therapeutic appli DCCT exclusively studied patients with Type 1 diabetes, cations for this invention include, but are not limited to: there is ample evidence to Support the belief that the same protein therapeutic, Small molecule drug target, antibody relationship between metabolic control and clinical outcome target (therapeutic, diagnostic, drug targeting/cytotoxic anti exists in patients with Type 2 diabetes. Therefore, a major body), diagnostic and/or prognostic marker, gene therapy effort should be made to develop and implement more (gene delivery/gene ablation), research tools, tissue regen effective treatment regimes. This article reviews those novel eration in Vivo and in Vitro of all tissues and cell types drugs that have been recently introduced for the manage composing (but not limited to) those defined here. ment of Type 2 diabetes, or that have reached an advanced level of study and will soon be proposed for preliminary 0319. The NOV11 nucleic acids and proteins of the clinical trials. They include: (i) compounds that promote the invention are useful in potential therapeutic applications synthesis/secretion of insulin by the beta-cell; (ii) inhibitors implicated in various diseases and pathologies. of the alpha-glucosidase activity of the Small intestine; (iii) 0320 NOV11 nucleic acids and polypeptides are further Substances that enhance the action of insulin at the level of useful in the generation of antibodies that bind immuno the target tissues; and (iv) inhibitors of free fatty acid specifically to the novel NOV11 Substances for use in oxidation. PMID: 98.16470, UI: 99033258 therapeutic or diagnostic methods. 
These antibodies may be 0315) The disclosed NOV11 nucleic acid of the invention generated according to methods known in the art, using encoding a Alpha Glucosidase 2, Alpha Neutral Subunit-like prediction from hydrophobicity charts, as described in the protein includes the nucleic acid whose Sequence is provided "Anti-NOVX Antibodies' section below. The disclosed in Table 11A, 11C, 11E or a fragment thereof. The invention NOV11 protein has multiple hydrophilic regions, each of also includes a mutant or variant nucleic acid any of whose which can be used as an immunogen. In one embodiment, a bases may be changed from the corresponding base shown contemplated NOV11 epitope is from about amino acids 5 in Table 11A, 11C, or 11E while still encoding a protein that to 90. In another embodiment, a NOV11 epitope is from maintains its Alpha Glucosidase 2, Alpha Neutral Subunit about amino acids 180 to 350. In additional embodiments, a like activities and physiological functions, or a fragment of NOV11 epitope is from about amino acids 400 to 670, from Such a nucleic acid. The invention further includes nucleic about amino acids 680 to 780, from about amino acids 860 acids whose Sequences are complementary to those just to 900, and from about amino acids 920 to 950. These novel described, including nucleic acid fragments that are comple proteins can be used in assay Systems for functional analysis mentary to any of the nucleic acids just described. The of various human disorders, which will help in understand invention additionally includes nucleic acids or nucleic acid ing of pathology of the disease and development of new drug fragments, or complements thereto, whose Structures targets for various disorders. include chemical modifications. Such modifications include, 0321) NOV12 by way of nonlimiting example, modified bases, and nucleic acids whose Sugar phosphate backbones are modified or 0322 NOV12 includes three novel Mechanical stress derivatized. 
These modifications are carried out at least in induced protein-like proteins disclosed below. The disclosed part to enhance the chemical Stability of the modified nucleic sequences have been named NOV12a, NOV12b, and acid, Such that they may be used, for example, as antisense NOV12c. binding nucleic acids in therapeutic applications in a Subject. In the mutant or variant nucleic acids, and their comple 0323) NOV12a ments, up to about 33% percent of the bases may be so 0324) A disclosed NOV12 nucleic acid of 7876 nucle changed. otides (also referred to as Curagen Accession No. CG55776 01) encoding a novel Mechanical stress induced protein-like 0316) The disclosed NOV11 protein of the invention protein is shown in Table 12A. An open reading frame was includes the Alpha Glucosidase 2, Alpha Neutral Subunit identified beginning with an ATG initiation codon at nucle like protein whose sequence is provided in Table 11B. 11D, otides 6-8 and ending with a TGA codon at nucleotides or 11F. The invention also includes a mutant or variant 7857-7859. Putative untranslated regions upstream from the protein any of whose residues may be changed from the initiation codon and downstream of the termination codon corresponding residue shown in Table 11B, 11D, or 11F are underlined in Table 12A. The start and stop codons are while Still encoding a protein that maintains its Alpha in bold letters. Glucosidase 2, Alpha Neutral Subunit-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 43% percent of the TABLE 12A residues may be So changed. NOV12 nucleotide sequence (SEQ ID NO: 45). 0317. The invention further encompasses antibodies and TCAGGAGAAGGTAAAAGGCAGAGGAATCACCTGCTTGCTGGTCTCCTTT antibody fragments, Such as Fab or (F), that bind immu nospecifically to any of the proteins of the invention. GCTGTGATCTGCCTGGTCGCCACCCCTGGGGGCAAGGCCTGTCCTCGCCG 0318. 
The above defined information for this invention CTGTGCCTGTTATATGCCTACGGAGGTACACTGCACATTTCGGTACCTGA Suggests that this Alpha Glucosidase 2, Alpha Neutral Sub CTTCCATCCCAGACAGCATCCCGCCCAATGTGGAACGCATCAATTTAGGG unit-like protein (NOV11) may function as a member of a “Alpha Glucosidase 2, Alpha Neutral Subunit family”. TACAACAGCTTGGTTAGATTGATGGAAACAGATTTTTCTGGCCTGACCAA Therefore, the NOV11 nucleic acids and proteins identified ACTGGAGTTACTCATGCTTCACAGCAATGGCATTCACACAATCCCTGACA here may be useful in potential therapeutic applications implicated in (but not limited to) various pathologies and US 2003/0170630 A1 Sep. 11, 2003 92

TABLE 12A-continued NOV12 nucleotide sequence (SEQ ID NO: 45).

AGACCTTCTCAGATTTGCAGGCCTTGCAGGTGAGACTGATGGTCTTAAAA GAGGCTGAGGTTGGAAAACACACCTCAAGCACAAGTAAGAGGCACAACTA

ATGAGCTATAATAAAGTCCGAAAACTTCAGAAAGATACTTTTTATGGCCT TCGGGAATTAACACTCCAGCGACGTGGAGATTCAACACATCGACGTTTTA

CAGGAGCTTGACACGATTGCACATGGACCACAACAATATTGAGTTTATAA GGGAGAATAGGAGGCATTTCCCTCCCTCTGCTAGGAGAATTGACCCACAA

ACCCAGAGGTTTTTTATGGGCTCAACTTTCTCCGCCTGGTGCACTTGGAA CATTGGGCGGCACTGTTGGAGAAAGCTAAAAAGAATGCTATGCCAGACAA

GGAAATCAGCTCACTAAGCTCCACCCAGATACATTTGTCTCTTTGAGCTA GCGAGAAAATACCACAGTGAGCCCACCCCCAGTGGTCACCCAACTCCCAA

CCTCCAGATATTTAAAATCTCTTTCATTAAGTTCCTATACTTGTCTGATA ACATACCTGGTGAAGAAGACGATTCCTCAGGCATGCTCGCTCTACATGAG

ACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGCCTGACCTA GAATTTATGGTCCCGGCCACTAAAGCTTTGAACCTTCCAGCAAGGACAGT

GACAGCCTTTACCTGCATGGAAACCCATGGACCTGTGATTGCCATTTAAA GACTGCTGACTCCAGAACAATATCTGATAGTCCTATGACAAACATAAATT

GTGGTTGTCTGACTGGATACAGGAGAAGCCAGGTATCTATATTGTNTTAC ATGGCACAGAATTCTCTCCTGTTGTGAATTCACAAATACTACCACCTGAA

CAGATGTAATAAAATGCAAAAAAGATAGAAGTCCCTCTAGTGCTCAGCAG GAACCCACAGATTTCAAACTGTCTACTGCTATTAAAACTACAGCCATGTC

TGTCCACTTTGCATGAACCCTAGGACTTCTAAAGGCAAGCCGTTAGCTAT AAAGAATATAAACCCAACCATGTCAAGCCAAATACAAGGCACAACCAATC

GGTCTCAGCTGCAGCTTTCCAGTGTGCCAAGCCAACCATTGACTCATCCC AACATTCATCCACTGTCTTTCCACTGCTACTTGGAGCAACTGAATTTCAG

TGAAATCAAAGAGCCTGACTATTCTGGAAGACAGTAGTTCTGCTTTCATC GACTCTGACCAGATGGGAAGAGGAAGAGAGCATTTCCAAAGTAGACCCCC

TCTCCCCAAGGTTTCATGGCACCCTTTGGCTCCCTCACTTTGAATATGAC AATAACAGTAAGGACTATGATCAAAGATGTCAATGTCAAAATGCTTAGTA

AGATCAGTCTGGAAATGAAGCTAACATGGTCTGCAGTATTCAAAAGCCCT GCACCACCAACAAACTATTATTAGAGTCAGTAAATACCACAAATAGTCAT

CAAGGACATCACCCATTGCATTCACTGAAGAAAATGACTACATCGTGCTA CAGACATCTGTAAGAGAAGTGAGTGAACCCAGGCACAATCACTTCTATTC

AATACTTCATTTTCAACATTTTTGGTGTGCAACATAGATTACGGTCACAT TCACACTACTCAAATACTTAGCACCTCCACGTTCCCTTCAGATCCACACA

TCAGCCAGTGTGGCAAATTTTGGCTTTGTACAGTGATTCTCCTCTGATAC CAGCTGCTCATTCTCAGTTTCCGATCCCTAGAAATAGTACAGTTAACATC

TAGAAAGGAGCCACTTGCTTAGTGAAACACCGCAGCTCTATTACAAATAT CCGCTGTTCAGACGCTTTGGGAGGCAGAGGAAAATTGGCGGAAGGGGGCG

AAACAGGTGGCTCCTAAGCCTGAAGACATTTTTACCAACATAGAGGCAGA GATTATCAGCCCATATAGAACTCCAGTTCTGCGACGGCATAGATACAGCA

TCTCAGAGCAGATCCCTCTTGGTTAATGCAAGACCAAATTTCCTTGCAGC TTTTCAGGTCAACAACCAGAGGTTCTTCTGAAAAAAGCACTACTGCATTC

TGAACAGAACTGCCACCACATTCAGTACATTACAGATCCAGTACTCCAGT TCAGCCACAGTGCTCAATGTGACATGTCTGTCCTGTCTTCCCAGGGAGAG

GATGCTCAAATCACTTTACCAAGAGCAGAGATGAGGCCAGTGAAACACAA GCTCACCACTGCCACAGCAGCATTGTCTTTTCCAAGTGCTGCTCCCATCA

ATGGACTATGATTTCAAGGGATAACAATACTAAGCTGGAACATACTGTCT CCTTCCCCAAAGCTGACATTGCTAGAGTCCCATCAGAAGAGTCTACAACT

TGGTAGGTGGAACCGTTGGCCTGAACTGCCCAGGCCAAGGAGACCCCACC CTAGTCCAGAATCCACTATTACTACTTGAGAACAAACCCAGTGTAGAGAA

CCACACGTGGATTGGCTTCTAGCTGATGGAAGTAAAGTGAGAGCCCCTTA AACAACACCCACAATAAAATATTTCAGGACTGAAATTTCCCAAGTGACTC

TGTCAGTGAGGATGGACGGATCCTAATAGACAAAAGTGGAAAATTGGAAC CAACTGGTGCAGTCATGACATATGCTCCAACATCCATACCCATGGAAAAA

TCCAGATGGCTGATAGTTTTGACACAGGCGTATATCACTGTATAAGCAGC ACTCACAAAGTAAACGCCAGTTACCCACGTGTGTCTAGCACCAATGAAGC

AATTATGATGATGCAGATATTCTCACCTATAGGATAACTGTGGTAGAACC TAAAAGAGATTCAGTGATTACATCGTCACTTTCAGGTGCTATCACCAAGC

TTTGGTCGAAGCCTATCAGGAAAATGGGATTCATCACACAGTTTTCATTG CACCAATGACTATTATAGCCATTACAAGGTTTTCAAGAAGGAAAATTCCC

GTGAAACACTTGATCTTCCATGCCATTCTACTGGTATCCCAGATGCCTCT TGGCAACAGAACTTTGTAAATAACCATAACCCAAAAGGCAGATTAAGGAA

ATTAGCTGGGTTATTCCAGGAAACAATGTGCTCTATCAGTCATCAAGAGA TCAACATAAAGTTAGTTTACAAAAAAGCACAGCTGTGATGCTTCCTAAAA

CAAGAAAGTTCTAAACAATGGCACATTAAGAATATTACAGGTCACCCCGA CATCTCCTGCTTTACCACAGAGACAAAGTCTCCCCTCGCACCACACTACG

AAGACCAAGGTTATTATCGCTGTGTGGCAGCCAACCCATCAGGGGTTGAT ACCAAAACACACAATCCTGGAAGTCTTCCAACAAAGAAGGAGCTTCCCTT

TTTTTGATTTTCCAAGTTTCAGTCAAGATGAAAGGACAAAGGCCCTTGGA CCCACCCCTTAACCCTATGCTTCCTAGTATTATAAGCAAAGACTCAAGTA

GCATGATGGAGAAACAGAGGGATCTGGACTTGATGAGTCCAATCCTATTG CAAAAAGCATCATATCAACGCAAACAGCAATACCAGCAACAACTCCTACC

CTCATCTTAAGGAGCCACCAGGTGCACAACTCCGTACATCTGCTCTGATG TTCCCTGCATCTGTCATCACTTATGAAACCCAAACAGAGAGATCTAGAGC

TABLE 12A-continued NOV12 nucleotide sequence (SEQ ID NO: 45).

ACAAACAATACAAAGAGAACAGGAGCCTCAAAAGAAGAACAGGACTGACC TTGGGGACAAATTACTACTGAACTGCTCAGCCACTGGGGAGCCCAAACCC

CAAACATCTCTCCAGACCAGAGTTCTGGCTTCACTACACCCACTGCTATG CAAATAATGTGGAGGTTACCATCCAAGGCTGTGGTCGACCAGCAGCATAG

ACACCTCCTGTTCTAACCACAGCCGAAACTTCAGTCAAGCCCAGTGTCTC GGTGGGCAGCTGGATCCACGTCTACCCTAATGGATCCCTGTTTATTGGAT

TGCATTCACTCATTCCCCACCAGAAAACACAACTGGGATTTCAAGCACAA CAGTAACAGAAAAAGACAGTGGTGTCTACTTGTGTGTGGCAAGAAACAAA

TCAGTTTTCATTCAAGAACTCTTAATCTGACAGATGTGATTGAAGAACTA ATGGGGGATGATCTGATACTGATGCATGTTAGCCTAAGACTGAAACCTGC

GCCCAAGCAAGTACTCAGACTTTGAAGAGCACAATTGCTTCTGAAACAAC CAAAATTGACCACAAGCAGTATTTTAGAAAGCAAGTGCTCCATGGGAAAG

TTTGTCCAGCAAATCACACCAGAGTACCACAACTAGGAAAGCAATCATTA ATTTCCAAGTAGATTGCAAAGCTTCCGGCTCCCCAGTGCCAGAGATATCT

GACACTCAACCATACCACCATTCTTGAGCAGCAGTGCTACTCTAATGCCA TGGAGTTTGCCTGATGGAACCATGATCAACAATGCAATGCAAGCCGATGA

GTTCCCATCTCCCCTCCCTTTACTCAGAGAGCAGTTACTGACAACGTGGC CAGTGGCCACAGGACTAGGAGATATACCCTTTTCAACAATGGAACTTTAT

GACTCCCATTTCCGGGCTTATGACAAATACAGTGGTCAAGCTGCACGAAT ACTTCAACAAAGTTGGGGTAGCGGAGGAAGGAGATTATACTTGCTATGCC

CCTCAAGGCACAATGCTAAACCACAGCAATTAGTAGCAGAGGTTGCAACA CAGAACACCCTAGGGAAAGATGAAATGAAGGTCCACTTAACAGTTATAAC

TCCCCCAAGGTTCACCCAAATGCCAAGTTCACAATTGGAACCACTCACTT AGCTGCTCCCCGGATAAGGCAGAGTAACAAAACCAACAAGAGAATCAAAG

CATCTACTCTAATCTGTTACATTCTACTCCCATGCCAGCACTAACAACAG CTGGAGACACAGCTGTCCTTGACTGTGAGGTCACTGGGGATCCCAAACCA

TTAAATCACAGAATTCTAAATTAACTCCATCTCCCTGGGCAGAAAACCAA AAAATATTTTGGTTGCTGCCTTCCAATGACATGATTTCCTTCTCCATTGA

TTTTGGCACAAACCATACTCAGAAATTGCTGAAAAAGGCAAAAAGCCAGA TAGGTACACATTTCATGCCAATGGGTCTTTGACCATCAACAAAGTGAAAC

AGTAAGCATGTTGGCTACTACAGGCCTGTCCGAGGCCACCACTCTTGTTT TGCTCGATTCTGGAGAGTACGTATGTGTAGCCCGAAATCCCAGTGGGGAT

CAGATTGGGATGGACAGAAGAACACAAAGAAGAGTGACTTTGATAAGAAA GACACCAAAATGTACAAACTGGATGTGGTCTCTAAACCTCCATTAATCAA

CCAGTTCAAGAAGCAACAACTTCCAAACTCCTTCCCTTTGACTCTTTGTC TGGTCTGTATACAAACAGAACTGTTATTAAAGCCACAGCTGTGAGACATT

TAGGTATATATTTGAAAAGCCCAGGATAGTTGGAGGAAAAGCTGCAAGTT CCAAAAAACACTTTGACTGCAGAGCTGAAGGGACACCATCTCCTGAAGTC

TTACTATTCCAGCTAACTCAGATGCCTTTCTTCCCTGTGAAGCTGTTGGA ATGTGGATCATGCCAGACAATATTTTCCTCACAGCCCCATACTATGGAAG

AATCCCCTGCCCACCATTCATTGGACCAGAGTCCCATCAGGTATGTCAGG CAGAATCACAGTCCATAAAAATGGAACCTTGGAAATTAGGAATGTGAGGC

ACTTGATTTATCTAAGAGGAAACAGAATAGCAGGGTCCAGGTTCTCCCCA TTTCAGATTCAGCCGACTTTATCTGTGTGGCCCGAAATGAAGGTGGAGAG

ATGGTACCCTGTCCATCCAGAGGGTGGAAATTCAGGACCGCGGACAGTAC AGCGTGTTGGTAGTACAGTTAGAAGTACTGGAAATGCTGAGAAGACCGAC

TTGTGTTCCGCATCCAATCTGTTTGGCACAGACCACCTTCATGTCACCTT ATTTAGAAATCCATTTAATGAAAAAATAGTTGCCCAGCTGGGAAAGTCCA

GTCTGTGGTTTCCTATCCTCCCAGGATCCTGGAGAGACGTACCAAAGAGA CAGCATTGAATTGCTCTGTTGATGGTAACCCACCACCTGAAATAATCTGG

TCACAGTTCATTCCGGAAGCACTGTGGAACTGAAGTGCAGAGCAGAAGGT ATTTTACCAAATGGCACACGATTTTCCAATGGACCACAAAGTTATCAGTA

AGGCCAAGCCCTACAGTTACCTGGATTCTTGCAAACCAAACAGTTGTCTC TCTGATAGCAAGCAATGGTTCTTTTATCATTTCTAAAACAACTCGGGAGG

AGAATCATCCCAGGGAAGTAGGCAGGCTGTGGTGACGGTTGACGGAACAT ATGCAGGAAAATATCGCTGTGCAGCTAGGAATAAAGTTGGCTATATTGAG

TGGTCCTCCACAATCTCAGTATTTATGACCGTGGCTTTTACAAATGTGTG AAATTAGTCATATTAGAAATTGGCCAGAAGCCAGTTATTCTTACCTATGC

GCCAGCAACCCAGGTGGCCAGGATTCACTGCTGGTTAAAATACAAGTCAT ACCAGGGACAGTAAAAGGCATCAGTGGAGAATCTCTATCACTGCATTGTG

TGCAGCACCACCTGTTATTCTAGAGCAAAGGAGGCAAGTCATTGTAGGCA TGTCTGATGGAATCCCTAAGCCAAATATCAAATGGACTATGCCAAGTGGT

CTTGGGGTGAAAGTTTAAAACTGCCCTGTACTGCAAAAGGAACTCCTCAG TATGTAGTAGACAGGCCTCAAATTAATGGGAAATACATATTGCATGACAA

CCCAGCGTTTACTGGGTCCTCTCTGATGGCACTGAAGTGAAACCATTACA TGGCACCTTAGTCATTAAAGAAGCAACAGCTTATGACAGAGGAAACTATA

GTTTACCAATTCCAAGTTGTTCTTATTTTCAAATGGGACTTTGTATATAA TCTGTAAGGCTCAAAATAGTGTTGGTCATACACTGATTACTGTTCCAGTA

GAAACCTAGCCTCTTCAGACAGGGGCACTTATGAATGCATTGCTACCAGT ATGATTGTAGCCTACCCTCCCCGAATTACAAATCGTCCACCCAGGAGTAT

TCCACTGGTTCGGAGCGAAGAGTAGTAATGCTTACAATGGAAGAGCGAGT TGTCACCAGGACAGGGGCAGCCTTTCAGCTCCACTGTGTGGCCTTGGGAG

GACCAGCCCCAGGATAGAAGCTGCATCCCAGAAAAGGACTGAAGTGAATT TTCCCAAGCCAGAAATCACATGGGAGATGCCTGACCACTCCCTTCTCTCA

residues (75%) similar to, the 2507 of 2597 amino acid residue ptnr: patp-ACC:Y53664 protein from Rattus species (Rat mechanical stress induced protein 608) (E=0.0). Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

[0328] NOV12 is expressed in at least adrenal gland, bone marrow, brain-amygdala, brain-cerebellum, brain-hippocampus, brain-substantia nigra, brain-thalamus, brain-whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma-Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. This information was derived by determining the tissue sources of the sequences that were included in the invention including but not limited to SeqCalling sources, Public EST sources, and/or RACE sources.

[0329] In addition, the sequence is predicted to be expressed in osteoblasts because of the expression pattern of (GENBANK-ID: Z36321) a closely related homolog in Rattus species (Rat mechanical stress induced cDNA encoding protein 608).

[0330] NOV12b

[0331] A disclosed NOV12b nucleic acid of 771 nucleotides (also referred to as Curagen Accession No. 174124289) encoding a novel Mechanical stress induced protein-like protein is shown in Table 12C. An open reading frame was identified beginning with an AAG initiation codon at nucleotides 1-3 and ending at nucleotides 769-771. The start codon is in bold letters in Table 12C. Because NOV12b has no traditional initiation or termination codons, NOV12b could be a partial reading frame extending into the 5' and 3' directions.

TABLE 12C
NOV12b nucleotide sequence (SEQ ID NO: 47).
AAGCTTGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTACA
CTGCACATTTCGGTACCTGACTTCCATCCCAGACAGCATCCCGCCCAATG
TGGAACGCATCAATTTAGGATACAACAGCTTGGTTAGATTGATGGAAACA
GATTTTTCTGGCCTGACCAAACTGGAGTTACTCATGCTTCACAGCAATGG
CATTCACACAATCCCTGACAAGACCTTCTCAGATTTGCAGGCCTTGCAGG
TCTTAAAAATGAGCTATAACAAAGTCCGAAAACTTCAGAAAGATACTTTT
TATGGCCTCAGGAGCTTGGCACGATTGCACATGGACCACAACAATATTGA
GTTTATAAACCCAGAGGTTTTTGATGGGCTCAACTTTCTCCGCCTGGTGC
ACTTGGAAGGAAATCAGCTCACTAAGCTCCACCCAGATACATTTGTCTCT
TTGAGCTACCTCCAGATATTTAAAATCTCTTTCATTAAGTTCCTATACTT
GTCTGATAACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGC
CTGACCTAGACAGCCTTTACCTGCATGGAAACCCATGGACCTGTGATTGC
CATTTAAAGTGGTTGTCTGACTGGATACAGGAGAAGCCAGATGTAATAAA
ATGCAAAAAAGATAGAAGTCCCTCTAGTGCTCAGCAGTGTCCACTTTGCA
TGAACCCTAGGACTTCTAAAGGCAAGCCGTTAGCTATGGTCTCAGCTGCA
GCTTTCCAGTGTGCCCTCGAG

[0332] The disclosed NOV12b polypeptide (SEQ ID NO: 48) encoded by SEQ ID NO: 47 has 257 amino acid residues and is presented in Table 12D using the one-letter amino acid code.

TABLE 12D
Encoded NOV12b protein sequence (SEQ ID NO: 48).
KLACPRRCACYMPTEWHCTFRYLTSIPDSIPPNWERINLGYNSLWRLMET
DFSGLTKLELLMLHSNGIHTIPDKTFSDLQALOWLKMSYNKVRKLQKDTF
YGLRSLARLHMDHNNIEFINPEVFDGLNFLRLVHLEGNOLTKLHPDTFWS
LSYLQIFKISFIKFLYLSDNFLTSLPQEMVSYMPDLDSLYLHGNPWTCDC
HLKWLSDWIQEKPDWIKCKKDRSPSSAQQCPLCMNPRTSKGKPLAMVSAA
AFQCALE

[0333] NOV12c

[0334] A disclosed NOV12c nucleic acid of 771 nucleotides (also referred to as Curagen Accession No. 174124313) encoding a novel Mechanical stress induced protein-like protein is shown in Table 12E. An open reading frame was identified beginning with an AAG initiation codon at nucleotides 1-3 and ending with nucleotides 769-771. The start codon is in bold letters in Table 12E. Because NOV12c has no traditional initiation or termination codons, NOV12c could be a partial reading frame extending into the 5' and 3' directions.

TABLE 12E
NOV12c nucleotide sequence (SEQ ID NO: 49).
AAGCTTGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTACA
CTGCACATTTCGGTACCTGACTTCCATCCCAGACAGCATCCCGCCCAATG
TGGAACGCATCAATTTAGGATACAACAGCTTGGTTAGATTAATGGAAACA
GATTTTTCTGGCCTGACCAAACTGGAGTTACTCATGCTTCACAGCAATGG
CATTCACACAATCCCTGACAAGACCTTCTCAGATTTGCAGGCCTTGCAGG
TCTTAAAAATGAGCTATAATAAAGTCCGAAAACTTCAGAAAGATACTTTT
TATGGCCTCAGGAGCTTGACACGATTGCACATGGACCACAACAATATTGA
GTTTATAAACCCAGAGGTTTTTTATGGGCTCAACTTTCTCCGCCTGGTGC
ACTTGGAAGGAAATCAGCTCACTAAGCTCCACCCAGATACATTTGTCTCT
TTGAGCTACCTCCAGATATTTAAAATCTCTTTCATTAAGTTCCTATACTT
GTCTGATAACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGC
CTGACCTAGACAGCCTTTACCTGCATGGAAACCCATGGACCTGTGATTGC

TABLE 12E-continued
NOV12c nucleotide sequence (SEQ ID NO: 49).
CATTTAAAGTGGTTGTCTGACTGGATACAGGAGAAGCCAGATGTAATAAA
ATGCAAAAAAGATAGAAGTCCCTCTAGTGCTCAGCAGTGTCCACTTTGCA
TGAACCCTAGGACTTCTAAAGGCAAGCCGTTAGCTATGGTCTCAGCTGCA
GCTTTCCAGTGTGCCCTCGAG

[0335] The disclosed NOV12c polypeptide (SEQ ID NO: 50) encoded by SEQ ID NO: 49 has 257 amino acid residues and is presented in Table 12F using the one-letter amino acid code.

TABLE 12F
Encoded NOV12c protein sequence (SEQ ID NO: 50).
KLACPRRCACYMPTEWHCTFRYLTSIPDSIPPNWERINLGYNSLWRLMET
DFSGLTKLELLMLHSNGIHTIPDKTFSDLQALOWLKMSYNKVRKLQKDTF
YGLRSLTRLHMDHNNIEFINPEWFYGLNFLRLVHLEGNOLTKLHPDTFWS
LSYLQIFKISFIKFLYLSDNFLTSLPQEMVSYMPDLDSLYLHGNPWTCDC
HLKWLSDWIQEKPDWIKCKKDRSPSSAQQCPLCMNPRTSKGKPLAMVSAA
AFQCALE

[0336] NOV12d

[0337] A disclosed NOV12d nucleic acid of 771 nucleotides (also referred to as Curagen Accession No. 174124322) encoding a novel Mechanical stress induced protein-like protein is shown in Table 12G. An open reading frame was identified beginning with an AAG initiation codon at nucleotides 1-3 and ending with nucleotides 769-771. The start codon is in bold letters in Table 12G. Because NOV12d has no traditional initiation or termination codons, NOV12d could be a partial reading frame extending into the 5' and 3' directions.

TABLE 12G
NOV12d nucleotide sequence (SEQ ID NO: 51).
AAGCTTGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTACA
CTGCACATTTCGGTACCTGACTTCCATCCCAGACAGCATCCCGCCCAATG
TGGAACGCATCAATTTAGGATACAACAGCTTGGTTAGATTGATGGAAACA
GATTTTTCTGGCCTGACCAAACTGGAGTTACTCATGCTTCACAGCAATGG
CATTCACACAATCCCTGACAAGACCTTCTCAGATTTGCAGGCCTTGCAGG
TCTTAAAAATGAGCTATAACAAAGTCCGAAAACTTCAGAAAGATACTTTT
TATGGCCTCAGGAGCTTGACACGATTGCACATGGACCACAACAATATTGA
GTTTATAAACCCAGAGGTTTTTGATGGGCTCAACTTTCTCCGCCTGGTGC
ACTTGGAAGGAAATCAGCTCACTAAGCTCCACCCAGATACATTTGTCTCT
TTGAGCTACCTCCAGATATTTAAAATCTCTTTCATTAAGTTCCTATACTT
GTCTGATAACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGC
CTGACCTAGACAGCCTTTACCTGCATGGAAACCCATGGACCTGTGATTGC
CATTTAAAGTGGTTGTCTGACTGGATACAGGAGAAGCCAGATGTAATAAA
ATGCAAAAAAGATAGAAGTCCCTCTAGTGCTCAGCAGTGTCCACTTTGCA
TGAACCCTAGGACTTCTAAAGGCAAGCCGTTAGCTATGGTCTCAGCTGCA
GCTTTCCAGTGTGCCCTCGAG

[0338] The reverse complement of NOV12d is shown in Table 12H.

TABLE 12H
NOV12d reverse complement nucleotide sequence (SEQ ID NO: 60).
CTCGAGGGCACACTGGAAAGCTGCAGCTGAGACCATAGCTAACGGCTTGC
CTTTAGAAGTCCTAGGGTTCATGCAAAGTGGACACTGCTGAGCACTAGAG
GGACTTCTATCTTTTTTGCATTTTATTACATCTGGCTTCTCCTGTATCCA
GTCAGACAACCACTTTAAATGGCAATCACAGGTCCATGGGTTTCCATGCA
GGTAAAGGCTGTCTAGGTCAGGCATATAGGAGACCATCTCTTGAGGGAGG
GAGGTCAGGAAGTTATCAGACAAGTATAGGAACTTAATGAAAGAGATTTT
AAATATCTGGAGGTAGCTCAAAGAGACAAATGTATCTGGGTGGAGCTTAG
TGAGCTGATTTCCTTCCAAGTGCACCAGGCGGAGAAAGTTGAGCCCATCA
AAAACCTCTGGGTTTATAAACTCAATATTGTTGTGGTCCATGTGCAATCG
TGTCAAGCTCCTGAGGCCATAAAAAGTATCTTTCTGAAGTTTTCGGACTT
TGTTATAGCTCATTTTTAAGACCTGCAAGGCCTGCAAATCTGAGAAGGTC
TTGTCAGGGATTGTGTGAATGCCATTGCTGTGAAGCATGAGTAACTCCAG
TTTGGTCAGGCCAGAAAAATCTGTTTCCATCAATCTAACCAAGCTGTTGT
ATCCTAAATTGATGCGTTCCACATTGGGCGGGATGCTGTCTGGGATGGAA
GTCAGGTACCGAAATGTGCAGTGTACCTCCGTAGGCATATAACAGGCACA
GCGGCGAGGACAGGCAAGCTT

[0339] The disclosed NOV12d polypeptide (SEQ ID NO: 52) encoded by SEQ ID NO: 51 has 257 amino acid residues and is presented in Table 12I using the one-letter amino acid code.

TABLE 12I
Encoded NOV12d protein sequence (SEQ ID NO: 52).
KLACPRRCACYMPTEWHCTFRYLTSIPDSIPPNWERINLGYNSLWRLMET
DFSGLTKLELLMLHSNGIHTIPDKTFSDLQALOWLKMSYNKVRKLQKDTF

TABLE 12I-continued
Encoded NOV12d protein sequence (SEQ ID NO: 52).
YGLRSLTRLHMDHNNIEFINPEVFDGLNFLRLVHLEGNOLTKLHPDTFWS
LSYLQIFKISFIKFLYLSDNFLTSLPQEMVSYMPDLDSLYLHGNPWTCDC
HLKWLSDWIQEKPDWIKCKKDRSPSSAQQCPLCMNPRTSKGKPLAMVSAA
AFQCALE

[0340] NOV12e

[0341] A disclosed NOV12e nucleic acid of 771 nucleotides (also referred to as Curagen Accession No. 174124322) encoding a novel Mechanical stress induced protein-like protein is shown in Table 12J. An open reading frame was identified beginning with an AAG initiation codon at nucleotides 1-3 and ending with nucleotides 769-771. The start codon is in bold letters in Table 12J. Because NOV12e has no traditional initiation or termination codons, NOV12e could be a partial reading frame extending into the 5' and 3' directions.

TABLE 12J

NOV12e nucleotide sequence (SEQ ID NO: 53).
AAGCTTGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTACACTGCACATTTCCGTACCTGACT

TCCATCCCAGACAGCATCCCGCCCAATGTGGAACGCATCAATTTAGGATACAACAGCTTGGTTAGATTGATG

GAAACAGATTTTTCTGGCCTGACCAAACTGGAGTTACTCATGCTTCACAGCAATGGCATTCACACAATCCCT

GGCAAGACCTTCTCAGATTTGCAGGCCTTGCAGGTCTTAAAAATGAGCTATAACAAAGTCCGAAAACTTCAG

AAAGATACTTTTTATGGCCTCAGGAGCTTGACACGATTGCACATGGACCACAACAATATTGAGTTTATAAAC

CCAGAGGTTTTTGATGGGCTCAACTTTCTCCGCCTGGTGCACTTGGAAGGAAATCAGCTCACTAAGCTCCAC

CCAGATACATTTGTCTCTTTGAGCTACCTCCAGATATTTAAAATCTCTTTCATTAAGTTCCTATACTTGTCT

GATAACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGCCTGACCTAGACAGCCTTTACCTGCAT

GGAAACCCATGGACCTGTGATTGCCATTTAAAGTGGTTGTCTGACTGGATACAGGAGAAGCCAGATGTAATA

AAATGCAAAAAAGATAGAAGTCCCTCTAGTGCTCAGCAGTGTCCACTTTGCATGAACCCTAGGACTTCTAAA

GGCAAGCCGTTAGCTATGGTCTCAGCTGCAGCTTTCCAGTGTGCCCTCGAG

[0342] The disclosed NOV12e polypeptide (SEQ ID NO: 54) encoded by SEQ ID NO: 53 has 257 amino acid residues and is presented in Table 12K using the one-letter amino acid code.

TABLE 12K

Encoded NOV12e protein sequence (SEQ ID NO: 54).
KLACPRRCACYMPTEWHCTFRYLTSIPDSIPPNWERINLGYNSLWRLMETDFSGLTKLELLMLHSNGIHTIP

GKTFSDLOALOWLKMSYNKVRKLQKDTFYGLRSLTRLHMDHNNIEFINPEWFDGLNFLRLWHLEGNOLTKLH

PDTFWSLSYLQIFKISFIKFLYLSDNFLTSLPQEMVSYMPDLDSLYLHGNPWTCDCHLKWLSDWIQEKPDVI

KCKKDRSPSSAQQCPLCMNPRTSKGKPLANVSAAAFQCALE
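The 771-nucleotide NOV12 variants are each stated to encode 257 amino acid residues, which is consistent with reading the sequence in frame from nucleotide 1 (771 / 3 = 257). The relationship between a nucleotide table and its encoded protein table can be sketched as below. This is only an illustrative sketch that assumes the standard genetic code; the helper name `translate` and the use of `X` for codons containing ambiguous bases are assumptions made here, not part of the disclosure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in TCAG order so they line up with AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str, start: int = 1) -> str:
    """Translate in frame from the 1-based position `start` (hypothetical helper).

    '*' marks a stop codon; codons containing symbols other than A, C, G, T
    (for example N) are reported as 'X'. A trailing partial codon is ignored.
    """
    seq = nucleotides.upper()
    residues = []
    for i in range(start - 1, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


# First 48 nucleotides of the NOV12e sequence as printed in Table 12J.
fragment = "AAGCTTGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTA"
print(translate(fragment))
print(771 // 3)  # number of complete codons in a 771-nucleotide reading frame
```

Passing `start=6` would instead read from an initiation codon located at nucleotides 6-8, as described for the longer NOV12 sequences.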

[0343] NOV12f

[0344] A disclosed NOV12f nucleic acid of 8270 nucleotides (also referred to as Curagen Accession No. CG55776-03) encoding a novel Mechanical stress induced protein-like protein is shown in Table 12L. An open reading frame was identified beginning with an ATG initiation codon at nucleotides 6-8 and ending with a TGA codon at nucleotides 7779-7781. Putative untranslated regions upstream from the initiation codon and downstream of the termination codon are underlined in Table 12L. The start and stop codons are in bold letters.
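The open reading frame above is given as 1-based, inclusive nucleotide coordinates of its initiation and termination codons. As a rough sketch of how such coordinates relate to each other, the Python below scans in frame from a given start position to the first stop codon and derives the number of encoded residues from the coordinates alone. The function names and the toy sequence are hypothetical and chosen for illustration; this is not the method used to annotate the disclosed sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop(seq: str, start: int):
    """Return 1-based inclusive (first, last) positions of the first in-frame
    stop codon at or after the codon beginning at `start`, or None if absent."""
    seq = seq.upper()
    for i in range(start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i + 1, i + 3)
    return None


def coding_residues(start: int, stop_last: int) -> int:
    """Number of amino acids encoded between the first base of the start
    codon and the last base of the stop codon (the stop is not counted)."""
    span = stop_last - start + 1
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3 - 1


# Toy sequence (not from the disclosure): ATG at positions 6-8 and an
# in-frame TGA at positions 18-20. The out-of-frame TGA at 7-9 is skipped.
toy = "GGCCCATGAAACCCGGGTGATT"
print(find_stop(toy, 6))
print(coding_residues(6, 20))

# Applying the same arithmetic to the coordinates quoted for NOV12f:
print(coding_residues(6, 7781))
```

The final call only checks that the quoted start and stop positions lie in the same frame and reports the residue count implied by them.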

TABLE 12L
NOV12f nucleotide sequence (SEQ ID NO: 55).
TCAGGAGAAGGTAAAAGGCAGAGGAATCACCTGCTTGCTGGTCTCCTTTGCTGTGATCTGCCTGGTCGCCA

CCCCTGCGGGCAAGGCCTGTCCTCGCCGCTGTGCCTGTTATATGCCTACGGAGGTACACTGCACATTTCGGT

ACCTGACTTCCATCCCAGACAGCATCCCGCCCAATGTGGAACGCATCAATTTAGGGTACAACAGCTTGGTTA

GATTGATGGAAACAGATTTTTCTGGCCTGACCAAACTGGAGTTACTCATGCTTCACAGCAATGGCATTCACA

CAATCCCTGACAAGACCTTCTCAGATTTGCAGGCCTTGCAGGTGAGACTGATGGTCTTAAAAATGAGCTATA

ATAAAGTCCGAAAACTTCAGAAAGATACTTTTTATGGCCTCAGGAGCTTTACACGATTGCACATGGACCACA

ACAATATTGAGTTTATAAACCCAGAGGTTTTTTATGGGCTCAACTTTCTCCGCCTGGTGCACTTGGAAGGAA

ATCACCTCACTAAGCTCCACCCAGATACATTTGTCTCTTTGAGCTACCTCCAGATATTTAAAATCTCTTTCA

TTAAGTTCCTATACTTGTCTGATAACTTCCTGACCTCCCTCCCTCAAGAGATGGTCTCCTATATGCCTGACC

TAGACAGCCTTTACCTGCATGGAAACCCATGGACCTGTGATTGCCATTTAAAGTGGTTGTCTGACTGGATAC

AGGAGAAGCCAGGTATCTATATTGTNTTACCAGATGTAATAAAATGCAAAAAAGATAGAAGTCCCTCTAGTG

CTCAGCAGTGTCCACTTTGCATGAACCCTAGGACTTCTAAAGGCAAGCCGTTACCTATGGTCTCAGCTGCAG

CTTTCCAGTGTGCCAAGCCAACCATTGACTCATCCCTGAAATCAAAGAGCCTGACTATTCTGGAAGACAGTA

GTTCTGCTTTCATCTCTCCCCAACGTTTCATGGCACCCTTTGGCTCCCTCACTTTGAATATGACAGATCAGT

CTGGAAATGAAGCTAACATGGTCTGCAGTATTCAAAAGCCCTCAAGGACATCACCCATTGCATTCACTGAAG

AAAATGACTACATCGTGCTAAATACTTCATTTTCAACATTTTTCGTGTGCAACATAGATTACGGTCACATTC

AGCCAGTGTGGCAAATTTTGGCTTTGTACAGTGATTCTCCTCTGATACTAGAAAGGACCCACTTGCTTAGTG

AAACACCGCAGCTCTATTACAAATATAAACAGGTGGCTCCTAAGCCTGAAGACATTTTTACCAACATAGAGG

CAGATCTCAGAGCAGATCCCTCTTGGTTAATGCAAGACCAAATTTCCTTGCAGCTGAACAGAACTGCCACCA

CATTCACTACATTACAGATCCAGTACTCCAGTGATGCTCAAATCACTTTACCAAGAGCAGAGATGAGGCCAG

TGAAACACAAATGGACTATGATTTCAAGGGATAACAATACTAAGCTGGAACATACTGTCTTGGTAGGTGGAA

CCGTTGGCCTGAACTGCCCAGGCCAAGGAGACCCCACCCCACACGTGGATTGGCTTCTAGCTGATGGAAGTA

AAGTGAGAGCCCCTTATGTCAGTGAGGATGGACCGATCCTAATAGACAAAAGTGGAAAATTGGAACTCCAGA

TGGCTGATAGTTTTGACACAGGCGTATATCACTGTATAAGCAGCAATTATGATGATGCAGATATTCTCACCT

ATAGGATAACTCTGGTAGAACCTTTGGTCGAAGCCTATCAGGAAAATGGGATTCATCACACAGTTTTCATTG

GTGAAACACTTGATCTTCCATGCCATTCTACTGGTATCCCAGATGCCTCTATTAGCTGGGTTATTCCAGGAA

ACAATGTGCTCTATCAGTCATCAAGAGACAAGAAAGTTCTAAACAATGGCACATTAAGAATATTACAGGTCA

CCCCGAAAGACCAAGGTTATTATCGCTGTGTGGCAGCCAACCCATCAGGGGTTGATTTTTTGATTTTCCAAG

TTTCAGTCAAGATGAAAGGACAAAGGCCCTTGGAGCATGATGGAGAAACAGAGGGATCTGGACTTGATGAGT

CCAATCCTATTGCTCATCTTAAGGAGCCACCAGGTCCACAACTCCGTACATCTGCTCTGATGGAGGCTGAGG

TTGGAAAACACACCTCAAGCACAAGTAAGAGGCACAACTATCGGGAATTAACACTCCAGCGACGTGGAGATT

CAACACATCGACGTTTTAGCGAGAATAGGAGGCATTTCCCTCCCTCTGCTAGGAGAATTGACCCACAACATT

GGGCGGCACTGTTGGAGAAAGCTAAAAAGAATGCTATGCCAGACAAGCGAGAAAATACCACAGTGAGCCCAC