(12) Patent Application Publication (10) Pub. No.: US 2006/0053498A1 Bejanin Et Al
US 20060053498A1

(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2006/0053498 A1
Bejanin et al.
(43) Pub. Date: Mar. 9, 2006

(54) FULL-LENGTH HUMAN cDNAs ENCODING POTENTIALLY SECRETED PROTEINS

(76) Inventors: Stephane Bejanin, Rochechouart (FR); Jean-Baptiste Dumas Milne Edwards, Paris (FR); Jean-Yves Giordano, Paris (FR); Severin Jobert, Paris (FR); Hiroaki Tanaka, Antony (FR)

Correspondence Address:
SALIWANCHIK LLOYD & SALIWANCHIK
A PROFESSIONAL ASSOCIATION
PO BOX 142950
GAINESVILLE, FL 32614-2950 (US)

(21) Appl. No.: 10/475,075
(22) PCT Filed: Apr. 18, 2001
(86) PCT No.: PCT/IB01/00914

Publication Classification

(51) Int. Cl.
A01K 67/00 (2006.01)
C07K 14/47 (2006.01)
C12Q 1/68 (2006.01)
C07H 21/04 (2006.01)
C12P 21/06 (2006.01)
(52) U.S. Cl. 800/8; 435/6; 435/69.1; 435/320.1; 435/325; 530/350; 536/23.5

(57) ABSTRACT

The invention concerns GENSET polynucleotides and polypeptides. Such GENSET products may be used as reagents in forensic analyses, as chromosome markers, as tissue/cell/organelle-specific markers, in the production of expression vectors. In addition, they may be used in screening and diagnosis assays for abnormal GENSET expression and/or biological activity and for screening compounds that may be used in the treatment of GENSET-related disorders.

[Sheet 1 of 4, Figure 1: block diagram labeled COMPUTER SYSTEM, PROCESSOR, MEMORY, DATA RETRIEVING DEVICE, DISPLAY; reference numerals 100, 120, 125A, 125B, 125C.]

[Sheet 2 of 4: flowchart. 202 STORE NEW SEQUENCE TO A MEMORY; 204 OPEN DATABASE OF SEQUENCES; 206 READ FIRST SEQUENCE IN DATABASE; 210 PERFORM COMPARISON OF NEW SEQUENCE AND STORED SEQUENCE; 212; decision MORE SEQUENCES IN DATABASE? (YES); 224 GO TO NEXT SEQUENCE IN DATABASE.]

[Sheet 3 of 4, Figure 3: flowchart. 250; 252 START; 254 STORE A FIRST SEQUENCE TO A MEMORY; 256 STORE A SECOND SEQUENCE TO A MEMORY; 260 READ FIRST CHARACTER OF FIRST SEQUENCE; 262 READ FIRST CHARACTER OF SECOND SEQUENCE; 264; decision MORE CHARACTERS TO READ? (YES/NO); 276.]

[Sheet 4 of 4: flowchart. 300; 304 STORE A FIRST SEQUENCE TO A MEMORY; 306 OPEN A DATABASE OF SEQUENCE FEATURES; 308 READ FIRST FEATURE FROM DATABASE; 310 COMPARE FEATURE ATTRIBUTES WITH THE FIRST SEQUENCE; 316; 318 DISPLAY FOUND FEATURE TO THE USER; 320 MORE FEATURES IN DATABASE (YES); 326 READ NEXT FEATURE IN DATABASE.]

FULL-LENGTH HUMAN cDNAs ENCODING POTENTIALLY SECRETED PROTEINS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] The present application is related to and claims priority from U.S. Provisional Application Nos. 60/197,873, filed Apr. 18, 2000, 60/224,009, filed Aug. 7, 2000, 60/260,328, filed Jan. 8, 2001, and 60/224,006, filed Aug. 4, 2000, the entire disclosure of each of which is herein incorporated by reference.

FIELD OF THE INVENTION

[0002] The present invention is directed to GENSET polypeptides, fragments thereof, and the regulatory regions located in the 5'- and 3'-ends of the genes encoding the polypeptides. The invention also concerns polypeptides encoded by GENSET genes and fragments thereof. The present invention also relates to recombinant vectors including the polynucleotides of the present invention, particularly recombinant vectors comprising a GENSET gene regulatory region or a sequence encoding a GENSET polypeptide, and to host cells containing the polynucleotides of the invention, as well as to methods of making such vectors and host cells. The present invention further relates to the use of these recombinant vectors and host cells in the production of the polypeptides of the invention. The invention further relates to antibodies that specifically bind to the polypeptides of the invention and to methods for producing such antibodies and fragments thereof. The invention also provides methods of detecting the presence of the polynucleotides and polypeptides of the present invention in a sample, methods of diagnosis and screening of abnormal GENSET polypeptide expression and/or biological activity, methods of screening compounds for their ability to modulate the activity or expression of the GENSET polypeptides, and uses of such compounds.

BACKGROUND OF THE INVENTION

[0003] The estimated 30,000-12,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find application in the construction of high resolution chromosome maps and in the identification of individuals.

[0004] In the past, the characterization of even a single human gene was a painstaking process, requiring years of effort. Recent developments in the areas of cloning vectors, DNA sequencing, and computer technology have merged to greatly accelerate the rate at which human genes can be isolated, sequenced, mapped, and characterized.

[0005] Currently, two different approaches are being pursued for identifying and characterizing the genes distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using bio-informatics software. However, this approach entails sequencing large stretches of human DNA which do not encode proteins in order to find the protein encoding sequences scattered throughout the genome. In addition to requiring extensive sequencing, the bio-informatics software may mischaracterize the genomic sequences obtained, i.e., labeling non-coding DNA as coding DNA and vice versa.

[0006] An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein coding fragments of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify cDNAs which include sequences adjacent to the EST sequences. The cDNAs may contain all of the sequence of the EST which was used to obtain them or only a fragment of the sequence of the EST which was used to obtain them. In addition, the cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, the cDNAs may include fragments of the coding sequence of the gene from which the EST was derived. It will be appreciated that there may be several cDNAs which include the EST sequence as a result of alternate splicing or the activity of alternative promoters.

[0007] In the past, these short EST sequences were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996). In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5' untranslated region (5'UTR) of the mRNA from which the cDNA is derived. Indeed, 5'UTRs have been shown to affect either the stability or translation of mRNAs. Thus, regulation of gene expression may be achieved through the use of alternative 5'UTRs as shown, for instance, for the translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells (Waterhouse et al., J Biol Chem. 265:5585-9, 1990). Furthermore, modification of the 5'UTR through mutation, insertion or translocation events may even be implicated in pathogenesis. For instance, the Fragile X Syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5'UTR of the Fragile X mRNA resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation in regions of the 5'UTR known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc protein levels in cells derived from patients with multiple myelomas (Willis et al., Curr Top Microbiol Immunol 224:269-76, 1997). In addition, the use of oligo-dT primed cDNA libraries does not allow the isolation of complete 5'UTRs since such incomplete sequences obtained by this process may not include the first exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may not include some exons, often short ones, which are located upstream of splicing sites. Thus, there is a need to obtain sequences derived from the 5' ends of mRNAs.

[0008] Moreover, despite the great amount of EST data that large-scale sequencing projects have yielded (Adams et al., Nature 377:174, 1996; Hillier et al., Genome Res. 6:807-828, 1996), information concerning the biological function of the mRNAs corresponding to such obtained cDNAs has proved to be limited. Indeed, whereas the knowledge of the complete coding sequence is absolutely necessary to investigate the biological function of mRNAs, ESTs yield only partial coding sequences.

... techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be selected. Thus, there exists a need to identify and characterize the 5' fragments of the genes for secretory proteins which encode signal peptides.

[0011] Sequences coding for secreted proteins may also