US009002652B1

(12) United States Patent
Hood et al.
(10) Patent No.: US 9,002,652 B1
(45) Date of Patent: Apr. 7, 2015

(54) METHODS FOR IDENTIFYING AND USING ORGAN-SPECIFIC PROTEINS IN BLOOD

(75) Inventors: Leroy Hood, Seattle, WA (US); Biaoyang Lin, Bothell, WA (US)

(73) Assignee: Institute for Systems Biology, Seattle, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1402 days.

(21) Appl. No.: 11/342,366

(22) Filed: Jan. 27, 2006

Related U.S. Application Data

(60) Provisional application No. 60/647,685, filed on Jan. 27, 2005, provisional application No. 60/683,071, filed on May 20, 2005.

(51) Int. Cl.
G01N 33/48 (2006.01)
C12Q 1/70 (2006.01)
G01N 33/68 (2006.01)
G06K 9/00 (2006.01)

(52) U.S. Cl.
CPC ...... G01N 33/68 (2013.01); G01N 33/6893 (2013.01); G06K 9/00127 (2013.01)

(58) Field of Classification Search
CPC ...... G06K 9/00127; G01N 33/68
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,863,722 A      1/1999   Brenner
6,013,445 A      1/2000   Albrecht et al.
6,140,489 A      10/2000  Brenner
6,172,214 B1     1/2001   Brenner
6,172,218 B1     1/2001   Brenner
6,539,102 B1*    3/2003   Anderson et al. ...... 382/128
2002/0095259 A1  7/2002   Hood et al.
2006/0009633 A9* 1/2006   Dumas Milne Edwards et al. ...... 536/23.5

OTHER PUBLICATIONS

Anderson and Seilhamer, "A comparison of selected mRNA and protein abundances in human liver." Electrophoresis (1997) 18:533-537.
Bao et al., "High-sensitivity detection of DNA hybridization on microarrays using resonance light scattering." Anal. Chem. (2002) 74:1792-1797.
Bendtsen et al., "Improved prediction of signal peptides: SignalP 3.0." J. Mol. Biol. (2004) 340:783-795.
Blomberg et al., "Interlaboratory reproducibility of yeast protein patterns analyzed by immobilized pH gradient two-dimensional gel electrophoresis." Electrophoresis (1995) 16:1935-1945.
Brenner et al., "Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays." Nat. Biotechnol. (2000) 18:630-634.
Chen et al., "Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT." Mamm Genome (2003) 14:859-865.
Corbett et al., "Positional reproducibility of protein spots in two-dimensional polyacrylamide gel electrophoresis using immobilised pH gradient isoelectric focusing in the first dimension: an interlaboratory comparison." Electrophoresis (1994) 15:1205-1211.
Corthals et al., "Prefractionation of protein samples prior to two-dimensional electrophoresis." Electrophoresis (1997) 18:317-323.
Han et al., "Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules." Nat. Biotechnol. (2001) 19:631-635.
Han et al., "Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry." Nat. Biotechnol. (2001) 19:946-951.
Heller et al., "Discovery and analysis of inflammatory disease-related genes using cDNA microarrays." PNAS USA (1997) 94:2150-2155.
Jongeneel et al., "Comprehensive sampling of gene expression in human cell lines with massively parallel signature sequencing." PNAS USA (2003) 100(8):4702-4705.
Lopez and Patton, "Reproducibility of polypeptide spot positions in two-dimensional gels run using carrier ampholytes in the isoelectric focusing dimension." Electrophoresis (1997) 18:338-343.
Man et al., "POWER SAGE: comparing statistical tests for SAGE experiments." Bioinformatics (2000) 16:953-959.
Meyers et al., "The use of MPSS for whole-genome transcriptional analysis in Arabidopsis." Genome Res. (2004) 14:1641-1653.
Ramsby et al., "Differential detergent fractionation of isolated hepatocytes: biochemical, immunochemical and two-dimensional gel electrophoresis characterization of cytoskeletal and noncytoskeletal compartments." Electrophoresis (1994) 15:265-277.
Schena et al., "Parallel human genome analysis: microarray-based expression monitoring of 1000 genes." PNAS USA (1996) 93:10614-10619.
Tuteja et al., "Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools." Bioessays (2004) 26(8):916-922.
Velculescu et al., "Analysing uncharted transcriptomes with SAGE." Trends Genet (2000) 16:423-425.

* cited by examiner

Primary Examiner: Anna Skibinsky

(74) Attorney, Agent, or Firm: Pabst Patent Group LLP

(57) ABSTRACT

The present invention relates generally to methods for identifying organ-specific secreted proteins and for identifying organ-specific molecular blood fingerprints therefrom. As such, the present invention provides compositions comprising such proteins, detection reagents for detecting such proteins, and panels, and arrays for determining organ-specific molecular blood fingerprints.

4 Claims, No Drawings

METHODS FOR IDENTIFYING AND USING ORGAN-SPECIFIC PROTEINS IN BLOOD

STATEMENT OF GOVERNMENT INTEREST

This invention was made with government support under Grant Nos. P50 CA097186 and P01 CA085857 awarded by the National Cancer Institute. The government may have certain rights in this invention.

STATEMENT REGARDING SEQUENCE LISTING SUBMITTED ON CD-ROM

The Sequence Listing associated with this application is provided on CD-ROM in lieu of a paper copy, and is hereby incorporated by reference into the specification. Three CD-ROMs are provided, containing identical copies of the sequence listing: CD-ROM No. 1 is labeled COPY 1, contains the file 401.app.txt which is 3.31 MB and created on Jan. 27, 2006; CD-ROM No. 2 is labeled COPY 2, contains the file 401.app.txt which is 3.31 MB and created on Jan. 27, 2006; CD-ROM No. 3 is labeled CRF (Computer Readable Form), contains the file 401.app.txt which is 3.31 MB and created on Jan. 27, 2006.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to methods for identifying organ-specific proteins that are secreted into the blood. The invention further relates to methods of diagnosis and methods of use of such proteins.

2. Description of the Related Art

The ability to detect the onset of disease very early has been a longtime goal of the diagnostic field. Early detection will in most cases permit the disease to be effectively dealt with. For example, with most cancers, early detection would permit a patient to be cured by conventional therapies (chemotherapy, radiation, surgery). Hence early diagnosis is the cornerstone of dealing effectively with many diseases.

Differentially expressed proteins, particularly proteins found in blood, may serve as biological markers that can be measured for diagnostic (or therapeutic) purposes. Different approaches for measuring blood proteins have been used with varying degrees of success. In particular, two-dimensional (2-DE) gel electrophoresis is widely used for analysis of proteomic patterns in blood and other tissues. However, several limitations restrict its utility in diagnostic proteomics. First, because (2-DE) gels are limited to spatial resolution, it is difficult to resolve large numbers of proteins such as are expressed in the average cell (1,000 to 10,000 proteins) or, even worse, in blood. High abundance proteins can distort carrier ampholyte gradients in capillary isoelectric focusing electrophoresis (CGE) and result in crowding in the gel matrix of size sieving electrophoretic methods (e.g., the second dimension of (2-DE) gel electrophoresis and CGE), thus causing irreproducibility in the spatial pattern of resolved proteins (see e.g., Corthals, G. L., et al., Electrophoresis, 18:317 (1997); Lopez, M. F., and W. F. Patton, Electrophoresis, 18:338 (1997)). Note, for example, that albumin constitutes about 51% of the blood protein. Indeed, 22 proteins constitute about 99% of the blood protein and most of these will not be useful diagnostic markers; those will be present in the 1% of the remaining proteins that are often hidden by the abundant proteins. High abundance proteins can also precipitate in a gel and cause streaking of fractionated proteins (Corthals, G. L., et al., supra). Variations in the crosslinking density and electric field strength in cast gels can further distort the spatial pattern of resolved proteins. Another problem is the inability to resolve low abundance proteins neighboring high abundance proteins in a gel because of the high staining background and limited dynamic range of gel staining and imaging techniques. Limitations with staining also make it difficult to obtain reproducible and quantifiable protein concentration values, with average standard variations in relative protein abundance between replicate (2-DE) gels reported to be 20% and as high as 45% (Anderson, L. and J. Seilhamer, Electrophoresis, 18:533 (1997)). For example, investigators were only able to match 62% of the spots formed on 3-7 gels run under similar conditions (Lopez, M. F., and W. F. Patton, supra; see also Blomberg, A., et al., Electrophoresis, 16:1935 (1995) and Corbett, J. M., et al., Electrophoresis, 15:1205 (1994)). Additionally, many proteins are not soluble in buffers compatible with acrylamide gels, or fail to enter the gel efficiently because of their high molecular weight (see e.g., Ramsby, M., et al., Electrophoresis, 15:265 (1994)).

Thus, a major stumbling block in the diagnostic proteomic analysis of the blood is the high degree of complexity of the blood proteome. Another major challenge is the large dynamic range across which proteins are expressed: about 10^10. This means that one protein may be present at one copy in a given volume, whereas another may be present at 10^10 copies. Additionally, pattern analysis using techniques such as 2-DGE and other similar techniques has been problematic primarily as a result of the irreproducibility of the gel patterns, inability to detect very low abundance proteins, difficulty in quantitating the individual spots (e.g., proteins) that make up a complex proteomic pattern and the inability to identify the individual proteins that constitute the complex pattern. Further, the ability to extend these techniques to easy, consistent, and high throughput diagnostic assays has been extremely limited. Thus, there is a need in the art to provide such diagnostic assays. The present invention provides for methods and assays that fulfill these and other needs.

BRIEF SUMMARY OF THE INVENTION

One aspect of the invention provides a method for identifying organ-specific proteins secreted into the blood comprising, generating a signature sequence from transcripts from a sample from a specific organ; identifying transcripts that are specifically expressed in the organ; identifying from the transcripts in (b) those transcripts that encode secreted proteins; and thereby identifying organ-specific proteins secreted into the blood.

Another aspect of the invention provides a method for identifying organ-specific proteins secreted into the blood comprising, generating a signature sequence from substantially all transcripts from a sample from a specific organ; comparing the signature sequences to a database of known sequences to determine the identity of the transcript; comparing the identified transcripts to transcripts expressed in other organs; removing any transcripts that are substantially expressed in other organs; identifying computationally from the remaining transcripts those that encode a signal peptide; confirming the presence of the secreted proteins in a blood sample; and thereby identifying organ-specific proteins secreted into the blood.

In a further aspect, the present invention provides a method for diagnosing a biological condition in a subject comprising measuring the level of a plurality of organ-specific proteins in the blood of the subject, wherein the plurality of organ-specific proteins are secreted from the same organ and wherein the levels of the plurality of organ-specific proteins together provide a diagnostic fingerprint for the biological condition in the subject. In one embodiment of the method, the level of the plurality of organ-specific proteins is measured using any one or more methods, such as mass spectrometry, an immunoassay such as an ELISA, Western blot, microfluidics/nanotechnology sensors, and aptamer capture assay. In this regard, an aptamer may be used in a similar manner to an antibody in a variety of appropriate binding assays known to the skilled artisan and described herein. In certain embodiments, the plurality of organ-specific proteins is measured using tandem mass spectrometry or other spectrometry-based techniques. In one embodiment, the plurality of organ-specific proteins comprises from at least about 1 or 2 organ-specific proteins to about 100, 150, 160, 170, 180, 190, 200, or more organ-specific proteins. In this regard, the plurality of organ-specific proteins may comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more organ-specific proteins. The plurality of organ-specific proteins may comprise about 10 or 20 organ-specific proteins. In one embodiment, the organ-specific proteins comprise prostate-specific proteins. In one embodiment, the prostate-specific proteins are selected from the proteins listed in Table 4 and Table 5. In other embodiments, the organ-specific proteins may be from any organ, such as liver, kidney, breast, ovary, etc. In one embodiment, the method is used to diagnose any of a variety of biological conditions, such as cancer. In this regard, the cancer can be any cancer, such as, but not limited to, brain cancer, bladder cancer, prostate cancer, ovarian cancer, breast cancer, liver cancer, lung cancer, pancreatic cancer, kidney cancer, and colon cancer. In a further embodiment, the biological condition is any one or combination of the following: cardiovascular disease, metabolic disease, infectious disease, genetic disease, autoimmune disease, and immune-related disease.

Another aspect of the present invention provides a method for determining the presence or absence of disease in a subject comprising, detecting a level of each of a plurality of organ-specific proteins in a blood sample from the subject, wherein the plurality of organ-specific proteins are secreted from the same organ; comparing the level of each of the plurality of organ-specific proteins in the blood sample from the subject to a level of the plurality of organ-specific proteins in a normal control sample of blood; wherein an altered level of one or more of the plurality of organ-specific proteins in the blood is indicative of the presence or absence of disease. As would be readily appreciated by the skilled artisan, an altered level can mean an increase in the level or a decrease in the level. In this regard, the skilled artisan would readily appreciate that a variety of statistical tests can be used to determine if an altered level is significant. The Z-test (Man, M. Z. et al., Bioinformatics, 16:953-959, 2000) or other appropriate statistical tests can be used to calculate P values for comparison of protein expression levels. In certain embodiments, the level of each of the plurality of organ-specific proteins in the blood sample from the subject is compared to a previously determined normal control level of each of the plurality of organ-specific proteins taking into account standard deviation. In one embodiment, the level of each of the plurality of organ-specific proteins is detected using any one or more of a variety of methods, such as, but not limited to, mass spectrometry and immunoassays. In certain embodiments, the level of each of the plurality of organ-specific proteins is measured using mass spectrometry (e.g., tandem mass spectrometry) or an immunoassay such as an ELISA. In an additional embodiment, the level of each of the plurality of organ-specific proteins is measured using an antibody array.

A further aspect of the present invention provides a method for detecting perturbation of a normal biological state comprising, contacting a blood sample with a plurality of detection reagents each specific for an organ-specific protein secreted into blood, wherein each organ-specific protein is secreted from the same organ; measuring the amount of the organ-specific protein detected in the blood sample by each detection reagent; comparing the amount of the organ-specific protein detected in the blood sample by each detection reagent to a predetermined control amount for each organ-specific protein; wherein a statistically significant altered level in one or more of the organ-specific proteins indicates a perturbation in the normal biological state. Thus, in one embodiment, the predetermined control amount is determined from one or more normal blood samples. The skilled artisan would readily appreciate that a variety of statistical tests can be used to determine if an altered level of a given protein is significant. The Z-test (Man, M. Z. et al., Bioinformatics, 16:953-959, 2000) or other appropriate statistical tests can be used to calculate P values for comparison of protein expression levels. In certain embodiments, the level of each of the plurality of organ-specific proteins in the blood sample from the subject is compared to a previously determined normal control level of each of the plurality of organ-specific proteins taking into account standard deviation (see e.g., U.S. Patent Application No. 20020095259). In an additional embodiment the plurality of detection reagents comprises from at least about 2 detection reagents to about 100, 150, 160, 170, 180, 190, 200, or more detection reagents. In a further embodiment, the plurality of detection reagents comprises about 5, 10 or about 20 detection reagents. In one embodiment, the organ-specific proteins comprise prostate-specific proteins, liver-specific proteins, or breast-specific proteins. In this regard, the organ-specific proteins can be from any organ, tissue, cell, or system as described further herein.

A further aspect of the present invention provides a diagnostic panel for determining the presence or absence of disease in a subject comprising, a plurality of detection reagents each specific for detecting one of a plurality of organ-specific proteins present in a blood sample; wherein the organ-specific proteins are secreted from the same organ and wherein detection of the plurality of organ-specific proteins with the plurality of detection reagents results in a fingerprint indicative of the presence or absence of disease in the subject. As noted elsewhere herein, the term "subject" is intended to include humans. Thus, as further described herein, the organ-specific molecular blood fingerprint is unique for a given disease and further for a given stage of the disease and thus is a powerful diagnostic indicator. In one embodiment, the detection reagents comprise antibodies or antigen-binding fragments thereof. In a further embodiment, the antibodies are monoclonal antibodies, or antigen-binding fragments thereof. In one embodiment, the panel comprises one or more detection reagents. In yet a further embodiment, the plurality of detection reagents comprises from at least about 1 detection reagent to about 100, 150, 160, 170, 180, 190, 200 or more detection reagents. In yet a further embodiment, the plurality of detection reagents comprises at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 detection reagents. In certain embodiments, the plurality of detection reagents comprises about 5, 10, or 20 detection reagents. In an additional embodiment, the organ-specific proteins comprise prostate-specific, liver-specific, or breast-specific proteins. As would be recognized by the skilled artisan upon reading the present disclosure, the organ-specific protein may be derived from any organ, tissue, or cell, as described further herein. In a further embodiment, the panel is used for determining the presence or absence of a cancer. In this regard, the panel can be used to determine the presence or absence of any cancer, including but not limited to any one or more of prostate cancer, ovarian cancer, breast cancer, liver cancer, lung cancer, pancreatic cancer, kidney cancer, and colon cancer. In an additional embodiment, the panel can be used to determine the presence or absence of any disease including but not limited to the following diseases: cardiovascular disease, metabolic disease, infectious disease, genetic disease, autoimmune disease, immune-related disease, neurological disease and cancer.

An additional aspect of the present invention provides an assay device comprising a panel of detection reagents wherein each detection reagent in the panel, with the exception of a negative and positive control, is capable of specific interaction with one of a plurality of organ-specific proteins secreted into the blood, wherein the plurality of organ-specific proteins are secreted from the same organ and wherein the pattern of interaction between the detection reagents and the organ-specific proteins present in a blood sample is indicative of a biological condition. In certain embodiments, the pattern of interaction is the combination, a snapshot of sorts, of the different quantitative levels of the organ-specific proteins detected. Thus, in certain embodiments, the pattern of interaction is a set of numbers, each number corresponding to a level of a particular organ-specific protein. This set of numbers and the specific organ-specific proteins that they correspond to together make up the pattern of interaction (e.g., fingerprint) that defines a biological condition.

A further aspect of the present invention provides a method for diagnosing a biological condition in a subject comprising measuring the level of a plurality of organ-specific proteins in the blood of the subject, wherein the organ-specific proteins are secreted from the same organ or specific to the same organ and wherein the levels of the plurality of organ-specific proteins together provide a fingerprint for the biological condition in the subject; thereby diagnosing the biological condition. In one embodiment, a statistically significant altered level in one or more of the organ-specific proteins as compared to a predetermined normal level classifies the subject as having a perturbation from the normal biological state. In this regard, identifying altered levels in one or more of the organ-specific proteins as compared to predetermined normal levels can be used for classifying subjects by disease and disease stage or generally as having a perturbation from the normal biological state. In a further embodiment, the fingerprint is measured in the blood, serum or plasma of the subject. In certain embodiments, the plurality of organ-specific proteins comprises at least 2 or more organ-specific proteins. In this regard, the plurality of organ-specific proteins comprises about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 organ-specific proteins. In certain embodiments, the biological condition affects the prostate and the organ-specific proteins are prostate-specific proteins. In a further embodiment, the biological condition affects the breast and the organ-specific proteins are breast-specific proteins. In yet a further embodiment, the biological condition comprises a cancer. In this regard, a cancer may include, but is not limited to, prostate cancer, ovarian cancer, breast cancer, liver cancer, lung cancer, pancreatic cancer, kidney cancer, or colon cancer. In another embodiment, the biological condition may include but is not limited to cardiovascular disease, metabolic disease, infectious disease, genetic disease, autoimmune disease, immune-related disease, neurological disease and cancer.

Another aspect of the invention provides a method for diagnosing a biological condition in a subject comprising measuring the level of one or more organ-specific proteins in the blood of the subject, wherein the organ-specific proteins are secreted from the same organ and wherein the levels of the one or more organ-specific proteins together provide a fingerprint for the biological condition in the subject; thereby diagnosing the biological condition.

A further aspect of the invention provides a method for determining the presence or absence of disease in a subject comprising, a) detecting the level of each of a plurality of organ-specific proteins in a blood sample from the subject, wherein the plurality of organ-specific proteins are secreted from the same organ; b) comparing said level of each of the plurality of organ-specific proteins in the blood sample from the subject to a previously-determined normal level of each of the plurality of organ-specific proteins; wherein a statistically significant altered level of one or more of the plurality of organ-specific proteins in the blood of the subject as compared to the previously-determined normal level is indicative of the presence or absence of disease. In this regard, the plurality of organ-specific proteins may be detected using any method described herein, such as mass spectrometry or an immunoassay. In one embodiment, the plurality of organ-specific proteins is measured using an antibody array.

A further aspect of the invention provides a method for detecting perturbation of a normal biological state in a subject comprising, a) contacting a blood sample from the subject with a plurality of detection reagents each specific for an organ-specific protein secreted into blood, wherein each organ-specific protein is secreted from the same organ; b) measuring the amount of the organ-specific protein detected in the blood sample by each detection reagent; c) comparing the amount of the organ-specific protein detected in the blood sample by each detection reagent to a predetermined control amount for each respective organ-specific protein; wherein a statistically significant altered level in one or more of the organ-specific proteins indicates a perturbation in the normal biological state.

Another aspect of the invention provides a method for detecting perturbation of a normal biological state in a subject, comprising, a) contacting a blood sample with one or more detection reagents wherein the one or more detection reagents are each specific for an organ-specific protein secreted into blood, wherein the organ-specific proteins are secreted from the same organ; b) measuring the amount of the organ-specific protein detected in the blood sample by the one or more detection reagents; c) comparing the amount of the organ-specific protein detected in the blood sample by the one or more detection reagents to a predetermined control amount for each respective organ-specific protein; wherein a statistically significant altered level in the one or more of the organ-specific proteins indicates a perturbation in the normal biological state. In this regard, the plurality of detection reagents may comprise about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 detection reagents. In one embodiment, the perturbation from normal comprises perturbation of the prostate and the organ-specific proteins are prostate-specific proteins. In another embodiment, the perturbation comprises perturbation of the liver and the organ-specific proteins are liver-specific proteins. In yet a further embodiment, the perturbation comprises perturbation of the breast and the organ-specific proteins are breast-specific proteins. In this regard, the perturbation may comprise a perturbation of any organ as described herein.

Another aspect of the invention provides a diagnostic panel for determining the presence or absence of disease in a subject comprising, a plurality of detection reagents each specific for detecting one of a plurality of organ-specific proteins present in a blood sample; wherein the organ-specific proteins are secreted from the same organ and wherein detection of the plurality of organ-specific proteins with the plurality of detection reagents results in a fingerprint indicative of the presence or absence of disease in the subject. In one embodiment, the detection reagents comprise antibodies or antigen-binding fragments thereof and in certain embodiments, the antibodies or antigen-binding fragments thereof are monoclonal antibodies, or antigen-binding fragments thereof.

A further aspect of the invention provides a diagnostic panel for determining the presence or absence of disease in a subject comprising, one or more detection reagents each specific for detecting an organ-specific protein present in a blood sample; wherein the organ-specific proteins are secreted from the same organ and wherein detection of the one or more organ-specific proteins with the one or more detection reagents results in a fingerprint indicative of the presence or absence of disease in the subject. In one embodiment, the plurality of detection reagents comprises about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 detection reagents. In a further embodiment, the organ-specific proteins comprise prostate-specific proteins, liver-specific proteins, or breast-specific proteins. In another embodiment, the disease comprises a cancer. In this regard, the cancer may include but is not limited to prostate cancer, ovarian cancer, breast cancer, liver cancer, lung cancer, pancreatic cancer, kidney cancer, or colon cancer. In another embodiment the disease may include, but is not limited to, cardiovascular disease, metabolic disease, infectious disease, genetic disease, autoimmune disease, immune-related disease, neurological disease or cancer.

Another aspect of the invention provides a method for identifying organ-specific proteins secreted or shed into the blood comprising, generating a signature sequence from transcripts from a sample from a specific organ; identifying transcripts that are specifically expressed in the organ; identifying from the transcripts in (b) those transcripts that encode secreted proteins; thereby identifying organ-specific proteins secreted or shed into the blood.

A further aspect of the invention provides a method for identifying organ-specific proteins secreted or shed into the blood comprising, generating a signature sequence from transcripts from a sample from a specific organ; identifying transcripts that are expressed in the specific organ at least 1.5 fold as compared to the level of expression of the transcript observed in other organs; identifying from the transcripts in (b) those transcripts that encode secreted proteins; thereby identifying organ-specific proteins secreted or shed into the blood.

Another aspect of the invention provides a computer system for processing data relating to organ-specific molecular blood fingerprints, comprising: means operable to receive input identifying an organ-specific molecular blood fingerprint; an organ-specific molecular blood fingerprint database, the organ-specific molecular blood fingerprint database being a computer-readable collection of information about a set of organ-specific molecular blood fingerprints, the set including defined normal blood fingerprints from normal samples and defined disease blood fingerprints from samples from individuals diagnosed with a particular disease; means operable to receive organ-specific fingerprint information from a subject; means operable to use the organ-specific molecular blood fingerprint database and the organ-specific fingerprint information from the subject to match the subject fingerprint to a disease fingerprint, to a normal fingerprint, or to identify a fingerprint that is perturbed from normal but does not match to a disease fingerprint in the database.

BRIEF DESCRIPTION OF THE SEQUENCE IDENTIFIERS

SEQ ID NO:1 is the cDNA sequence that encodes the WDR19 prostate specific secreted protein.

SEQ ID NO:2 is the amino acid sequence of the WDR19 prostate specific secreted protein.

SEQ ID NOs:3-72 are MPSS signature sequences that correspond to differentially expressed genes in LNCaP cells (early prostate cancer phenotype) to androgen-independent CL1 cells (late prostate cancer phenotype) (see Table 1).

SEQ ID NOs:73-593 are MPSS signature sequences that correspond to differentially expressed genes in prostate cancer cell lines LNCaP and CL1 that encode secreted proteins (see Table 3).

SEQ ID NOs:594-1511 are the GENBANK sequences of differentially expressed genes that encode predicted secreted proteins as referred to in Table 3. Both polynucleotide and amino acid sequences are provided for each GENBANK accession number.

SEQ ID NOs:1512-1573 are the amino acid sequences from GENBANK of prostate-specific proteins potentially secreted into blood as described in Table 4.

SEQ ID NOs:1574-1687 are the GENBANK sequences of examples of differentially expressed genes as described in Table 1. Both polynucleotide and amino acid sequences are provided where available for each GENBANK accession number.

SEQ ID NOs:1688-1796 are MPSS signature sequences that correspond to prostate-specific/enriched genes as described in Table 5.

SEQ ID NOs:1797-1947 are the GENBANK sequences of prostate-specific genes as described in Table 5. Both polynucleotide and amino acid sequences are provided where available for each GENBANK accession number.

DETAILED DESCRIPTION OF THE INVENTION

A powerful new systems approach to disease is revealing powerful new blood diagnostics approaches. Particularly, in specific cells there are protein and gene regulatory networks that mediate the normal functions of the cell. The disease process causes one or more of these networks to be perturbed, either genetically or environmentally (e.g. infections). The disease-altered networks result in altered patterns of protein expression, and some of the transcripts with altered expression levels are organ (cell)-specific and some of these organ-specific transcripts encode secreted proteins. Thus, disease leads to altered expression patterns of organ-specific, secreted proteins in the blood.

Hence the blood may be viewed as a window into the health and disease of an individual. The levels of organ-specific secreted proteins present in the blood taken together represent molecular fingerprints in the blood that reflect the operation of normal organs. Each organ has a specific quantitative molecular fingerprint. When disease attacks an organ, that blood fingerprint changes, for example, in the levels of these proteins expressed in the blood and the change in the fingerprint correlates with the specific disease. The changes in the fingerprints occur as a consequence of virtually any disease or organ perturbation with each disease fingerprint being unique. The changes in the fingerprints are sufficiently informative to carry out disease stratification, follow the progression of the particular disease stratification or type and follow responses to therapy. These fingerprints also allow one to stratify patients with regard to their ability to respond to particular therapies and even to visualize adverse effects of drugs. The disease fingerprints are determined by comparing the blood from normal individuals against that from patients with specific diseases at known stages. Not only will the absolute levels of the changes in the proteins constituting individual fingerprints be determined, but all the protein changes (e.g. N changed proteins) will be compared against one another to generate an N-dimensional shape space that will correlate even more powerfully with the disease stratifications and progression states described above (see e.g., U.S.

will be perturbed. Hence the present invention allows detection of virtually any type of disease and detection of each disease at a very early stage.

Methods for Identifying Organ-Specific Proteins Secreted Into the Blood

The invention provides methods for identifying organ-specific secreted proteins. In this regard, as used herein, the term "organ" is defined as would be understood in the art. Thus, the
term, "organ-specific' as used herein generally refers to pro 10 teins (or transcripts) that are primarily expressed in a single Patent Application No. 20020095259). organ. It should be noted that the skilled artisan would readily In the studies described herein, the transcriptomes of two appreciate upon reading the instant specification that cell prostate cancer cell lines were analyzed: LNCaP, an androgen specific transcripts and proteins and tissue-specific tran sensitive cell line, and hence a model for early stage of pros Scripts and proteins are also contemplated in the present tate cancer, and a variant of this cell, CL1, an androgen 15 invention. As such, and as discussed further herein, in certain unresponsive cell line, thus, a model for late stage of prostate embodiments, organ-specific protein is defined as a protein cancer. Analyses of the transcriptomes of these two cell lines encoded by a transcript that is expressed at a level of at least revealed changes in cellular states that occur with the pro 3 copies/million (as measured, for example, by massively gression of prostate cancer. These transcriptomes were also parallel signature sequencing (MPSS) in the cell/tissue/organ compared to normal prostate tissue, prostate cancer tissues of interest but is expressed at less than 3 copies/million in and prostate cancer metastases. These prostate transcrip other cells/tissues/organs. In a further embodiment, an organ tomes were compared against their counterparts from 29 specific protein is one that is encoded by a transcript that is other tissues to identify those transcripts that are primarily expressed 95% in one organ and the remaining 5% in one or expressed in the prostate. Computational approaches were more other organs. (In this context, total expression across all used to predict which of these transcripts encode secreted 25 organs examined is taken as 100%). proteins. 
Further, a prostate protein, referred to as WDR19, In certain embodiments, an organ-specific protein is one that was previously shown by microarray and northern analy that is encoded by a transcript that is expressed at about 50%, sis to be prostate-specific, was used in a multiparameter 55%, 60%. 65%, 70%, 75%, 80% to about 90% in one organ analysis of prostate cancer samples. and wherein the remaining 10%-50% is expressed in one or Thus, the present invention is generally directed to meth 30 more other organs. As would be readily recognized by the ods for identifying organ-specific secreted proteins present in skilled artisan upon reading the present disclosure, in certain the blood. The present invention is also directed to methods embodiments, an organ-specific molecular blood fingerprint for defining organ-specific molecular blood fingerprints and can readily be discerned even if some expression of an further provides defined examples of predicted organ-specific "organ-specific' protein from a particular organ is detected at molecular blood fingerprints. Additionally, the present inven 35 Some level in another organ, or even more than one organ. For tion is directed to panels of reagents or proteomic techniques example, the organ-specific molecular blood fingerprint from employing mass spectrometry that detect organ-specific prostate can conclusively identify a particular prostate dis secreted proteins in the blood for use in diagnostics and other ease (and stage of disease) despite expression of one or more Settings. protein members of the fingerprint in one or more other The blood fingerprints described herein enable physicians 40 organs. 
Thus, an organ-specific protein as described herein to develop a powerful new predictive medicine that can serve may be predominantly or differentially expressed in an organ as one of the cornerstones for a revolution in medicine, mov of interest rather than uniquely or specifically expressed in the ing it from a reactive mode (treating after the patient is sick) organ. In this regard, in certain embodiments, differentially to more predictive, preventive and personalized modes. expressed means at least 1.5 fold expression in the organ of By predefining the components of a given molecular blood 45 interest as compared to other organs. In another embodiment, fingerprint using the methods described herein, the present differentially expressed means at least 2 fold expression in the invention alleviates the need to blindly search for protein organ of interest as compared to expression in other organs. In patterns using blood proteomics. Thus, the present invention yet a further embodiment, differentially expressed means at enables the skilled artisan to 1) identify blood proteins which least 2.5, 3, 3.5, 4, 4.5, 5 fold or higher expression in the organ collectively constitute unique molecular blood fingerprints 50 of interest as compared to expression of the protein in other for healthy and diseased individuals; 2) identify unique fin organs. As described elsewhere herein, “protein’ expression gerprints for each different disease; 3) identify fingerprints can be determined by analysis of transcript expression using that can uniquely distinguish the different types of aparticular a variety of methods. disease (e.g., for prostate cancer, the ability to distinguish In one embodiment, the organ-specific proteins are identi between benign disease, slowly growing disease and rapidly 55 fied by preparing a cDNA library from an organ of interest. metastatic disease); 4) identify fingerprints that can reveal the Any organ of a mammalian body is contemplated herein. 
stage of progression of each type of disease, and 5) finger Illustrative organs include, but are not limited to, heart, kid prints that will allow one to assess the response to therapy. ney, ureter, bladder, urethra, liver, prostate, heart, blood ves Importantly, the potential organ-specific, secreted disease sels, bone marrow, skeletal muscle, Smooth muscle, brain detecting blood fingerprints can be predicted from a combi 60 (amygdala, caudate nucleus, cerebellum, corpus callosum, nation of quantitative comparative transcriptome studies and fetal, hypothalamus, thalamus), spinal cord, peripheral computational methods to predict which transcripts encode nerves, retina, nose, trachea, lungs, mouth, salivary gland, secreted proteins. The methods for determining the organ esophagus, stomach, Small intestines, large intestines, hypo specific, blood fingerprints for all organs described herein thalamus, pituitary, thyroid, pancreas, adrenal glands, ova allow disease detection at very early stages, since even in the 65 ries, Oviducts, uterus, placenta, Vagina, mammary glands, earliest disease stages, the cellular networks which control testes, seminal vesicles, penis, lymph nodes, PBMC, thymus, the expression patterns of these blood molecular signatures and spleen. As noted above, upon reading the present disclo US 9,002,652 B1 11 12 Sure, the skilled artisan would recognize that cell-specific and sample taken of template-tag conjugates ensures that essen tissue-specific proteins are contemplated herein and thus, tially every template in the sample is conjugated to a unique proteins specifically expressed in cells or tissues that make up tag and that at least one of each of the different template Such organs are also contemplated herein. In certain embodi cDNAs is represented in the sample with >99% probability ments, in each of these organs transcriptomes are obtained for 5 (U.S. Pat. No. 6,013,445 and Brenner, P., et al., supra). 
the cell types in which the disease of interest arises. For example, in the prostate there are two dominant types of cells: epithelial cells and stromal cells. About 98% of prostate cancers arise in epithelial cells. As such, in certain embodiments, "organ-specific" means the transcripts that are expressed in particular cell types of the organ of interest (e.g., prostate epithelial cells). In this regard, any cell type that makes up any of the organs described herein is contemplated herein. Illustrative cell types include, but are not limited to, epithelial cells, stromal cells, endothelial cells, endodermal cells, ectodermal cells, mesodermal cells, lymphocytes (e.g., B cells and T cells including CD4+ T helper 1 or T helper 2 type cells, CD8+ cytotoxic T cells), erythrocytes, keratinocytes, and fibroblasts. Particular cell types within organs or tissues may be obtained by histological dissection, by the use of specific cell lines (e.g., prostate epithelial cell lines), by cell sorting or by a variety of other techniques known in the art.

It should be noted that in certain embodiments, fingerprints can be determined from "organ-specific" proteins from multiple organs, such as from organs that share a common function or make up a system (e.g., digestive system, circulatory system, respiratory system, cardiovascular system, the immune system (including the different cells of the immune system, such as, but not limited to, B cells, T cells including CD4+ T helper 1 or T helper 2 type cells, regulatory T cells, CD8+ cytotoxic T cells, NK cells, dendritic cells, macrophages, monocytes, neutrophils, granulocytes, mast cells, etc.), the sensory system, the skin, brain and the nervous system, and the like).

Complementary DNA (cDNA) libraries can be generated using techniques known in the art, such as those described in Ausubel et al. (2001 Current Protocols in Molecular Biology, Greene Publ. Assoc. Inc. & John Wiley & Sons, Inc., NY, N.Y.); Sambrook et al. (1989 Molecular Cloning, Second Ed., Cold Spring Harbor Laboratory, Plainview, N.Y.); Maniatis et al. (1982 Molecular Cloning, Cold Spring Harbor Laboratory, Plainview, N.Y.) and elsewhere. Further, a variety of commercially available kits for constructing cDNA libraries are useful for making the cDNA libraries of the present invention. Libraries are constructed from organs/tissues/cells procured from normal subjects.

All or substantially all of the transcripts of the cDNA library, e.g., representing virtually or substantially all genes functioning in the organ of interest, are cloned and sequenced using any of a variety of techniques known in the art. In this regard, in certain embodiments, substantially all refers to a sample representing at least 80% of all genes functioning in the organ of interest. In a further embodiment, substantially all refers to a sample representing at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or higher of all genes functioning in the organ of interest. In one embodiment, substantially all the transcripts from a cDNA library are amplified, sorted and signature sequences generated therefrom according to the methods described in U.S. Pat. Nos. 6,013,445; 6,172,218; 6,172,214; 6,140,489 and Brenner, P., et al., Nat Biotechnol, 18:630-634, 2000. Briefly, polynucleotide templates from a cDNA library of interest are cloned into a vector system that contains a vast set of minimally cross-hybridizing oligonucleotide tags (see U.S. Pat. No. 5,863,722). The number of tags is usually at least 100 times greater than the number of cDNA templates (see e.g., U.S. Pat. No. 6,013,445 and Brenner, P. et al., supra). Thus, the set of tags is such that a 1%

The conjugates are then amplified and hybridized under stringent conditions to microbeads each of which has attached thereto a unique complementary, minimally cross-hybridizing oligonucleotide tag. The transcripts are then directly sequenced simultaneously in a flow cell using a ligation-based sequencing method (see e.g., U.S. Pat. No. 6,013,445). A short signature sequence of about 17-20 base pairs is generated simultaneously from each of the hundreds of thousands of beads (or more) in the flow cell, each having attached thereto copies of a unique transcript from the sample. This technique is termed massively parallel signature sequencing (MPSS).

In certain embodiments, other techniques may be used to evaluate the transcripts from a particular cDNA library, including microarray analysis (Han, M., et al., Nat Biotechnol, 19:631-635, 2001; Bao, P., et al., Anal Chem, 74:1792-1797, 2002; Schena et al., Proc. Natl. Acad. Sci. USA 93:10614-19, 1996; and Heller et al., Proc. Natl. Acad. Sci. USA 94:2150-55, 1997) and SAGE (serial analysis of gene expression). Like MPSS, SAGE is digital and can generate a large number of signature sequences (see e.g., Velculescu, V. E., et al., Trends Genet, 16:423-425, 2000; Tuteja R. and Tuteja N. Bioessays. 2004 August; 26(8):916-22) although the coverage is not nearly as deep as with MPSS.

The resulting sequences (e.g., MPSS signature sequences) are generally about 20 bases in length. However, in certain embodiments, the sequences can be about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 or more bases in length. The sequences are annotated using annotated human genome sequence (such as human genome release hg16, released in November, 2003, or other public or private databases) and the human Unigene (Unigene build #184) using methods known in the art, such as the method described by Meyers, B. C., et al., Genome Res, 14:1641-1653, 2004. Other databases useful in this regard include GenBank, EMBL, or other publicly available databases. In certain embodiments, transcripts are considered only for those with 100% matches between an MPSS or other type of signature and a genome signature. As would be readily appreciated by the skilled artisan upon reading the present disclosure, this is a stringent match criterion and in certain embodiments, it may be desirable to use less stringent match criteria. Indeed, polymorphisms could lead to variations in transcripts that would be missed if only exact matches were used. For example, it may be desirable to consider signature sequences that match a genome signature with 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity. In one embodiment, signatures that are expressed at less than 3 transcripts per million in libraries of interest are disregarded, as they might not be reliably detected since this, in effect, represents less than one transcript per cell (see for example, Jongeneel, C. V., et al., Proc Natl Acad Sci USA, 2003). cDNA signatures are classified by their positions relative to polyadenylation signals and poly(A) tails and by their orientation relative to the 5'→3' orientation of source mRNA. Full-length sequences corresponding to the signature sequences can be thus identified.

In order to identify organ-specific transcripts, the resulting annotated transcripts are compared against public and/or private sequence databases, such as a variety of annotated human genome sequence databases (e.g., GenBank, the EMBL and Japanese databases) and databases generated and compiled from other normal tissues, to identify those transcripts that are expressed primarily in the organ of interest but are not expressed in other organs. As noted elsewhere herein, some expression in organs other than the organ of interest does not necessarily preclude the use of a particular transcript in a blood molecular signature panel of the present invention. Comparisons of the transcripts between databases can be made using a variety of computer analysis algorithms known in the art. As such, alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by the identity alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity methods of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.

As would be understood by the skilled artisan, many algorithms are available and are continually being developed. Appropriate algorithms can be chosen based on the specific needs for the comparisons being made (See also, e.g., J. A. Cuff, et al., Bioinformatics, 16(2):111-116, 2000; S. F. Altschul and B. W. Erickson. Bulletin of Mathematical Biology, 48(5/6):603-616, 1986; S. F. Altschul and B. W. Erickson. Bulletin of Mathematical Biology, 48(5/6):633-660, 1986; S. F. Altschul, et al., J. Mol. Bio., 215:403-410, 1990; K. Bucka-Lassen, et al., BIOINFORMATICS, 15(2):122-130, 1999; K.-M. Chao, et al., Bulletin of Mathematical Biology, 55(3):503-524, 1993; W. M. Fitch and T. F. Smith. Proceedings of the National Academy of Sciences, 80:1382-1386, 1983; A. D. Gordon. Biometrika, 60:197-200, 1973; O. Gotoh. J Mol Biol, 162:705-708, 1982; O. Gotoh. Bulletin of Mathematical Biology, 52(3):359-373, 1990; X. Huang, et al., CABIOS, 6:373-381, 1990; X. Huang and W. Miller. Advances in Applied Mathematics, 12:337-357, 1991; J. D. Thompson, et al., Nucleic Acids Research, 27(13):2682-2690, 1999).

In certain embodiments, a particular transcript is considered to be organ-specific when the number of transcripts/million as determined by MPSS is 3 or greater in the organ of interest but is less than 3 in all other organs. In another embodiment, a transcript is considered organ-specific if it is expressed in the organ of interest at a detectable level using a standard measurement (e.g., microarray analysis, quantitative real-time RT-PCR, MPSS, etc.) in the organ of interest but is not detectably expressed in other organs, using appropriate negative and positive controls as would be familiar to the skilled artisan. In a further embodiment, an organ-specific transcript is one that is expressed 95% in one organ and the remaining 5% in one or more other organs. (In this context, total expression across all organs examined is taken as 100%). In certain embodiments, an organ-specific transcript is one that is expressed at about 50%, 55%, 60%, 65%, 70%, 75%, 80% to about 90% in one organ and wherein the remaining 10%-50% is expressed in one or more other organs.

In another embodiment, organ-specific transcripts are identified by determining the ratio of expression of a transcript in the organ of interest as compared to other organs. In this regard, expression levels in the organ of interest of at least 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0 fold or higher as compared to expression in all other organs is considered to be organ-specific expression.

As would be readily recognized by the skilled artisan upon reading the present disclosure, in certain embodiments, an organ-specific molecular blood fingerprint can readily be discerned even if some expression of an "organ-specific" protein from a particular organ is detected at some level in another organ, or even more than one organ. This is because the fingerprint (e.g., the combination of the levels of multiple proteins; the pattern of the expression levels of multiple markers) itself is unique despite that the expression levels of one or more individual members of the fingerprint may not be unique to a particular organ. For example, the organ-specific molecular blood fingerprint from prostate can conclusively identify a particular prostate disease (and stage of disease) despite some expression of one or more members of the fingerprint in one or more other organs. Thus the present invention relates to determining the presence or absence of a disease or condition or stage of disease based on a pattern (e.g., fingerprint) of markers measured concurrently using any one or more of a variety of methods described herein (e.g., antibody binding, mass spectrometry, and the like), rather than the measure of individual markers.

In further embodiments, specificity can be confirmed at the protein level using immunohistochemistry (IHC) and/or other protein measurement techniques known in the art (e.g., isotope-coded affinity tags and mass spectrometry, such as described by Han, D. K., et al., Nat Biotechnol, 19:946-951, 2001). The Z-test (Man, M. Z. et al., Bioinformatics, 16:953-959, 2000) or other appropriate statistical tests can be used to calculate P values for comparison of gene and protein expression levels between libraries from organs of interest.

Organ-specific sequences identified as described herein are further analyzed to determine which of the sequences encode secreted proteins. Proteins with signal peptides (classical secretory proteins) can be predicted using computation analysis known in the art. Illustrative methods include, but are not limited to the criteria described by Chen et al., Mamm Genome, 14:859-865, 2003. In certain embodiments, such analyses are carried out using prediction servers, for example SignalP 3.0 server developed by The Center for Biological Sequence Analysis, Lyngby, Denmark (http colon double slash dot www dot cbs dot dtu dot dk slash services slash SignalP-3.0; see also, J. D. Bendtsen, et al., J. Mol. Biol., 340:783-795, 2004.) and the TMHMM2.0 server (see for example A. Krogh, et al., Journal of Molecular Biology, 305(3):567-580, January 2001; E. L. L. Sonnhammer, et al. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, Calif., 1998. AAAI Press). Other prediction methods that can be used in the context of the present invention include those described for example, in S. Moller, M. D. R. et al., Bioinformatics, 17(7):646-653, July 2001. Nonclassical secretory secreted proteins (without signal peptides) can be predicted using, for example, the SecretomeP 2.0 server, (http colon double slash www dot cbs dot dtu dot dk slash services slash SecretomeP-2.0 slash/) with an odds ratio score >3.0. Updated versions of these analysis programs are also contemplated for use in the present methods as are other methods known in the art (e.g., PSORT (http colon double slash psort dot nibb dot ac dot jp slash/) and Sigfind (http colon double slash 139.91.72.10 slash sigfind slash sigfind dot html)).

Confirmation that the identified secreted proteins are present in blood can be carried out using a variety of methods known in the art. For example, the proteins can be expressed, purified, and specific antibodies can be made against them. The specific antibodies can then be used to test the presence of the protein in blood/serum/plasma by a variety of immunoaffinity based techniques (e.g., immunoblot, Western analysis, immunoprecipitation, ELISA, etc.). Antibodies specific for the organ-specific protein identified herein can also be used to study expression patterns of the identified proteins. It should be noted that in certain circumstances, the secreted protein may not be detectable in normal blood samples but will be detected in the blood as a result of perturbation due to disease or other environmental factors. Accordingly, both normal and disease samples are tested for the presence of the secreted protein and particularly for changes in levels of expression in the two states. As an alternative, aptamers (short DNA or RNA fragments with binding complementarity to the proteins of interest) may be used in assays similar to those described for antibodies (see for example, Biotechniques. 2001 February; 30(2):290-2, 294-5; Clinical Chemistry. 1999; 45:1628-1650). In addition, antibodies or aptamers may be used in connection with nanowires to create highly sensitive detection systems (see e.g., J. Heath et al., Science. 2004 Dec. 17; 306(5704):2055-6). In further embodiments, mass spectrometry-based methods can be used to confirm the presence of a particular protein in the blood.

As would be recognized by the skilled artisan, while the organ-specific secreted proteins, the levels of which make up a given fingerprint, need not be isolated, in certain embodiments, it may be desirable to isolate such proteins (e.g., for antibody production). As such, the present invention provides for isolated organ-specific secreted proteins or fragments or portions thereof and polynucleotides that encode such proteins. As used herein, the terms protein and polypeptide are used interchangeably. The terms "polypeptide" and "protein" encompass amino acid chains of any length, including full-length endogenous (i.e., native) proteins and variants of endogenous polypeptides described herein. Illustrative polypeptides of the present invention are described in Table 1 and Tables 3-5, the section entitled "Brief Description of the Sequence Identifiers" and are set forth in the sequence listing.

"Variants" are polypeptides that differ in sequence from the polypeptides of the present invention only in substitutions, deletions and/or other modifications, such that either the variants' disease-specific expression patterns are not significantly altered or the polypeptides remain useful for diagnostics/detection of organ-specific blood fingerprints as described herein. For example, modifications to the polypeptides of the present invention may be made in the laboratory to facilitate expression and/or purification and/or to improve immunogenicity for the generation of appropriate antibodies and other binding agents, etc. Modified variants (e.g., chemically modified) of the polypeptides of organ-specific, secreted proteins may be useful herein (e.g., as standards in mass spectrometry analyses of the corresponding proteins in the blood, and the like). As such, in certain embodiments, the biological function of a variant protein is not relevant for utility in the methods for detection and/or diagnostics described herein. Polypeptide variants generally encompassed by the present

provided herein. In addition, or alternatively, variants may contain additional amino acid sequences (such as, for example, linkers, tags and/or ligands), usually at the amino and/or carboxy termini. Such sequences may be used, for example, to facilitate purification, detection or cellular uptake of the polypeptide.

When comparing polypeptide sequences, two sequences are said to be "identical" if the sequence of amino acids in the two sequences is the same when aligned for maximum correspondence, as described below. Comparisons between two sequences are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity. A "comparison window" as used herein, refers to a segment of at least about 20 contiguous positions, usually 30 to about 75, 40 to about 50, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

Optimal alignment of sequences for comparison may be conducted using the Megalign program in the Lasergene suite of bioinformatics software (DNASTAR, Inc., Madison, Wis.), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M. O. (1978) A model of evolutionary change in proteins - Matrices for detecting distant relationships. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington D.C. Vol. 5, Suppl. 3, pp. 345-358; Hein J. (1990) Unified Approach to Alignment and Phylogenes pp. 626-645 Methods in Enzymology Vol. 183, Academic Press, Inc., San Diego, Calif.; Higgins, D. G. and Sharp, P. M. (1989) CABIOS 5:151-153; Myers, E. W. and Muller W. (1988) CABIOS 4:11-17; Robinson, E. D. (1971) Comb. Theor 11:105; Saitou, N. and Nei, M. (1987) Mol. Biol. Evol. 4:406-425; Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy - the Principles and Practice of Numerical Taxonomy, Freeman Press, San Francisco, Calif.; Wilbur, W. J. and Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80:726-730.

Alternatively, optimal alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by the identity alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity methods of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.
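As a rough illustration of aligning two sequences for maximum correspondence and then computing percent identity, the following Python sketch implements a basic Needleman-Wunsch style global alignment with unit scores. The scoring values, the function names, and the convention of dividing matches by the number of alignment columns (gaps included) are assumptions chosen for illustration; the sketch is not the Megalign, GCG or BLAST implementation, and it ignores comparison windows and substitution matrices.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Return one optimal global alignment of `a` and `b` as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment columns."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(global_align("ACGT", "AGT"))
print(percent_identity("ACGT", "AGT"))
```

Other conventions for the denominator exist (for example the length of the shorter sequence or of the reference sequence), so a reported percent identity should always state which one was used.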
Illustrative examples of algorithms that are suitable for determining percent sequence identity and sequence similarity include the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1997) Nucl. Acids Res. 25:3389-3402 and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. BLAST and BLAST 2.0 can be used, for example, to determine percent sequence identity for the polynucleotides and polypeptides of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information.

An isolated polypeptide is one that is removed from its original environment. For example, a naturally occurring protein or polypeptide is isolated if it is separated from some or all of the coexisting materials in the natural system. In certain embodiments, such polypeptides are also purified, e.g., are at least about 90% pure, in some embodiments, at least about

invention will typically exhibit at least about 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or more identity along its length, to a polypeptide sequence set forth herein. Within a polypeptide variant, amino acid substitutions are usually made at no more than 50% of the amino acid residues in the native polypeptide, and in certain embodiments, at no more than 25% of the amino acid residues. In certain embodiments, such substitutions are conservative. A conservative substitution is one in which an amino acid is substituted for another amino acid that has similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. In general, the following amino acids represent conservative changes: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp, his.
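The five groups listed above can be encoded directly as sets. The sketch below is illustrative only: the function names are hypothetical, and treating two residues as a conservative pair whenever they share at least one group is an assumed reading of the list rather than a rule stated in the text.

```python
# The five conservative-change groups, using three-letter residue codes.
CONSERVATIVE_GROUPS = [
    {"ala", "pro", "gly", "glu", "asp", "gln", "asn", "ser", "thr"},
    {"cys", "ser", "tyr", "thr"},
    {"val", "ile", "leu", "met", "ala", "phe"},
    {"lys", "arg", "his"},
    {"phe", "tyr", "trp", "his"},
]


def is_conservative(res_a, res_b):
    """True if both residues appear together in at least one group."""
    a, b = res_a.lower(), res_b.lower()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def substitution_summary(native, variant):
    """For equal-length residue lists, return (fraction substituted,
    whether every substitution is conservative under the groups above)."""
    if len(native) != len(variant):
        raise ValueError("sequences must have the same length")
    changes = [(n, v) for n, v in zip(native, variant) if n.lower() != v.lower()]
    fraction = len(changes) / len(native) if native else 0.0
    return fraction, all(is_conservative(n, v) for n, v in changes)
```

For example, `substitution_summary(["lys", "val", "ser", "trp"], ["arg", "val", "ser", "trp"])` reports one substitution in four residues, and that substitution (lys to arg) falls within group (4).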
Thus, a variant may comprise only a portion of a native polypeptide sequence as

95% pure and in further embodiments, at least about 99% pure.

In one embodiment of the present invention, a polypeptide comprises a fusion protein comprising an organ-specific secreted polypeptide. The present invention further provides, in other aspects, fusion proteins that comprise at least one polypeptide as described herein, as well as polynucleotides encoding such fusion proteins. The fusion proteins may comprise multiple polypeptides or portions/variants thereof, as described herein, and may further comprise one or more polypeptide segments for facilitating the expression, purification, detection, and/or activity of the polypeptide(s).

In certain embodiments, the proteins and/or polynucleotides, and/or fusion proteins are provided in the form of compositions, e.g., pharmaceutical compositions, vaccine compositions, compositions comprising a physiologically acceptable carrier or excipient.

instance, a native or non-artificially engineered or naturally occurring gene as provided herein) encoding an organ-specific secreted protein, an alternate form of such a sequence, or a portion or splice variant thereof or may comprise a variant of such a sequence. Polynucleotide variants may contain one or more substitutions, additions, deletions and/or insertions such that the polynucleotide encodes a polypeptide useful in the methods described herein, such as for the detection of organ-specific proteins (e.g., wherein said polynucleotide variants encode polypeptides that can be used to generate detection reagents as described herein that are specific for an organ-specific secreted protein). In certain embodiments, variants exhibit at least about 70% identity, and in other
embodiments, exhibit at least about 80%, 85%, 86%, 87%, 88%, 89% identity and in yet further embodiments, at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to a polynucleotide sequence that encodes a native organ-specific secreted polypeptide or an alternate form or a portion thereof. Illustrative polynucleotides of the present invention are described in Table 1 and Tables 3-5, the section entitled “Brief Description of the Sequence Identifiers” and are set forth in the sequence listing. The percent identity may be readily determined by comparing sequences using computer algorithms well known to those having ordinary skill in the art and described herein.

Polynucleotides that are complementary to the polynucleotides described herein, or that have substantial identity to a sequence complementary to a polynucleotide as described herein are also within the scope of the present invention.

Such compositions may comprise buffers such as neutral buffered saline, phosphate buffered saline and the like; carbohydrates such as glucose, mannose, sucrose or dextrans, mannitol; proteins; polypeptides or amino acids such as glycine; antioxidants; chelating agents such as EDTA or glutathione; adjuvants (e.g., aluminum hydroxide); and preservatives.

In general, organ-specific secreted polypeptides and polynucleotides encoding such polypeptides as described herein, may be prepared using any of a variety of techniques that are well known in the art. For example, a DNA sequence encoding an organ-specific secreted protein may be prepared by amplification from a suitable cDNA or genomic library using, for example, polymerase chain reaction (PCR) or hybridization techniques. Libraries may generally be prepared and screened using methods well known to those of ordinary skill
in the art, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y., 1989. cDNA libraries may be prepared from any of a variety of organs, tissues, cells, as described herein. Other libraries that may be employed will be apparent to those of ordinary skill in the art upon reading the present disclosure. Primers for use in amplification may be readily designed based on the polynucleotide sequences encoding organ-specific polypeptides as provided herein, for example, using programs such as the PRIMER3 program (http colon double slash www-genome dot wi dot mit dot edu/cgi-bin/primer/primer3 www dot cgi).

Polynucleotides encoding the organ-specific secreted polypeptides as described herein are also provided by the present invention. A polynucleotide as used herein may be single-stranded (coding or antisense) or double-stranded, and may be DNA (genomic, cDNA or synthetic) or RNA molecules.

“Substantial identity,” as used herein refers to polynucleotides that exhibit at least about 70% identity, and in certain embodiments, at least about 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to a polynucleotide sequence that encodes a native organ-specific secreted polypeptide as described herein. Substantial identity can also refer to polynucleotides that are capable of hybridizing under stringent conditions to a polynucleotide complementary to a polynucleotide encoding an organ-specific secreted protein. Suitable hybridization conditions include prewashing in a solution of 5×SSC, 0.5% SDS, 1.0 mM EDTA (pH 8.0); hybridizing at 50-65° C., 5×SSC, overnight; followed by washing twice at 65° C. for 20 minutes with each of 2×, 0.5× and 0.2×SSC containing 0.1% SDS. Nucleotide sequences that, because of code degeneracy, encode a polypeptide encoded by any of the above sequences are also encompassed by the present invention.
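To make the code-degeneracy statement concrete, the sketch below translates two nucleotide sequences with the standard genetic code and checks whether they encode the same polypeptide. The function names and the example sequences are hypothetical and chosen only for illustration; a real analysis would also need to handle reading frames, ambiguous bases, and non-standard codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate a DNA or RNA coding sequence in frame 1, stopping at a stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):  # trailing partial codons are ignored
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def encode_same_polypeptide(seq_a, seq_b):
    """True if two nucleotide sequences differ only by synonymous codons."""
    return translate(seq_a) == translate(seq_b)
```

Here `ATGGCTAAA` and `ATGGCCAAG` differ at two nucleotide positions but both translate to Met-Ala-Lys, so `encode_same_polypeptide` returns `True` for that pair.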
Oligonucleotide primers for amplification of the polynucleotides encoding organ-specific secreted proteins are also within the scope of the present invention. Many amplification methods are known in the art such as PCR, RT-PCR, quantitative real-time PCR, and the like. The PCR conditions used can be optimized in terms of temperature, annealing times, extension times and number of cycles depending on the oligonucleotide and the polynucleotide to be amplified. Such techniques are well known in the art and are described in, for example, Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51:263, 1987; Erlich ed., PCR Technology, Stockton Press, NY, 1989. Oligonucleotide primers can be anywhere from 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. In certain embodiments, the oligonucleotide primers of the present invention are typically 35, 40, 45, 50, 55, 60, or more nucleotides in length.

Thus, within the context of the present invention, a polynucleotide encoding a polypeptide may also be a gene. A gene is a segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons). Additional coding or non-coding sequences may, but need not, be present within a polynucleotide of the present invention, and a polynucleotide may, but need not, be linked to other molecules and/or support materials. An isolated polynucleotide, as used herein, means that a polynucleotide is substantially away from other coding sequences, and that the DNA molecule does not contain large portions of unrelated coding DNA, such as large chromosomal fragments or other functional genes or polypeptide coding regions. Of course, this refers to the DNA molecule as originally isolated, and does not exclude genes or coding regions later added to the
segment by the hand of man.

Polynucleotides of the present invention may comprise a native sequence (i.e., an endogenous polynucleotide, for

Organ-Specific Molecular Blood Fingerprints

The present invention also provides methods for defining organ-specific molecular blood fingerprints. Additionally, the present invention provides defined examples of organ-specific molecular blood fingerprints as described further herein. Each normal organ controls the expression of a variety of genes, some of which are expressed at major levels at other organs or tissues in the body and some of which are expressed only in the organ of interest or at significantly increased levels in the organ of interest as compared to expression in other organs/tissues (e.g., at least 2 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, or higher fold expression in the organ of interest as compared to other tissues). Some of the organ-specific transcripts encode proteins which can be secreted into the blood. Hence these secreted proteins constitute an organ-specific molecular fingerprint for that organ in the blood. Analysis of levels of these proteins in the blood provides organ-specific molecular blood fingerprints that are indicative of biological states. A biological state may be a normal, healthy state or a disease state (e.g., perturbation from normal). Thus, there are molecular fingerprints in the blood that reflect the operation of normal organs and each organ has a specific molecular fingerprint. These organ-specific blood fingerprints are perturbed when disease, or other agents such as drugs, affects the organ. Different diseases will alter the organ-specific blood fingerprints in different ways (e.g. alter the expression levels of the corresponding secreted proteins). Thus, a unique perturbed blood molecular fingerprint is associated with each type of distinct disease. In effect, each distinct disease, or stage of a disease, creates its own molecular blood fingerprint for each organ that it affects.

Thus, an organ-specific molecular blood fingerprint for a given setting (e.g., a particular disease) is defined by the levels in the blood of the organ-specific proteins that make up the fingerprint. As such, an organ-specific molecular blood fingerprint for a given organ at any given time and in any given disease setting is determined by measuring the levels of each of a plurality of organ-specific proteins in the blood. It is the combination of the different levels in the blood of the organ-specific proteins that reveals a unique pattern that defines the fingerprint. Equally important, each of the levels of the proteins can be compared against one another to create an N-dimensional measure of the fingerprint space, a very powerful correlate to health and disease (see e.g., U.S. Patent Application No 20020095259). It should be noted that, in certain embodiments, an organ-specific molecular blood fingerprint may be comprised of the determined level in the blood of one or more organ-specific secreted proteins. In one embodiment, an organ-specific molecular blood fingerprint may comprise the determined level in the blood of anywhere from at least 1 to more than about 100, 200 or more organ-specific secreted proteins from a particular organ of interest. In one embodiment, the organ-specific molecular blood fingerprint comprises the quantitatively measured level in the blood of at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 organ-specific secreted proteins. In another embodiment, the organ-specific molecular blood fingerprint comprises the determined level in the blood of at least 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, or 30 organ-specific secreted proteins. In a further embodiment, the organ-specific molecular blood fingerprint
As would be readily appreciated by the skilled artisan, each disease or stage of a disease can affect multiple organs. For example, in kidney cancer, a primary perturbation in the kidney-specific molecular blood fingerprint would occur. However, a secondary or indirect effect may also be observed in the bladder-specific molecular blood fingerprint. As another example, in liver cancer, perturbation of a liver-specific blood fingerprint as a primary indicator of disease would occur. However, secondary or indirect effects at other sites, for example in a lymphocyte-specific blood fingerprint, would also be observed. As described elsewhere herein, each disease type and stage results in a unique, identifiable fingerprint for each organ that it affects, for primary and secondary organs affected. Thus, multiple organ-specific molecular blood fingerprints can be used in combination to determine a particular biological state and the fingerprints may include

comprises the determined level in the blood of at least 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 organ-specific secreted proteins. In yet a further embodiment, the organ-specific molecular blood fingerprint comprises the determined level in the blood of at least 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 organ-specific secreted proteins. In an additional embodiment, the organ-specific molecular blood fingerprint comprises the determined level in the blood of 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 organ-specific secreted proteins. In another embodiment, the organ-specific molecular blood fingerprint comprises the determined level in the blood of 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 organ-specific secreted proteins. In further embodiments, the organ-specific molecular blood fingerprint comprises the determined level in the blood of 75, 80, 85, 90, 100, or more organ-specific secreted proteins.
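One simple way to picture a fingerprint built from the measured levels of several organ-specific proteins is as an ordered vector of levels, with the levels also compared against one another pairwise. The sketch below is only an illustration under stated assumptions: the protein names (`P1`, `P2`, `P3`), the arbitrary level units, the function names, and the use of log2 ratios for the pairwise comparison are all hypothetical choices, not a representation defined in this disclosure.

```python
import math
from itertools import combinations


def fingerprint_vector(levels, panel):
    """Order measured levels by a fixed panel so fingerprints can be compared."""
    return [levels[name] for name in panel]


def pairwise_log_ratios(levels, panel):
    """Compare every level against every other: N proteins give N*(N-1)/2 ratios."""
    return {(a, b): math.log2(levels[a] / levels[b])
            for a, b in combinations(panel, 2)}


# Hypothetical measured blood levels (arbitrary units) for a three-protein panel.
panel = ["P1", "P2", "P3"]
levels = {"P1": 8.0, "P2": 2.0, "P3": 4.0}
vector = fingerprint_vector(levels, panel)
ratios = pairwise_log_ratios(levels, panel)
```

Keeping the panel order fixed is what makes two fingerprints comparable position by position; the pairwise ratios are unaffected by any scale factor common to all measurements in a sample.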
It should be noted that in certain circumstances, an organ-specific molecular blood fingerprint can be defined (in part or entirely) merely by the presence or absence of one or a plurality of organ-specific proteins, and determining the exact level of each of a plurality of organ-specific proteins in the blood may not be necessary.

In a further embodiment, the disease (e.g., perturbed) molecular blood fingerprints for a particular organ are determined by comparing the blood from normal individuals against that from patients with specific diseases at known stages. A statistically significant change in the levels (e.g., an increase or a decrease) of one or more of the organ-specific proteins in the blood that comprise the fingerprint as compared to normal is indicative of a perturbation of the fingerprint and is useful in diagnostics of the particular disease and/or stage of disease.

those for the primary organ affected and/or for a secondary or indirect organ that is affected by a particular disease.

Most common diseases such as prostate cancer actually represent multiple distinct diseases that initially appear similar (e.g., benign and very slowly growing prostate cancer, slowly invasive prostate cancer and rapidly metastatic prostate cancer represent three different types of prostate cancer; the process of dividing individual prostate cancers into one of these three types is called stratification). The blood molecular fingerprints will be distinct for each of these disease types, thus allowing for the stratification of similar diseases and rapid intervention where necessary. The blood fingerprints will also be perturbed in unique ways as each type of disease progresses; hence the blood fingerprints will also permit the progression of disease to be followed. The blood fingerprints also change with therapy, and hence will permit
the effectiveness of therapy to be followed, thereby allowing a physician to alter treatment accordingly. Further, the blood fingerprints change with exposure to a variety of environmental factors, such as drugs, and can be used to assess toxic or off-target damage by the drug and it will even permit following the subsequent recovery from such adverse drug exposure.

nanomolar-affinity scFvs can be routinely obtained by magnetic bead screening and flow-cytometric sorting, thus greatly simplifying the protocol and capacity of antibody screening; 3) with equilibrium screening, a minimal affinity threshold of the antibodies desired can be set; 4) the binding properties of the antibodies can be quantified directly on the yeast surface; 5) multiplex library screening against multiple antigens simultaneously is possible; and 6) for applications demanding picomolar affinity (e.g. in early diagnosis), subsequent

As discussed elsewhere herein, the fingerprint may be for the primary organ affected by the particular disease of interest, or a secondarily, indirectly affected organ. The skilled artisan would readily appreciate that a variety of statistical tests can be used to determine if an altered level of a given protein is significant. The Z-test (Man, M. Z. et al., Bioinformatics, 16: 953-959, 2000) or other appropriate statistical tests can be used to calculate P values for comparison of protein expression levels. In certain embodiments, the level of each of the plurality of organ-specific proteins in the blood sample from the subject is compared to a previously determined normal control level of each of the plurality of organ-specific proteins taking into account standard deviation. Thus, the present invention provides determined normal control levels of each of a plurality of organ-specific proteins that make up a particular molecular blood fingerprint.
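As a rough illustration of comparing a subject's levels to normal control levels while taking standard deviation into account, the sketch below summarizes each protein's normal controls by mean and sample standard deviation and flags proteins whose level in the subject lies at least a chosen number of standard deviations from the normal mean. This is a plain per-protein z-score, not the specific Z-test of Man et al.; the function names, the protein names, and the 2-standard-deviation threshold are assumptions made for the example.

```python
from statistics import mean, stdev


def normal_reference(control_levels):
    """Summarize normal controls as {protein: (mean, sample standard deviation)}."""
    return {protein: (mean(values), stdev(values))
            for protein, values in control_levels.items()}


def perturbed_proteins(sample, reference, threshold=2.0):
    """Return {protein: z-score} for proteins at least `threshold` SDs from normal."""
    flagged = {}
    for protein, (mu, sd) in reference.items():
        if sd == 0:
            continue  # no spread among controls: z-score undefined, skip
        z = (sample[protein] - mu) / sd
        if abs(z) >= threshold:
            flagged[protein] = z
    return flagged


# Hypothetical levels (arbitrary units) from three normal individuals and one subject.
controls = {"P1": [9, 10, 11], "P2": [4, 5, 6]}
reference = normal_reference(controls)
flagged = perturbed_proteins({"P1": 13, "P2": 5.5}, reference)
```

With so few controls the standard deviation estimate is unreliable; a real reference set would use many normal individuals, and testing many proteins at once would call for a multiple-comparison correction.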
rapid affinity maturation (Kieke, M. C., et al., J Mol Biol, 307: 1305-1315, 2001) can be carried out directly on yeast clones without further re-cloning and manipulations.

Monoclonal antibodies specific for an organ-specific secreted polypeptide of interest may be prepared, for example, using the technique of Kohler and Milstein, Eur. J. Immunol. 6:511-519, 1976, and improvements thereto. Briefly, these methods involve the preparation of immortal cell lines capable of producing antibodies having the desired specificity (i.e., reactivity with the polypeptide of interest). Such cell lines may be produced, for example, from spleen cells obtained from an animal immunized as described above. The spleen cells are then immortalized by, for example, fusion with a myeloma cell fusion partner, in certain embodiments, one that is syngeneic with the immunized animal. A variety of fusion techniques may be employed.

Organ-specific molecular blood fingerprints can be determined using any of a variety of detection reagents in the context of a variety of methods for measuring protein levels. Any detection reagent that can specifically bind to or otherwise detect an organ-specific secreted protein as described herein is contemplated as a suitable detection reagent. Illustrative detection reagents include, but are not limited to antibodies, or antigen-binding fragments thereof, yeast scFv, DNA or RNA aptamers, isotope labeled peptides, microfluidic/nanotechnology measurement devices and the like.

In one illustrative embodiment, a detection reagent is an antibody or an antigen-binding fragment thereof. Antibodies may be prepared by any of a variety of techniques known to those of ordinary skill in the art. See, e.g., Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988. In general, antibodies can be produced by cell
culture techniques, including the generation of monoclonal antibodies as described herein, or via transfection of antibody genes into suitable bacterial or mammalian cell hosts, in order to allow for the production of recombinant antibodies. In one technique, an immunogen comprising the polypeptide is initially injected into any of a wide variety of mammals (e.g., mice, rats, rabbits, sheep or goats). In this step, the polypeptides of this invention may serve as the immunogen without modification. Alternatively, particularly for relatively short polypeptides, a superior immune response may be elicited if the polypeptide is joined to a carrier protein, such as bovine serum albumin or keyhole limpet hemocyanin. The immunogen is injected into the animal host, usually according to a predetermined schedule incorporating one or more booster immunizations, and the animals are bled periodically. Polyclonal antibodies specific for the polypeptide may then be purified from such antisera by, for example, affinity chromatography using the polypeptide coupled to a suitable solid support.

In one embodiment, multiple target proteins or peptides are used in a single immune response to generate multiple useful detection reagents simultaneously. In one embodiment, the individual specificities are later separated out.

In certain embodiments, antibody can be generated by phage display methods (such as described by Vaughan, T. J., et al., Nat Biotechnol, 14:309-314, 1996; and Knappik, A., et al., J Mol Biol, 296: 57-86, 2000); ribosomal display (such as described in Hanes, J., et al., Nat Biotechnol, 18: 1287-1292, 2000), or periplasmic expression in E. coli (see e.g., Chen, G., et al., Nat Biotechnol, 19:537-542, 2001). In further embodiments, antibodies can be isolated using a yeast surface display library. See e.g., nonimmune library of 10 human antibody scFv fragments as constructed by Feldhaus, M. J., et al., Nat Biotechnol, 21: 163-170, 2003.

For example, the spleen cells and myeloma cells may be combined with a nonionic detergent for a few minutes and then plated at low density on a selective medium that supports the growth of hybrid cells, but not myeloma cells. An illustrative selection technique uses HAT (hypoxanthine, aminopterin, thymidine) selection. After a sufficient time, usually about 1 to 2 weeks, colonies of hybrids are observed. Single colonies are selected and their culture supernatants tested for binding activity against the polypeptide. Hybridomas having high reactivity and specificity are preferred.

Monoclonal antibodies may be isolated from the supernatants of growing hybridoma colonies. In addition, various techniques may be employed to enhance the yield, such as injection of the hybridoma cell line into the peritoneal cavity of a suitable vertebrate host, such as a mouse. Monoclonal antibodies may then be harvested from the ascites fluid or the blood. Contaminants may be removed from the antibodies by conventional techniques, such as chromatography, gel filtration, precipitation, and extraction. The polypeptides of this invention may be used in the purification process in, for example, an affinity chromatography step.

A number of therapeutically useful molecules are known in the art which comprise antigen-binding sites that are capable of exhibiting immunological binding properties of an antibody molecule. The proteolytic enzyme papain preferentially cleaves IgG molecules to yield several fragments, two of which (the “F(ab)” fragments) each comprise a covalent heterodimer that includes an intact antigen-binding site. The enzyme pepsin is able to cleave IgG molecules to provide several fragments, including the “F(ab')2” fragment which comprises both antigen-binding sites. An “Fv” fragment can be produced by preferential proteolytic cleavage of an IgM, and on rare occasions IgG or IgA immunoglobulin molecule.
There are several advantages of this yeast surface display compared to more traditional large nonimmune human antibody repertoires such as phage display, ribosomal display, and periplasmic expression in E. coli: 1) the yeast library can be amplified 10'-fold without measurable loss of clonal diversity and repertoire bias as the expression is under control of the tightly regulated GAL1/10 promoter and expansion can be done under non induction conditions; 2)

Fv fragments are, however, more commonly derived using recombinant techniques known in the art. The Fv fragment includes a non-covalent VH::VL heterodimer including an antigen-binding site which retains much of the antigen recognition and binding capabilities of the native antibody molecule. Inbar et al. (1972) Proc. Nat. Acad. Sci. USA 69:2659-2662; Hochman et al. (1976) Biochem 15:2706-2710; and Ehrlich et al. (1980) Biochem 19:4091-4096.

A single chain Fv (“sFv”) polypeptide is a covalently linked VH::VL heterodimer which is expressed from a gene fusion including VH- and VL-encoding genes linked by a peptide-encoding linker. Huston et al. (1988) Proc. Nat. Acad. Sci. USA 85(16):5879-5883. A number of methods have been described to discern chemical structures for converting the naturally aggregated, but chemically separated, light and heavy polypeptide chains from an antibody V region into an sFv molecule which will fold into a three-dimensional structure substantially similar to the structure of an antigen-binding site. See, e.g., U.S. Pat. Nos. 5,091,513 and 5,132,405, to Huston et al.; and U.S. Pat. No. 4,946,778, to Ladner et al.

Each of the above-described molecules includes a heavy chain and a light chain CDR set, respectively interposed between a heavy chain and a light chain FR set which provide support to the CDRs and define the spatial relationship of the CDRs relative to each other. As used herein, the term “CDR set” refers to the three hypervariable regions of a heavy or light chain V region. Proceeding from the N-terminus of a heavy or light chain, these regions are denoted as “CDR1,” “CDR2,” and “CDR3” respectively. An antigen-binding site, therefore, includes six CDRs, comprising the CDR set from each of a heavy and a light chain V region. A polypeptide comprising a single CDR,

dimensional chromatography and MS/MS. The procedures described herein for analysis of blood organ-specific protein fingerprints can be modified and adapted to make use of microfluidics and nanotechnology in order to miniaturize, parallelize, integrate and automate diagnostic procedures (see e.g., L. Hood, et al., Science 306:640-643; R. H. Carlson, et al., Phys. Rev. Lett. 79:2149 (1997); A. Y. Fu, et al., Anal. Chem. 74:2451 (2002); J. W. Hong, et al., Nature Biotechnol. 22:435 (2004); A. G. Hadd, et al., Anal. Chem. 69:3407 (1997); I. Karube, et al., Ann. N.Y. Acad. Sci. 750:101 (1995); L. C. Waters et al., Anal. Chem. 70:158 (1998); J. Fritz et al., Science 288:316 (2000)).

It should be noted that when the term “blood” is used herein, any part of the blood is intended. Accordingly, for determining molecular blood fingerprints, whole blood may be used directly where appropriate, or plasma or serum may be used.

Panels/Arrays for Detecting Organ-Specific Molecular Blood Fingerprints

The present invention also provides panels/arrays for detecting the organ-specific blood fingerprints at any given time in a subject. The term “subject” is intended to include any mammal or indeed any vertebrate that may be used as a model system for human disease. Examples of subjects
include humans, monkeys, apes, dogs, cats, mice, rats, fish, zebra fish, birds, horses, pigs, cows, sheep, goats, chickens, ducks, donkeys, turkeys, peacocks, chinchillas, ferrets, gerbils, rabbits, guinea pigs, hamsters and transgenic species thereof. Further subjects contemplated herein include, but are not limited to, reptiles and amphibians, e.g., lizards, snakes, turtles, frogs, toads, salamanders, and newts. In one embodiment, the panel/array of the present invention comprises one detection reagent that specifically detects an organ-specific secreted protein. In another embodiment, the panel/arrays are comprised of a plurality of detection reagents that each specifically detects an organ-specific secreted protein, wherein the levels of organ-specific secreted proteins taken together form a unique pattern that defines the fingerprint. In certain embodiments, detection reagents can be bispecific such that

(e.g., a CDR1, CDR2 or CDR3) is referred to herein as a “molecular recognition unit.” Crystallographic analysis of a number of antigen-antibody complexes has demonstrated that the amino acid residues of CDRs form extensive contact with bound antigen, wherein the most extensive antigen contact is with the heavy chain CDR3. Thus, the molecular recognition units are primarily responsible for the specificity of an antigen-binding site.

As used herein, the term “FR set” refers to the four flanking amino acid sequences which frame the CDRs of a CDR set of a heavy or light chain V region. Some FR residues may contact bound antigen; however, FRs are primarily responsible for folding the V region into the antigen-binding site, particularly the FR residues directly adjacent to the CDRs. Within FRs, certain amino residues and certain structural features are very highly conserved.
In this regard, all V region sequences contain an internal disulfide loop of around 90 amino acid residues. When the V regions fold into a binding site, the CDRs are displayed as projecting loop motifs which form an antigen-binding surface. It is generally recognized that there are conserved structural regions of FRs which influence the folded shape of the CDR loops into certain “canonical” structures, regardless of the precise CDR amino acid sequence. Further, certain FR residues are known to participate in non-covalent interdomain contacts which stabilize the interaction of the antibody heavy and light chains.

The detection reagents of the present invention may comprise any of a variety of detectable labels. The invention contemplates the use of any type of detectable label, including, e.g., visually detectable labels, fluorophores, and radioactive labels. The detectable label may be incorporated within

the panel/array is comprised of a plurality of bispecific detection reagents that may specifically detect more than one organ-specific secreted protein. The term “specifically” is a term of art that would be readily understood by the skilled artisan to mean, in this context, that the protein of interest is detected by the particular detection reagent but other proteins are not detected in a statistically significant manner under the same conditions. Specificity can be determined using appropriate positive and negative controls and by routinely optimizing conditions.

The panel/arrays may be comprised of a solid phase surface having attached thereto a plurality of detection reagents each attached at a distinct location. As would be recognized by the skilled artisan, the number of detection reagents on a given panel/array would be determined from the number of organ-specific secreted proteins in the fingerprint to be measured.
In or attached, either covalently or non-covalently, to the detec one embodiment, the panel/array comprises one or more tion reagent. detection reagents. In a further embodiment, the panel/array Methods for measuring organ-specific protein levels from comprises a plurality of detection reagents, wherein the plu blood/serum/plasma include, but are not limited to, immu rality of detection reagents may be anywhere from about 2 to noaffinity based assays such as ELISAs, Western blots, and 60 about 100, 150, 160, 170, 180, 190, 200 or more detection radioimmunoassays, and mass spectrometry based methods reagents each specific for an organ-specific secreted protein. (matrix-assisted laser desorption ionization (MALDI), In one embodiment, the panel/array comprises at least 2, 3, 4, MALDI-Time-of-Flight (TOF), Tandem MS (MS/MS), elec 5, 6, 7, 8, 9, or 10 detection reagents each specific for one of trospray ionization (ESI), Surface Enhanced Laser Desorp the plurality of organ-specific secreted proteins that make up tion Ionization (SELDI)-TOF MS, liquid chromatography 65 a given fingerprint. In another embodiment, the panel/array (LC)-MS/MS, etc). Other methods useful in this context comprises at least 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 include isotope-coded affinity tag (ICAT) followed by multi detection reagents each specific for one of the plurality of US 9,002,652 B1 25 26 organ-specific secreted proteins that make up a given finger conjugate Chem. 4, 116-171; Schramm et al., 1992, Anal. print. In a further embodiment, the panel/array comprises at Biochem. 205, 47-56; Gombotzet al., 1991, J. Biomed. Mater. least 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 detection reagents Res. 25, 1547-1562; Alarie et al., 1990, Analy. Chim. Acta each specific for one of the plurality of organ-specific 229, 169-176: Owakuetal, 1993, Sensors Actuators B, 13-14, secreted proteins that make up a given fingerprint. In an 5 723-724; Bhatia et al., 1989, Analy. Biochem. 
178, 408-413: additional embodiment, the panel/array comprises at least 31, Lin et al., 1988, IEEE Trans. Biomed. Enging., 35(6), 466 32, 33, 34, 35, 36, 37, 38, 39, or 40 detection reagents each 471). specific for one of the plurality of organ-specific secreted In one embodiment, the detection reagents, such as anti proteins that make up a given fingerprint. In yet a further bodies, are arrayed on a chip comprised of electronically embodiment, the panel/array comprises at least 41, 42, 43, 44, 10 activated copolymers of a conductive polymer and the detec 45, 46, 47, 48, 49, or 50 detection reagents each specific for tion reagent. Such arrays are known in the art (see e.g., U.S. one of the plurality of organ-specific secreted proteins that Pat. No. 5,837,859 issued Nov. 17, 1998: PCT publication make up a given fingerprint. In an additional embodiment, the WO94/22889 dated Oct. 13, 1994). The arrayed pattern may panel/array comprises at least 51, 52, 53, 54, 55, 56, 57, 58. be computer generated and stored. The chips may be prepared 59, or 60 detection reagents each specific for one of the 15 inadvance and stored appropriately. The antibody array chips plurality of organ-specific secreted proteins that make up a can be regenerated and used repeatedly. given fingerprint. In one embodiment, the panel/array com Using the methods described herein, a vast array of organ prises at least 61, 62,63,64, 65,66, 67,68, 69, or 70 detection specific molecular blood fingerprints can be defined for any of reagents each specific for one of the plurality of organ-spe a variety of diseases as described further herein. As such, the cific secreted proteins that make up a given fingerprint. In one 20 present invention further provides information databases embodiment, the panel/array comprises at least 75,80, 85,90, comprising data that make up molecular blood fingerprints as 100, 150, 160, 170, 180, 190, 200, or more, detection reagents described herein. 
As such, the databases may comprise the each specific for one of the plurality of organ-specific defined differential expression levels as determined using any secreted proteins that make up a given fingerprint. of a variety of methods such as those described herein, of each Further in this regard, the solid phase surface may be of any 25 of the plurality of organ-specific secreted proteins that make material, including, but not limited to, plastic, polycarbonate, up a given fingerprint in any of a variety of settings (e.g., polystyrene, polypropylene, polyethlene, glass, nitrocellu normal or disease fingerprints). lose, dextran, nylon, metal, silicon and carbon nanowires, Methods of Use nanoparticles that can be made of a variety of materials and The present invention provides methods for identifying photolithographic materials. In certain embodiments, the 30 organ-specific secreted proteins and methods for identifying Solid phase Surface is a chip. In another embodiment, the Solid organ-specific molecular blood fingerprints. The present phase surface may comprise microtiter plates, beads, mem invention further provides panels+/arrays of detection branes, microparticles, the interior Surface of a reaction vessel reagents for detecting Such fingerprints. The present inven such as a test tube or other reaction vessel. In other embodi tion also provides defined organ-specific molecular blood ments the peptides will be fractionated by one or more one- 35 fingerprints for normal and disease settings. As such, the dimensional columns using size separations, ion exchange or present invention provides methods of detecting diseases. hydrophobicity properties and, for example, deposited in a The invention further provides methods for stratifying dis MALDI 96 or 384 well plate and then injected into an appro ease types and for monitoring the progression of a disease. priate mass spectrometer. 
The present invention also provides for following responses In one embodiment, the panel/array is an addressable array. 40 to therapy in a variety of disease settings and methods for As such, the addressable array may comprise a plurality of detecting the disease state in humans using the visualization distinct detection reagents, such as antibodies or aptamers, of nanoparticles with appropriate reporter groups and organ attached to precise locations on a solid phase Surface, such as specific antibodies or aptamers. a plastic chip. The position of each distinct detection reagent The present invention can be used as a standard Screening on the surface is known and therefore “addressable'. In one 45 test. In this regard, one or more of the detection panel/arrays embodiment, the detection reagents are distinct antibodies described herein can be run on an individual and any statis that each have specific affinity for one of a plurality of organ tically significant deviation from a normal organ-specific specific polypeptides. molecular blood fingerprint would indicate that disease-re In one embodiment, the detection reagents, such as anti lated perturbation was present. Thus, the present invention bodies, are covalently linked to the solid Surface. Such as a 50 provides a standard or “normal blood fingerprint for any plastic chip, for example, through the Fc domains of antibod given organ. In certain embodiments, a normal blood finger ies. In another embodiment, antibodies are adsorbed onto the print is determined by measuring the normal range of levels of Solid Surface. In a further embodiment, the detection reagent, the individual protein members of a fingerprint. Any devia Such as an antibody, is chemically conjugated to the Solid tion therefrom or perturbation of the normal fingerprint that is Surface. In a further embodiment, the detection reagents are 55 outside the standard deviation (normal range) has diagnostic attached to the solid surface via a linker. 
In certain embodi utility (see also U.S. Patent Application No.0020095259). As ments, detection with multiple specific detection reagents is would be recognized by the skilled artisan, the significance of carried out in solution. any deviation in the levels of (e.g., a significantly altered level Methods of constructing protein arrays, including antibody of one or more of) the individual protein members of a fin arrays, are known in the art (see, e.g., U.S. Pat. No. 5,489.678; 60 gerprint can be determined using statistical methods known in U.S. Pat. No. 5,252,743; Blawas and Reichert, 1998, Bioma the art and described herein. As noted elsewhere herein, per terials 19:595-609; Firestone et al., 1996, J. Amer: Chem. Soc. turbation of the normal fingerprint can indicate primary dis 18, 9033-9041: Mooney et al., 1996, Proc. Natl. Acad. Sci. ease of the organ being tested or secondary, indirect affects on 93,12287-12291; Pirrung et al., 1996, Bioconjugate Chem. 7, that organ resulting from disease of another organ. 317-321; Gao et al., 1995, Biosensors Bioelectron 10, 317- 65 In an additional embodiment, the present invention can be 328; Schena et al., 1995, Science 270, 467-470; Lom et al., used to determine distinct normal organ-specific molecular 1993, J. Neurosci. Methods, 385-397; Pope et al., 1993, Bio fingerprints, such as in different populations of people. In this US 9,002,652 B1 27 28 regard, distinct normal patterns of organ-specific molecular lupus erythematosus, psoriasis, Sjogren's syndrome, hyper blood fingerprints may have differences in populations of thyroidism/Graves disease, hypothyroidism/Hashimoto's patients that permit one to stratify patients into classes that disease, Insulin-dependent diabetes (type 1), Myasthenia would respond to a particular therapeutic regimen and those Gravis, endometriosis, Scleroderma, pernicious anemia, which would not. 
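The screening idea described under Methods of Use, in which a normal fingerprint is a normal range of levels for each protein member and any level outside that range is of diagnostic interest, might be sketched as follows. This is a minimal illustration only: the function names `normal_ranges` and `perturbed_members`, the protein labels, and the choice of mean plus or minus one sample standard deviation as the "normal range" are assumptions, not something the specification prescribes.

```python
from statistics import mean, stdev

def normal_ranges(normal_samples, n_sd=1.0):
    """Build {protein: (low, high)} from healthy samples.

    Assumption: the normal range is mean +/- n_sd sample standard deviations.
    """
    ranges = {}
    for protein in normal_samples[0]:
        values = [sample[protein] for sample in normal_samples]
        centre, spread = mean(values), stdev(values)
        ranges[protein] = (centre - n_sd * spread, centre + n_sd * spread)
    return ranges

def perturbed_members(sample, ranges):
    """Return the fingerprint members whose level lies outside the normal range."""
    return sorted(
        protein
        for protein, level in sample.items()
        if not (ranges[protein][0] <= level <= ranges[protein][1])
    )

# Hypothetical levels for two fingerprint members in three healthy individuals.
healthy = [{"A": 10.0, "B": 5.0}, {"A": 12.0, "B": 5.5}, {"A": 11.0, "B": 4.5}]
ranges = normal_ranges(healthy)
flagged = perturbed_members({"A": 15.0, "B": 5.2}, ranges)
```

In practice the significance of a deviation would be assessed with a proper statistical test, as the text notes; the fixed one-standard-deviation band here only shows the shape of the comparison.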
Goodpasture syndrome, Wegener's disease, glomerulone In a further embodiment, the present invention can be used phritis, aplastic anemia, paroxysmal nocturnal hemoglobin to determine the risk of developing a particular biological uria, myelodysplastic syndrome, idiopathic thrombocy condition. A statistically significant alteration (e.g., increase topenic purpura, autoimmune hemolytic anemia, Evans or decrease) in the levels of one or more members of a par syndrome, Factor VIII inhibitor syndrome, systemic vasculi ticular molecular blood fingerprint may signify a risk of 10 developing a particular disease. Such as a cancer, an autoim tis, dermatomyositis, polymyositis and rheumatic fever. mune disease, or other biological condition. In certain embodiments, the organ-specific molecular To monitor the progression of a disease, or monitor blood fingerprints of the present invention can be used to responses to therapy, one or more organ-specific molecular detect diseases associated with infections with any of a vari blood fingerprints are detected/measured as described herein 15 ety of infectious organisms, such as viruses, bacteria, para using any of the methods as described herein at one time point sites and fungi. Infectious organisms may comprise viruses, and detected/measured again at Subsequent time points, (e.g., RNA viruses, DNA viruses, human in virus (HIV), thereby monitoring disease progression or responses to hepatitis A, B, and C virus, herpes simplex virus (HSV), therapy. cytomegalovirus (CMV) Epstein-Barr virus (EBV), human The organ-specific molecular blood fingerprints of the papilloma virus (HPV)), parasites (e.g., protozoan and meta present invention can be used to detect any of a variety of Zoan pathogens such as Plasmodia species, Leishmania spe diseases (or the lack thereof). 
In certain embodiments, the cies, Schistosoma species, Trypanosoma species), bacteria organ-specific molecular blood fingerprints of the present (e.g., Mycobacteria, in particular, M. tuberculosis, Salmo invention can be used to detect cancer. As such, the present nella, Streptococci, E. coli, Staphylococci), fungi (e.g., Can invention can be used to detect, monitor progression of, or 25 dida species, Aspergillus species), Pneumocystis carinii, and monitor therapeutic regimens for any cancer, including mela prions. noma, non-Hodgkin’s lymphoma, Hodgkin’s disease, leuke Business Methods mias, plasmocytomas, sarcomas, adenomas, gliomas, thymo A further embodiment of the present invention comprises a mas, breast cancer, prostate cancer, colo-rectal cancer, kidney business method of diagnosing a particular disease in a Sub cancer, renal cell carcinoma, bladder cancer, uterine cancer, 30 ject that comprises detecting an organ-specific molecular pancreatic cancer, esophageal cancer, brain cancer, lung can blood fingerprint as described herein. cer, ovarian cancer, cervical cancer, testicular cancer, gastric Thus, the present invention contemplates methods for (a) cancer, multiple myeloma, hepatoma, acute lymphoblastic manufacturing one or more of the detection reagents, panels, leukemia (ALL), acute myelogenous leukemia (AML). arrays, (b) providing diagnostic services for determining chronic myelogenous leukemia (CML), and chronic lympho 35 organ-specific blood fingerprints, (c) providing manufactur cytic leukemia (CLL), or other cancers. 
ers of genomics devices the use of the detection reagents, In certain embodiments, the organ-specific molecular panels, arrays, blood fingerprints or transcriptomes described blood fingerprints of the present invention can be used to herein to develop diagnostic devices, where the genomics detect, to monitor progression of, or monitor therapeutic regi device includes any device that may be used to define differ mens for diseases of the heart, kidney, ureter, bladder, urethra, 40 ences in a blood sample between the normal and disturbed liver, prostate, heart, blood vessels, bone marrow, skeletal state (d) providing manufacturers of proteomics devices the muscle, Smooth muscle, various specific regions of the brain use of the detection reagents, panels, arrays, blood finger (including, but not limited to the amygdala, caudate nucleus, prints or transcriptomes described herein to develop diagnos cerebellum, corpus callosum, fetal, hypothalamus, thala tic devices, where the proteomics device includes any device mus), spinal cord, peripheral nerves, retina, nose, trachea, 45 that may be used to define differences in a blood sample lungs, mouth, salivary gland, esophagus, stomach, Small between the normal and disturbed state and (e) providing intestines, large intestines, hypothalamus, pituitary, thyroid, manufacturers of imaging devices the use of the detection pancreas, adrenal glands, ovaries, Oviducts, uterus, placenta, reagents, panels, arrays, blood fingerprints or transcriptomes Vagina, mammary glands, testes, seminal Vesicles, penis, described herein to develop diagnostic devices, where the lymph nodes, thymus, and spleen. 
The present invention can 50 proteomics device includes any device that may be used to be used to detect, to monitor progression of, or monitor thera define differences in a blood sample between the normal and peutic regimens for cardiovascular diseases, neurological dis disturbed State (f) providing manufacturers of molecular eases, metabolic diseases, respiratory diseases, autoimmune imaging devices the use of the detection reagents, panels, diseases. As would be recognized by the skilled artisan, the arrays, blood fingerprints or transcriptomes described herein present invention can be used to detect, monitor the progres 55 to develop diagnostic devices, where the proteomics device sion of, or monitor treatment for, virtually any disease includes any device that may be used to define differences in wherein the disease causes perturbation in organ-specific a blood sample between the normal and disturbed state and secreted proteins. (g) marketing to healthcare providers the benefits of using the In certain embodiments, the organ-specific molecular detection reagents, panels, arrays, and diagnostic services of blood fingerprints of the present invention can be used to 60 the present invention to enhance diagnostic capabilities and detect autoimmune disease. As such, the present invention thus, to better treat patients. 
can be used to detect, monitor progression of, or monitor Another aspect of the invention relates to a method for therapeutic regimens for autoimmune diseases Such as, but conducting abusiness, which includes: (a) manufacturing one not limited to, rheumatoid arthritis, multiple Sclerosis, insulin or more of the detection reagents, panels, arrays, (b) provid dependent diabetes, Addison's disease, celiac disease, 65 ing diagnostic services for determining organ-specific chronic fatigue syndrome, inflammatory bowel disease, molecular blood fingerprints and (c) marketing to healthcare ulcerative colitis, Crohn's disease, Fibromyalgia, systemic providers the benefits of using the detection reagents, panels, US 9,002,652 B1 29 30 arrays, and diagnostic services of the present invention to and apparatus of the present invention apply equally to any enhance diagnostic capabilities and thus, to better treat computer system, regardless of whether the computer system patients. is a complicated multi-user computing apparatus or a single Another aspect of the invention relates to a method for user device Such as a personal computer or workstation. Com conducting a business, comprising: (a) providing a distribu puter systems suitably comprise a processor, main memory, a tion network for selling the detection reagents, panels, arrays, memory controller, an auxiliary storage interface, and a ter diagnostic services, and access to organ-specific molecular minal interface, all of which are interconnected via a system blood fingerprint databases (b) providing instruction material bus. Note that various modifications, additions, or deletions to physicians or other skilled artisans for using the detection may be made to the computer system within the scope of the reagents, panels, arrays, and organ-specific molecular blood 10 present invention Such as the addition of cache memory or fingerprint databases to improve diagnostics for patients. other peripheral devices. 
Yet another aspect of the invention relates to a method for The processor performs computation and control functions conducting a business, comprising: (a) identifying organ of the computer system, and comprises a suitable central specific secreted proteins in the blood sera, etc. (b) determin processing unit (CPU). The processor may comprise a single ing the organ-specific molecular fingerprint for any of a vari 15 integrated circuit, such as a microprocessor, or may comprise ety of diseases as described herein and (c) providing a any suitable number of integrated circuit devices and/or cir distribution network for selling access to the database of cuit boards working in cooperation to accomplish the func organ-specific molecular fingerprints identified in step (b). tions of a processor. For instance, the Subject business method can include an In a preferred embodiment, the auxiliary storage interface additional step of providing a sales group for marketing the allows the computer system to store and retrieve information database, or panels, or arrays, to healthcare providers. from auxiliary storage devices, such as magnetic disk (e.g., Another aspect of the invention relates to a method for hard disks or floppy diskettes) or optical storage devices (e.g., conducting a business, comprising: (a) determining one or CD-ROM). One suitable storage device is a direct access more organ-specific molecular blood fingerprints and (b) storage device (DASD). A DASD may be a floppy disk drive licensing, to a third party, the rights for further development 25 that may read programs and data from a floppy disk. It is and sale of panels, arrays, and information databases related important to note that while the present invention has been to the organ-specific molecular blood fingerprints of (a). 
(and will continue to be) described in the context of a fully The business methods of the present application relate to functional computer system, those skilled in the art will the commercial and other uses, of the methodologies, panels, appreciate that the mechanisms of the present invention are arrays, organ-specific secreted proteins, organ-specific 30 capable of being distributed as a program product in a variety molecular blood fingerprints, and databases comprising iden of forms, and that the present invention applies equally tified fingerprints of the present invention. In one aspect, the regardless of the particular type of signal bearing media to business method includes the marketing, sale, or licensing of actually carry out the distribution. Examples of signal bearing the present invention in the context of providing consumers, media include: recordable type media Such as floppy disks i.e., patients, medical practitioners, medical service provid 35 and CD ROMS, and transmission type media such as digital ers, and pharmaceutical distributors and manufacturers, with and analog communication links, including wireless commu all aspects of the invention described herein, (e.g., the meth nication links. ods for identifying organ-specific secreted proteins, detection The computer systems of the present invention may also reagents for Such proteins, molecular blood fingerprints, etc., comprise a memory controller, through use of a separate as provided by the present invention). 40 processor, which is responsible for moving requested infor In a particular embodiment of the present invention, a mation from the main memory and/or through the auxiliary business method relating to providing information related to storage interface to the main processor. 
While for the pur molecular blood fingerprints (e.g., levels of the plurality of poses of explanation, the memory controller is described as a organ-specific secreted proteins that make up a given finger separate entity, those skilled in the art understand that, in print), method for determining fingerprints and sale of panels 45 practice, portions of the function provided by the memory for determining such molecular blood fingerprints. In a spe controller may actually reside in the circuitry associated with cific embodiment, that method may be implemented through the main processor, main memory, and/or the auxiliary Stor the computer systems of the present invention. For example, age interface. a user (e.g. a health practitioner Such as a physician or a Furthermore, the computer systems of the present inven diagnostic laboratory technician) may access the computer 50 tion may comprise a terminal interface that allows system systems of the present invention via a computer terminal and administrators and computer programmers to communicate through the Internet or other means. The connection between with the computer system, normally through programmable the user and the computer system is preferably secure. workstations. It should be understood that the present inven In practice, the user may input, for example, information tion applies equally to computer systems having multiple relating to a patient Such as the patient's disease state e.g., 55 processors and multiple system buses. Similarly, although the levels determined for the proteins that make up a given system bus of the preferred embodiment is a typical hard molecular blood fingerprint using a panel or array of the wired, multidrop bus, any connection means that Supports present invention. The computer system may then, through bidirectional communication in a computer-related environ the use of the resident computer programs, provide a diagno ment could be used. 
sis that fits with the input information by matching the fin 60 The main memory of the computer systems of the present gerprint parameters (e.g., levels of the proteins present in the invention Suitably contains one or more computer programs blood as detected using a particular panel or array of the relating to the organ-specific molecular blood fingerprints present invention) with a database offingerprints. and an operating system. Computer program is used in its A computer system inaccordance with a preferred embodi broadest sense, and includes any and all forms of computer ment of the present invention may be, for example, an 65 programs, including source code, intermediate code, machine enhanced IBM AS/400 mid-range computer system. How code, and any other representation of a computer program. ever, those skilled in the art will appreciate that the methods The term “memory” as used herein refers to any storage US 9,002,652 B1 31 32 location in the virtual memory space of the system. It should Such as those encoding transcription factors and signal trans be understood that portions of the computer program and ducers, wield significant regulatory influences in spite of the operating system may be loaded into an instruction cache for fact they may be present in the cell at very low copy numbers. the main processor to execute, while other files may well be Differential display (Bussemakers, M. J., et al., Cancer Res, stored on magnetic or optical disk storage devices. In addi 59: 5975-5979, 1999) or cDNA microarrays (Vaarala, M. H., tion, it is to be understood that the main memory may com et al., Lab Invest, 80: 1259-1268, 2000; Chang, G. T., et al., prise disparate memory locations. Cancer Res, 57: 4075-4081, 1997) have been, used to profile All of the U.S. patents, U.S. patent application publica changes in gene expression during the AD to AI transition; tions, U.S. 
patent applications, foreign patents, foreign patent however, those technologies can identify only a limited num applications and non-patent publications referred to in this 10 ber of more abundant mRNAs, and they miss many low specification and/or listed in the Application Data Sheet, are abundance mRNAs due to their low detection sensitivities. incorporated herein by reference, in their entirety. Moreover, Massively parallel signature sequencing (MPSS), allows all numerical ranges utilized herein explicitly include all inte 20-nucleotide signature sequences to be determined in paral ger values within the range and selection of specific numeri lel for more than 1,000,000 DNA sequences (Brenner, et al., cal values within the range is contemplated depending on the 15 2000, supra). MPSS technology allows identification and particular use. Further, the following examples are offered by cataloging of almost all mRNAS that are changed between way of illustration, and not by way of limitation. two cell states, even those with one or a few transcripts per cell, or between different organs or tissues. Differentially EXAMPLES expressed genes thus identified can be mapped onto cellular networks to provide a systemic understanding of changes in Example 1 cellular state. Although transcriptome (mRNA levels) differences are Evidence for the Presence of Disease-Perturbed easier to study than proteome (protein levels) differences and Networks in Prostate Cancer Cells by Genomic and provide extremely valuable information, cellular functions Proteomic Analyses: A Systems Approach to Disease 25 are usually performed by proteins. RNA expression profiling studies do not address how the encoded proteins function The following example demonstrates the presence of dis biologically, and transcript abundance levels do not always ease-perturbed networks in prostate. 
correlate with protein abundance levels (Chen, G., et al., Mol Prostate cancer is the most common nondermatological Cell Proteomics, 1: 304-313, 2002). Therefore, the mRNA cancer in the United States (Greenlee, R.T., et al., CA Cancer 30 expression profiling described herein was complemented J Clin, 50: 7-33, 2000). Initially, its growth is androgen with a more limited protein profiling by using isotope-coded dependent (AD); early-stage therapies, including chemical affinity tags (ICAT) coupled with tandem mass spectrometry and Surgical castration, kill cancerous cells by androgen dep (MS/MS) (Gygi, S. P. et al., Nat Biotechnol, 17: 994-999, rivation. Although Such therapies produce tumor regression, 1999). they eventually fail because most prostate carcinomas 35 The LNCaP cell line is a widely used androgen-sensitive become androgen-independent (AI) (Isaacs, J. T. Urol Clin model for early-stage prostate cancer from which androgen North Am, 26: 263-273, 1999). To improve the efficacy of independent sublines have been generated (Vaarala, M. H., et prostate cancer therapy, it is necessary to understand the al., 2000, supra; Chang, G. T., et al., 1997, Supra; Patel, B.J., molecular mechanisms underlying the transition from andro et al., J Urol, 164: 1420-1425, 2000). The cells of one such gen dependence to androgen independence. 40 variant, CL-1, in contrast to their LNCaP progenitors, are The transition from AD to AI status likely results from highly tumorigenic, and exhibit invasive and metastatic char multiple processes, including activation of oncogenes, inac acteristics in intact and castrated mice (Patel, G. J., et al., tivation of tumor Suppressor genes, and changes in key com 2000, supra; Tso, C. L., et al., Cancer J Sci Am, 6: 220-233, ponents of signal transduction pathways and gene regulatory 2000). Thus CL-1 cells model late-stage prostate cancer. networks. 
Systems approaches to biology and disease are predicated on the identification of the elements of the systems, the delineation of their interactions and their changes in distinct disease states. Biological information is of two types: the digital information of the genome (e.g. genes and cis-control elements) and environmental cues. Proteins rarely act in isolation; rather, they form parts of molecular machines or participate in network interactions mediating cellular functions such as signal transduction and developmental or physiological response patterns. Gene regulatory networks, whose architecture and linkages are established by cis-control elements, integrate information from signal transduction networks and output it to developmental or physiological batteries or networks of effector proteins. Normal protein and gene regulatory networks may be perturbed by disease through genetic and/or environmental perturbations, and understanding these differences lies at the heart of systems approaches to disease. Disease-perturbed networks initiate altered responses that bring about pathologic phenotypes such as the invasiveness of cancer cells.

To map network perturbations in cancer initiation and progression, changes in expression levels of virtually all transcripts must be measured. Certain low-abundance transcripts,

MPSS and ICAT data extracted from these model cell lines can be validated by real-time RT-PCR or western blot analysis in more relevant biological models (tumor xenografts) and in tumor biopsies.

An MPSS analysis of about 5 million signatures was conducted for the androgen-dependent LNCaP cell line and its androgen-independent derivative CL1. The resulting database offers the first comprehensive view of the digital transcriptomes of prostate cancer cells and allows exploration of the cellular pathways perturbed during the transition from AD to AI growth. Additionally, protein expression profiles between LNCaP and CL1 cells were compared using ICAT/MS/MS technology. Further, computational analysis was used to identify those proteins that are secreted. One such protein was further investigated and shown to be a diagnostic marker for prostate cancer used either alone, or in combination with the known PSA prostate cancer marker.

MPSS Analysis:

LNCaP and CL1 cells were grown using methods known in the art (for example, as described by Tso et al., 2000, supra). RNAs were isolated using Trizol (Life Technologies) according to the manufacturer's protocols (see, e.g., as described by Nelson et al., Proc Natl Acad Sci USA, 99: 11890-11895, 2002). MPSS cDNA libraries were constructed, individual cDNA sequences were amplified and attached to individual beads and sequenced as described by Brenner, et al., 2000, supra. The resulting signatures, generally 20 bases in length, were annotated using the then most recently annotated human genome sequence (human genome release hg16, released in November, 2003) and the human Unigene (Unigene build #184) according to a previously published method (Meyers, B.C., et al., Genome Res, 14: 1641-1653, 2004). Only 100% matches between an MPSS signature and a genome signature were considered. Those signatures that were expressed at less than 3 tpm in both LNCaP and CL1 libraries were also excluded, as they might not be reliably detected (this represents less than one transcript per cell) (Jongeneel, C.V., et al., Proc Natl Acad Sci USA, 2003). Additionally, cDNA signatures were classified by their positions relative to polyadenylation signals and poly(A) tails and by their orientation relative to the 5'→3' orientation of source mRNA. The Z-test (Man, M. Z. et al., Bioinformatics, 16:953-959, 2000) was used to calculate P values for comparison of gene expression levels between the cell lines.

Isotope-Coded Affinity Tag (ICAT) Analysis:

ICAT reagents were purchased from Applied Biosystems Inc. Fractionation of cells into cytosolic, microsomal and nuclear fractions, as well as ICAT labeling, MS/MS, and data analyses were performed as described by Han et al., Nat Biotechnol, 19:946-951, 2001. In addition, probability score analysis (Keller, A., et al., Anal Chem, 74:5383-5392, 2002) and ASAPRatio (Automated Statistical Analysis on Protein Ratio) (Li, X. J., et al., Anal Chem, 75: 6648-6657, 2003) were used to assess the quality of MS spectra and to calculate protein ratios from multiple peptide ratios. (Briefly, and as described at http colon double slash regis dot systemsbiology dot net/software, Automated Statistical Analysis on Protein Ratio (ASAPRatio) accurately calculates the relative abundances of proteins and the corresponding confidence intervals from ICAT-type ESI-LC/MS data. The software first uses a Savitzky-Golay smoothing filter to reconstruct LC spectra of a peptide and its partner in a single charge state, subtracts background noise from each spectrum, and calculates the light:heavy ratio of the peptide in that charge state. The ratios of the same peptide in different charge states are averaged and weighted by the corresponding spectrum intensity to obtain the peptide light:heavy ratio and its error. Subsequently, all unique peptides identified for a given protein are collected, their ratios and errors calculated, outliers are checked for using Dixon's tests, and the relative abundance and confidence interval for the protein are calculated by applying statistics for weighted samples. The software quickly generates a list of interesting proteins based on their relative abundance. A byproduct of the software is to identify outlier peptides which may be misidentified or, more interestingly, post-translationally modified.) To compare protein and mRNA expression levels, the Unigene numbers of the differentially expressed proteins were used to find MPSS signatures and their expression levels in transcripts per million (tpm). If one Unigene had more than one MPSS signature, likely due to alternative terminations, the average tpm of all signatures was taken.

Real-Time RT-PCR:

All primers were designed with the PRIMER3 program (http colon double slash www-genome dot wi dot mit dot edu slash cgi-bin slash primer slash primer3_www dot cgi) and BLAST-searched against the human cDNA and EST database for uniqueness. Real-time PCR was performed on an ABI 7700 machine (PE Biosystems) and the SYBR Green dye (Molecular Probe Inc.) was used as a reporter. PCR conditions were designed to give bands of the expected size with minimal primer dimer bands.

Identification of Perturbed Networks:

Genes in the 328 Biocarta and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways or networks (http colon double slash cgap dot nci dot nih dot gov slash Pathways slash) were downloaded and compared with the MPSS data, using Unigene IDs as identifiers. If a Unigene ID or an E.C. number corresponded to multiple signatures, potentially due to multiple alternatively terminated isoforms, the tpm counts of the isoforms were combined and then subjected to the Z-test (Man, M. Z. et al., 2000, supra). Genes with P values of 0.001 or less were considered to be significantly differentially expressed. The following criteria were used to identify perturbed networks: a perturbed network must have more than 3 genes represented in our differentially expressed gene list (p<0.001); if at least 50% of those genes were up regulated, it was considered an up-regulated pathway (vice versa for the down-regulated pathways).

Display of KEGG Networks by Cytoscape:

Cytoscape software (www dot cytoscape dot org) (Shannon, P., et al., Genome Res, 13: 2498-2504, 2003) was used to map the data onto the web of intracellular molecular interactions. We imported metabolic network maps and related information such as enzymes, substrates, and reactions from the recently developed KEGG (http colon double slash www dot genome dot ad dot jp slash) API 2.0 web server into the Cytoscape program. Expression data were thus automatically mapped to the KEGG and Biocarta pathways/networks and visualized by Cytoscape.

MPSS Analyses of the Androgen-Dependent LNCaP Cell Line and its Androgen-Independent Variant CL1:

Using MPSS technology, 2.22 million signature sequences were sequenced for LNCaP cells and 2.96 million for CL1 cells.

A total of 19,595 unique transcript signatures expressed at levels >3 tpm in at least one of the samples were identified. The signatures were classified into three major categories: 1093 signatures matched repeat sequences; 15,541 signatures matched unique cDNAs or ESTs; and 2961 signatures had no matches to any cDNA or EST sequences (but did match genomic sequences). The last category included sequences falling into one of three different categories: signatures representing new transcripts yet to be defined, signatures representing polymorphisms in cDNA sequences (a match of an MPSS sequence to cDNA or EST sequences requires 100% sequence identity), or errors in the MPSS reads. Transcript tags with matches to a cDNA or EST sequence were further classified based on the signature's relative orientation to transcription direction and its position relative to a polyadenylation site and/or poly(A) tail. A searchable MySQL database (www dot mysql dot com) was also built containing the expression levels (tpm), the genomic locations of the MPSS sequences, the cDNAs or EST matches, and the classification of each signature.

The first analysis was restricted to those MPSS signatures corresponding to cDNAs with poly(A) tails and/or polyadenylation sites, so that corresponding genes could be conclusively identified. The Z-test was used to compare differential gene expression between LNCaP cells and CL1 cells (Man et al., 2000, supra). Using very stringent P values (less than 0.001), 2088 MPSS signatures were identified (corresponding to 1987 unique genes, as some genes have two or more MPSS signatures, due to alternative usages of polyadenylation sites) with significant differential expression. Of these, 1011 signatures (965 genes) were overexpressed in CL1 cells, and 1077 signatures (1022 genes) were overexpressed in LNCaP cells. The significance score of the Z-test was dependent on the expression level. If a cutoff P value of less than 0.001 was taken in the dataset, the expression level in tpm changed from 0 to 26 tpm for the most lowly expressed transcript (>26 fold), and changed from 7591 to 11206 tpm for the most highly expressed transcript (1.48 fold).

The expression levels of nine randomly chosen genes were identified using the MPSS and quantitative real-time RT-PCR techniques and showed that both RNA data sets were concordant. The MPSS expression profiling data were consistent with the available published data. For example, using RT-PCR, Patel et al. (Patel, B.J., et al., J Urol, 164: 1420-1425, 2000) showed that CL1 tumors express barely detectable prostate-specific antigen (PSA) and androgen receptor (AR) mRNAs as compared with LNCaP cells. The present MPSS results indicated that LNCaP cells expressed 584 tpm of androgen receptor (AR) and 841 tpm of PSA; CL1 cells did not express either AR or PSA (0 tpm in both cases). Freedland et al. found that CD10 expression was lost in CL1 cells compared with LNCaP cells (Freedland, S.J., et al., Prostate, 55: 71-80, 2003); the present study found that CD10 was expressed at 0 tpm in CL1 cells but at 56 tpm in LNCaP cells. Using cDNA microarrays, Vaarala et al. (Vaarala, M. H., et al., Lab Invest, 80: 1259-1268, 2000) compared LNCaP cells and another androgen-independent variant, a non-PSA-producing LNCaP line, which is similar to CL1, and identified a total of 56 differentially expressed genes. We found completely concordant expression changes in these 56 genes between LNCaP and CL1 (in contrast to 1987 found by MPSS), and between LNCaP and non-PSA-producing LNCaP cells. This underscores the striking differences in sensitivity between the MPSS and cDNA microarray techniques.

CL1 cells do not express AR and thus lack the AR-mediated response program. To distinguish androgen response from other programs contributing to prostate cancer progression, the list of genes differentially expressed between LNCaP and CL1 cells was compared with a complementary list derived from MPSS analysis of LNCaP cells grown in the presence or absence of androgens (LNCaP R+/R-). From the 1987 differentially expressed genes between LNCaP and CL1, 525 genes were identified that were also differentially expressed in the LNCaP R+/R- dataset. Differential expression of these genes between LNCaP and CL1 cells probably reflects the fact that LNCaP cells express AR but CL1 does not, and the fact that normal medium contains some androgen. The remaining 1462 differentially expressed genes were not directly related to cellular AR status.

To compare the sensitivity of the MPSS and cDNA microarray procedures, cDNA microarrays containing 40,000 human cDNAs were hybridized to the same LNCaP and CL1 RNAs that were used for MPSS. Three replicate array hybridizations were performed. MPSS signatures and array clone IDs were mapped to Unigene IDs for data extraction and comparisons. The results showed that only those genes expressed at >40 tpm by MPSS could be reliably detected as changing levels by cDNA microarray hybridizations judged by an expression level twice the standard deviation of the background, a standard cutoff value for microarray data analysis. This observation is consistent with the 33-60 tpm sensitivity of microarrays estimated from the experiment performed by Hill et al., Science, 290: 809-812, 2000, in which known concentrations of synthetic transcripts were added. In LNCaP and CL1 cells, about 68.75% (13,471 of 19,595) of MPSS signatures (>3 tpm) were expressed at a level below 40 tpm; changes in the levels of these genes will be missed by microarray methods. Many attempts have been made to increase the sensitivity of DNA array technology (Han, M., et al., Nat Biotechnol, 19:631-635, 2001; Bao, P., et al., Anal Chem, 74: 1792-1797, 2002); however, the present study has not compared these new improvements against MPSS, but it is clear that there will still be significant differences in the levels of change that can be detected.

SAGE (serial analysis of gene expression) (Velculescu, V. E., et al., Trends Genet, 16: 423-425, 2000) is another technology for gene expression profiling; like MPSS, it is digital and can generate a large number of signature sequences. However, MPSS, which can sequence ~1 million signatures per sample, can achieve a much deeper coverage than SAGE (typically ~10,000-100,000 signatures sequenced/sample) at reasonable cost. The MPSS data on LNCaP cells was compared against publicly available SAGE data on LNCaP cells (NCBI SAGE database) through common Unigene IDs. The SAGE library GSM724 (total SAGE tags sequenced: 22,721) (Lal, A., et al., Cancer Res, 59:5403-5407, 1999) was derived from LNCaP cells with an inactivated PTEN gene; it is the SAGE library most similar to the LNCaP cells. Only 400 (about 20%) of the 1987 significantly differentially expressed genes (P<0.001) had any SAGE tag entry in GSM724. These data illustrate the importance of deep sequence coverage in identifying state changes in transcripts expressed at low abundance levels.

Functional Classifications of Genes Differentially Expressed Between LNCaP and CL1 Cells:

Examination of the GO (Gene Ontology) classification of the 1987 genes revealed that multiple cellular processes change during the transition from LNCaP cells to CL1 cells. The most interesting groups, categorized by function, are shown in Table 1.

Nineteen differentially expressed proteins are related to apoptosis. Twelve of these are up regulated in CL1 cells, including the apoptosis inhibitors Tax1 (human T-cell leukemia virus type I) binding protein 1 (TAX1BP1) and CASP8 and FADD-like apoptosis regulator. Seven are down regulated in CL1, including programmed cell death 8 and 5 (apoptosis-inducing factors), and BCL2-like 13 (an apoptosis facilitator). Since CL1 cells have increased expression of apoptosis inhibitors and decreased expression of apoptosis inducers, net inhibition of apoptosis may contribute to their greater tumorigenicity.

TABLE 1

EXAMPLES OF DIFFERENTIALLY EXPRESSED GENES AND THEIR FUNCTIONAL CLASSIFICATIONS

Signatures | LNCaP (tpm) | CL1 (tpm) | Description | GenBank ID | SEQ ID NOS:

Apoptosis related

GATCAAATGTGTGGCCT (SEQ ID NO: 3) | 0 | 3609 | lectin, galactoside binding, soluble, 1 (galectin 1) | BC001693 | 1574-1575
GATCATAATGTTAACTA (SEQ ID NO: 4) | 0 | 14 | pleiomorphic adenoma gene like 1 (PLAGL1) | NM_002656 | 1576-1577
TCATCCAGAGGAGCT (SEQ ID NO: 5) | 0 | 16 | caspase 7, apoptosis related cysteine protease | U40281 | 1578-1579
TCGCGGTATTAAATC (SEQ ID NO: 6) | 0 | 15 | tumor necrosis factor receptor superfamily, member 12 | U75380 | 1580-1581
G TCTCCTGTCCATCAG (SEQ ID NO: 7) | 0 | 24 | interleukin 1, beta | M15330 | 1582-1583
G TCCCCTTCAAGGACA (SEQ ID NO: 8) | 1 | 19 | nudix (nucleoside diphosphate linked moiety X)-type motif 1 | NM_006024 | 1584-1585
GATCATTGCCATCACCA (SEQ ID NO: 9) | 51 | 278 | EST, Highly similar to CUL2_HUMAN CULLIN HOMOLOG 2 | AL832733 | 1586
GATCTGAAAATTCTTGG (SEQ ID NO: 10) | 16 | 56 | CASP8 and FADD-like apoptosis regulator | U97075 | 1587-1588
TCCACCTTGGCCTCC (SEQ ID NO: 11) | 49 | 149 | tumor necrosis factor receptor superfamily, member 10b | NM_003842 | 1589-1590
TCATGAATGACTGAC (SEQ ID NO: 12) | 118 | 257 | cytochrome c | BC009582 | 1591-1592
TCAAGTCCTTTGTGA (SEQ ID NO: 13) | 299 | 102 | programmed cell death 8 (apoptosis inducing factor) | H20713 | 1593
TCACCAAAACCTGAT (SEQ ID NO: 14) | 72 | 24 | BCL2-like 13 (apoptosis facilitator) | BM904887 | 1594
TCAATCTGAACTATC (SEQ ID NO: 15) | 563 | 146 | apoptosis related protein APR-3 (APR-3) | NM_016085 | 1595-1596
TCCCTCTGTACAGGC (SEQ ID NO: 16) | 83 | 13 | unc-13-like (C. elegans) (UNC13), mRNA. | NM_006377 | 1597-1598
TCTGGTTGAAAATTG (SEQ ID NO: 17) | 1006 | 49 | CED-6 protein (CED-6), mRNA. | NM_016315 | 1599-1600
G TCTCCCATGTTGGCT (SEQ ID NO: 18) | 86 | 4 | CASP2 and RIPK1 domain containing adaptor with death domain | BC017042 | 1601-1602
GATCAGAAAATCCCTCT (SEQ ID NO: 19) | 27 | 1 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 20, 103 kDa | BC011556 | 1603-1604
GATCAAGGATGAAAGCT (SEQ ID NO: 20) | 50 | 3 | programmed cell death 2 | D20426 | 1605
GATCTGATTATTTACTT (SEQ ID NO: 21) | 1227 | 321 | programmed cell death 5 | NM_004708 | 1606-1607
GATCAAGTCCTTTGTGA (SEQ ID NO: 22) | 299 | 102 | programmed cell death 8 (apoptosis inducing factor) | NM_004208 | 1608-1609

GATCGGTGCGTTCTCCT (SEQ ID NO: 47) | 287 | 509 | CD107a, lysosomal associated membrane protein 1 | AI521424 | 1648
GATCTACAAAGGCCATG (SEQ ID NO: 48) | 161 | 681 | CD29, integrin, beta 1 | NM_002211 | 1649-1650
GATCATTTATTTTAAGC (SEQ ID NO: 49) | 56 | 0 | CD10 (neutral endopeptidase, enkephalinase) | BQ013520 | 1651
GATCAGTCTTTATTAAT (SEQ ID NO: 50) | 150 | 50 | CD107b, lysosomal associated membrane protein 2 | A459107 | 1652
GATCTTGGCTGTATTTA (SEQ ID NO: 51) | 84 | 1014 | CD59 antigen p18-20 | NM_000611 | 1653-1654
GATCTTGTGCTGTGCTA (SEQ ID NO: 52) | 408 | 234 | CD9 antigen (p24) | NM_001769 | 1655-1656

Transcription factors

GATCAAATAACAAGTCT (SEQ ID NO: 53) | 0 | 62 | transcription factor BMAL2 | BM854818 | 1657
GATCTCTATGTTTACIT (SEQ ID NO: 54) | 0 | 27 | transcription factor BMAL2 | BG163364 | 1658
GATCCTGACACATAAGA (SEQ ID NO: 55) | 12 | 74 | transcription factor BMAL2 | BF055294 | 1659
GATCATTTTGTATTAAT (SEQ ID NO: 56) | 10 | 61 | transcription factor NRF | BC047878 | 1660-1661
GATCGTCTCATATTTGC (SEQ ID NO: 57) | 52 | 0 | transcriptional coactivator tubedown-100 | NM_025085 | 1662-1663
GATCCCCCTCTTCAATG (SEQ ID NO: 58) | | | transcriptional co-activator with PDZ binding motif | AJ299431 | 1664-1665
GATCAAATGCTATTGCA (SEQ ID NO: 59) | | | transcriptional regulator interacting with the PHS bromodomain 2 | AI126500 | 1666
GATCTGTGACAGCAGCA (SEQ ID NO: 60) | 140 | 35 | transducer of ERBB2, 1 | BC031406 | 1667-1668
GATCAAATCTGTACAGT (SEQ ID NO: 61) | 239 | 23 | transducer of ERBB2, 2 | AA694240 | 1669

Annexins and their ligands

ATCCTGTGCAACAAGA (SEQ ID NO: 62) | 0 | 69 | annexin A10 | BC007320 | 1670-1671
TCTGTGGTGGCAATGC (SEQ ID NO: 63) | 41 | 630 | annexin A11 | AL576782 | 1672
AGAATCATGGTCT (SEQ ID NO: 64) | 0 | 1079 | annexin A2 | BC001388 | 1673-1674
TCTTTGACTGCTG (SEQ ID NO: 65) | 210 | 860 | annexin A5 | BC001429 | 1675-1676
CAAAAACATCCTG (SEQ ID NO: 66) | 83 | 241 | annexin A6 | AI566871 | 1677
AGAAG ACTTTAAT (SEQ ID NO: 67) | 0 | 695 | annexin A1 | BC001275 | 1678-1679
TCAGGACACTTAGCA (SEQ ID NO: 68) | 0 | 2949 | S100 calcium binding protein A10 (annexin II ligand) | BC015973 | 1680-1681

Matrix metalloproteinase

GATCATCACAGTTTGAG (SEQ ID NO: 69) | 0 | 38 | matrix metalloproteinase 10 (stromelysin 2) | BC002591 | 1682-1683
GATCCCAGAGAGCAGCT (SEQ ID NO: 70) | 0 | 108 | matrix metalloproteinase 1 (interstitial collagenase) | BC013118 | 1684-1685
GATCGGCCATCAAGGGA (SEQ ID NO: 71) | 0 | 25 | matrix metalloproteinase 13 (collagenase 3) | AI370581 | 1686
GATCTGGACCAGAGACA (SEQ ID NO: 72) | 0 | 10 | matrix metalloproteinase 2 (gelatinase A) | BG332150 | 1687
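By way of illustration only, the comparison of tpm values between the two libraries described above can be sketched as a pooled two-proportion Z-test. This is a minimal sketch under stated assumptions, not the statistic of Man et al.: the function name `z_test_tpm` is hypothetical, the library sizes are the 2.22 million (LNCaP) and 2.96 million (CL1) signatures reported above, and the P values it yields need not match those reported in this study.

```python
import math


def z_test_tpm(tpm_a, tpm_b, n_a, n_b):
    """Pooled two-proportion Z-test on two tpm values (illustrative stand-in).

    tpm_a, tpm_b: expression in transcripts per million in libraries A and B.
    n_a, n_b: total number of signatures sequenced in each library.
    Returns (z, two-sided P value).
    """
    p_a = tpm_a / 1e6  # proportion of library A carrying this signature
    p_b = tpm_b / 1e6
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    if pooled == 0.0:
        # Signature absent from both libraries: no evidence of a difference.
        return 0.0, 1.0
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    N_LNCAP, N_CL1 = 2_220_000, 2_960_000  # library sizes from the text
    # CD10: 56 tpm in LNCaP, 0 tpm in CL1 (values from Table 1)
    z, p = z_test_tpm(56, 0, N_LNCAP, N_CL1)
    print(f"CD10: z = {z:.2f}, significant at P < 0.001: {p < 0.001}")
```

A positive z in this sketch means higher expression in the first library; the 0.001 cutoff is the one used throughout this example.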

Matrix metalloproteinases (MMPs), which degrade extracellular matrix components that physically impede cell migration, are implicated in tumor cell growth, invasion, and metastasis. MMP1, 2, 10 and 13 were found to be significantly overexpressed in CL1 cells (Table 1), which may partially explain these cells' aggressive and metastatic behavior.

CD (cluster designation of monoclonal antibodies) markers are generally localized at the cell surface; some may be associated with prostate cancer (Liu, A.Y., et al., Prostate, 40: 192-199, 1999). All currently identified CD markers (CD1 to CD247) from the PROW CD index database (www dot ncbi dot nlm dot nih dot gov slash prow slash guide slash 45277084 dot htm) were converted to UniGene numbers and the UniGene numbers used to identify their signatures and their expression levels. Fifteen CD markers were identified that were differentially expressed between LNCaP and CL1 cells (Z score <0.001) (Table 1). Eleven CD markers, including CD213a2 and CD213a1, which encode IL-13 receptors alpha 1 and 2, are up regulated in CL1 cells; three CD markers, CD9, CD10, and CD107, were downregulated in these cells (Table 1). Six CD markers went from 0 or 1 tpm to >35 tpm (Table 1), making them good digital or absolute markers or therapeutic targets. These data suggest that carefully selected CD markers may be useful in following the progression of prostate cancer, and indeed could serve as potential targets for antibody-mediated therapies (Liu, A.Y., et al., Prostate, 40: 192-199, 1999).

Delineation of Disease-Perturbed Networks in Prostate Cancer Cells.

Genes and proteins rarely act alone but rather generally operate in networks of interactions. Identifying key nodes (proteins) in the disease-perturbed networks may provide insights into effective drug targets. Comparing the genes (proteins) currently available in the 314 BioCarta and 155 KEGG pathway or network (http colon double slash cgap dot nci dot nih dot gov slash Pathways slash) databases with the MPSS data through Unigene IDs, we identified 37 BioCarta and 14 KEGG pathways that are up regulated and 23 BioCarta and 22 KEGG pathways down regulated in LNCaP cells versus CL1 cells (Table 2). The number of genes whose expression patterns changed in each pathway is listed in Table 2. Each gene along with its expression level in LNCaP and CL1 cells is listed pathway by pathway in our database (ftp colon double slash ftp dot systemsbiology dot net slash blin slash mpss). Changes in these pathways reveal the underlying phenotypic differences between LNCaP and CL1 cells. For example, multiple networks involved in modulating cell mobility, adhesion and spreading are up regulated in CL1 cells, which are more metastatic and invasive than LNCaP cells (Table 2). In the uCalpain and Friends in Cell Spread pathway, calpains are calcium-dependent thiol proteases implicated in cytoskeletal rearrangements and cell migration. During cell migration, calpain cleaves target proteins such as talin, ezrin, and paxillin at the leading edge of the membrane, while at the same time cleaving the cytoplasmic tails of the integrins β1(a) and β3(b) to release adhesion attachments at the trailing membrane edge. Increased activity of calpains increases migration rates and facilitates cell invasiveness (Liu, A. et al., Prostate, 40: 192-199, 1999).

TABLE 2

PATHWAYS THAT ARE UP OR DOWN REGULATED COMPARING LNCAP TO CL1 CELLS.

Pathways | # Genes hits in a pathway  # p < 0.001 & LNCaP > CL1  # p < 0.001 & LNCaP < CL1  # no change

Up-regulated Pathways in LNCaP cells

BioCarta Pathways

Mechanism of Gene Regulation by Peroxisome Proliferators via PPARa alpha | 35 9 2 24
T Cell Receptor Signaling Pathway | 21 6 2 13
ATM Signaling Pathway | 15 5 2 8
CARM1 and Regulation of the Estrogen Receptor | 18 5 2 11
HIV-I Nef negative effector of Fas and TNF | 33 5 2 26
EGF Signaling Pathway | 17 5 11
Role of BRCA1 BRCA2 and ATR in Cancer Susceptibility | 16 5 10
TNFR1 Signaling Pathway | 7 5 11
Toll-Like Receptor Pathway | 7 5 11
FAS signaling pathway CD95 | 7 4 12
VEGF Hypoxia and Angiogenesis | 6 4 11
Bone Remodelling | 9 3 5
ER associated degradation ERAD Pathway | 1 3 7
Estrogen-responsive protein Efp controls cell cycle and breast tumors growth | 1 3 7
Influence of Ras and Rho proteins on G1 to S Transition | 6 3 12
Inhibition of Cellular Proliferation by Gleevec | 3 3 9
Map Kinase Inactivation of SMRT Corepressor | 9 3 5
NFkB activation by Nontypeable Hemophilus influenzae | 6 3 12
RB Tumor Suppressor Checkpoint Signaling in response to DNA damage | 0 3 6
Transcription Regulation by Methyltransferase of CARM1 | 0 3 6
Ceramide Signaling Pathway | 3 4 0 9
Cystic fibrosis transmembrane conductance regulator and beta 2 adrenergic receptor pathway | 7 4 0 3
Nerve growth factor pathway NGF | 1 4
PDGF Signaling Pathway | 6 4 0 12
TNF Stress Related Signaling | 4 4 0 10
Activation of Csk by cAMP-dependent Protein Kinase Inhibits Signaling through the T Cell Receptor | 9 3 0 6
AKAP95 role in mitosis and dynamics | 1 3 0 8
Attenuation of GPCR Signaling | 7 3 0 4
Chaperones modulate interferon Signaling Pathway | 1 3 0 8
ChREBP regulation by carbohydrates and cAMP | 2 3 0 9
IGF-1 Signaling Pathway | 1 3 0 8
Insulin Signaling Pathway | 1 3 0 8
NF-kB Signaling Pathway | 1 3 0 8
Protein Kinase A at the Centrosome | 2 3 0 9
Regulation of ck1 cdk5 by type 1 glutamate receptors | 0 3 0 7
Role of Mitochondria in Apoptotic Signaling | 0 3 0 7
Signal transduction through IL1R | 4 3 0 11

KEGG Pathways

Aminosugars metabolism | 24 9 4 11
Androgen and estrogen metabolism | 37 13 5 19
Benzoate degradation via hydroxylation | 5 3 1
C21-Steroid hormone metabolism | 4 1 0
C5-Branched dibasic acid metabolism | 2 2 0 0
Carbazole degradation | 1 1
Terpenoid biosynthesis | 4 1 1
Chondroitin heparan sulfate biosynthesis | 14 8
Fatty acid biosynthesis (path 1) | 3 2 0 1
Fluorene degradation | 3 2 0
Pentose and glucuronate interconversions | 19 9 1 9
Phenylalanine, tyrosine and tryptophan biosynthesis | 10 5 2 3
Porphyrin and chlorophyll metabolism | 28 13 3 12
Streptomycin biosynthesis | 6 4 1

Up-regulated Pathways in CL1 cells

BioCarta Pathways

Rho cell motility signaling pathway | 18 2 6 10
Trefoil Factors Initiate Mucosal Healing | 14 6
Integrin Signaling Pathway | 5
Ca Calmodulin-dependent Protein Kinase Activation | 4
Effects of calcineurin in Keratinocyte Differentiation | 4
Angiotensin II mediated activation of JNK Pathway via Pyk2 dependent signaling | 12 3
Bioactive Peptide Induced Signaling Pathway | 16 3 12
CBL mediated ligand-induced downregulation of EGF receptors | 3
Control of skeletal myogenesis by HDAC calcium calmodulin-dependent kinase CaMK | 12 3
How does salmonella hijack a cell | 3
Melanocyte Development and Pigmentation Pathway | 3
Overview of telomerase protein component gene hTert Transcriptional Regulation | 3
Regulation of PGC-1a | 0 4
ADP-Ribosylation Factor | 0 3
Downregulated of MTA-3 in ER-negative Breast Tumors | 0 3
Endocytotic role of NDK Phosphins and Dynamin | 0 3
Mechanism of Protein Import into the Nucleus | 0 3
Nuclear Receptors in Lipid Metabolism and Toxicity | 0 3
Pertussis toxin-insensitive CCR5 Signaling in Macrophage | 0 3
Platelet Amyloid Precursor Protein Pathway | 0 3
Role of Ran in mitotic spindle regulation | 0 3
Sumoylation by RanBP2 Regulates Transcriptional Repression | 0 3
uCalpain and friends in Cell spread | 0 3

KEGG Pathways

Arginine and proline metabolism | 45 7 16 22
ATP synthesis | 31 15
Biotin metabolism | 1 3
Blood group glycolipid biosynthesis -lactoseries | 12 1 6 5
Cyanoamino acid metabolism | 0 3
Ethylbenzene degradation | 1 3
Ganglioside biosynthesis | 16 2 6
Globoside metabolism | 17 3 8
Glutathione metabolism | 26 4 10
Glycine, serine and threonine metabolism | 32 6 14
Glycosphingolipid metabolism | 35 6 18 11
Glycosylphosphatidylinositol (GPI)-anchor biosynthesis | 26 5 12 9
Glyoxylate and dicarboxylate metabolism | 9 1 6 2
Huntington's disease | 25 4 10 11
Methane metabolism | 9 1 3 5
O-Glycans biosynthesis | 19 3 8 8
One carbon pool by folate | 12 2 8 2
Oxidative phosphorylation | 93 21 45 27
Parkinson's disease | 30 5 14 11
Phospholipid degradation | 21 4 12 5
Synthesis and degradation of ketone bodies | 7 1 3 3
Urea cycle and metabolism of amino groups | 18 2 8 8
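By way of illustration only, the criteria for calling a pathway perturbed (more than 3 differentially expressed genes at p<0.001, with at least 50% of them changing in one direction) can be sketched as follows. This is a minimal sketch, not the code used in this study: the function name `classify_pathway` and its parameters are hypothetical, and assigning an exact 50/50 split to the LNCaP direction is an assumption, since the criteria do not say how ties are resolved.

```python
def classify_pathway(n_higher_lncap, n_higher_cl1, min_genes=3, min_fraction=0.5):
    """Classify one pathway from its counts of differentially expressed genes.

    n_higher_lncap: genes with p < 0.001 and LNCaP > CL1.
    n_higher_cl1:   genes with p < 0.001 and LNCaP < CL1.
    A pathway needs MORE than `min_genes` differentially expressed genes
    to be considered perturbed at all.
    """
    n_de = n_higher_lncap + n_higher_cl1
    if n_de <= min_genes:
        return "not perturbed"
    # Assumption: an exact tie is assigned to the LNCaP direction.
    if n_higher_lncap / n_de >= min_fraction:
        return "up-regulated in LNCaP"
    return "up-regulated in CL1"


if __name__ == "__main__":
    # Counts for the first two pathways are taken from intact rows of Table 2;
    # the third entry is a made-up example of a pathway with too few genes.
    examples = {
        "Mechanism of Gene Regulation by Peroxisome Proliferators": (9, 2),
        "Oxidative phosphorylation": (21, 45),
        "hypothetical small pathway": (2, 1),
    }
    for name, (up_lncap, up_cl1) in examples.items():
        print(name, "->", classify_pathway(up_lncap, up_cl1))
```

The "# no change" count does not enter the decision in this sketch, because the stated criteria depend only on the differentially expressed genes.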

Many pathways we identified as perturbed in the LNCaP and CL1 comparison are interconnected to form networks (in fact there are probably no discrete pathways, only networks). For example, the insulin signaling pathway, the signal transduction through IL1R pathway, and the NF-kB signaling pathway are interconnected through c-Jun, IL1R and NF-kB. The mapping of genes onto networks/pathways will be an ongoing objective as more networks/pathways become available. Our transcriptome data will be an invaluable resource in delineating these relationships.

As gene regulatory networks controlled by transcription factors form the top layer of the hierarchy that controls the physiological network, we sought to identify differentially expressed transcription factors. Of 554 transcription factors expressed in LNCaP and CL1 cells, 112 showed significantly different levels between the cell lines (P<0.001). This clearly demonstrated a significant difference in the functioning of the corresponding gene regulatory networks during the progression of prostate cancer from the early to late stages.

Quantitative Proteomics Analysis of Prostate Cancer Cells.

We quantitatively profiled the protein expression changes between LNCaP and CL1 cells using the ICAT-MS/MS protocol described by Han et al., Nat Biotechnol, 19: 946-951, 2001. To increase proteome coverage, cells were separated into nuclear, cytosolic and microsomal fractions prior to ICAT analysis as described in Han et al., 2001, supra. We generated a total of 142,849 tandem mass spectra, 7282 of which corresponded to peptides with a mass spectrum quality score P value (Keller, A., et al., Anal Chem. 2002 Oct. 15: 74(20):5383-92) greater than 0.9 (allowing unambiguous identification of peptides). These 7282 peptides represented 971 proteins (Keller, A., et al., 2002, supra). We obtained quantitative peptide ratios for 4583 peptides corresponding to 941 proteins. The number of peptides is greater than the number of proteins because 1) mass spectrometry identified multiple peptides from the same protein and 2) the ionization step of mass spectrometry created different charge states for the same peptide. The protein ratios were calculated from multiple peptide ratios using an algorithm for the automated statistical analysis of protein abundance ratios (ASAPRatio) (Li, X.J., et al., Anal Chem, 75: 6648-6657, 2003). In the end, we identified 82 proteins that are down regulated and 108 proteins that are up regulated by at least 1.8-fold in LNCaP cells compared with CL1 cells. For example, five proteins belong to annexins that were markers for prostate and other cancers (Hayes, M.J. and Moss, S. E., Biochem Biophys Res Commun, 322: 1166-1170, 2004), seven are involved in fatty acids and lipid metabolism that are involved in the carcinogenesis and progression of prostate cancer (Pandian, S.S., et al., J R Coll Surg Edinb, 44: 352-361, 1999), five are related to apoptosis, 11 are cancer related, and five proteins are putative transcription factors. As we only identified a limited number of proteins that are significantly differentially expressed due to low sensitivity of ICAT technology, we were only able to identify a few pathways that are perturbed based on ICAT data alone (using the stringent criteria discussed above). This also illustrated the importance of MPSS analysis described earlier.

103 of 190 (54%) differentially expressed proteins identified have enzymatic activity and hence many are involved in metabolism. Notably, many of the proteins identified are involved in fatty acid and lipid metabolism, including fatty acid synthase, carnitine palmitoyltransferase II and propionyl Coenzyme A carboxylase alpha polypeptide. Fatty acid and lipid metabolism is known to be perturbed in prostate cancer (Fleshner, N., et al., J Urol, 171: S19-24, 2004). Additionally, many genes involved in lipid transport were altered, including the annexins, prosaposin, and fatty acid binding protein 5. Annexin A1 has previously been shown to be overexpressed in non-PSA-producing LNCaP cells as compared with PSA-producing LNCaP cells (Vaarala, M. H., et al., 2000, supra). Annexin A7 is postulated to be a prostate tumor suppressor gene (Cardo-Vila, M., et al., Pharmacogenomics J. 1:92-94, 2001). Annexin A2 expression is reduced or lost in prostate cancer cells, and its re-expression inhibits prostate cancer cell migration (Liu, J. W., et al., Oncogene, 22: 1475-1485, 2003).

Other genes identified here have been implicated in carcinogenesis, including tumor suppressor p16 and insulin-like growth factor 2 receptor (Chi, S. G. et al., Clin Cancer Res, 3: 1889-1897, 1997; Kiess, W., et al., Horm Res, 41 Suppl 2: 66-73, 1994). Some genes have previously been implicated in prostate cancer, such as prostate cancer over expressed gene 1 POV1, which is over expressed in prostate cancer (Cole, K. A., et al., Genomics, 51: 282-287, 1998), and delta 1 and alpha 1 catenin (cadherin-associated protein) and junction plakoglobin, which are down regulated in prostate cancer cells (Kallakury, B.V., et al., Cancer, 92: 2786-2795, 2001). However, the potential relationships of most of the proteins identified here to prostate cancer require further elucidation. For example, transmembrane protein 4 (TMEM4), a gene predicted to encode a 182-amino acid type II transmembrane protein, is downregulated about twofold in CL1 cells compared with LNCaP cells. MPSS data also indicated that TMEM4 is down regulated about twofold in CL1 cells. Many type II transmembrane proteins, such as TMPRSS2, are over expressed in prostate cancer patients (Vaarala, M. H., et al., Int. J Cancer, 94: 705-710, 2001). It will be interesting to see whether TMEM4 overexpression plays a primary role in prostate carcinogenesis. We also identified 12 proteins that have not been annotated or functionally characterized.

The mRNA expression level of eight proteins changes from 0 tpm in LNCaP cells to greater than 50 tpm in CL1 cells (we called them digital changes because they go from zero to some expression), and that of one protein changed from 0 tpm in CL1 cells to greater than 50 in LNCaP cells. These genes can be used as digital diagnostic signals. Twenty-two of the differentially expressed proteins were predicted to be secreted proteins (See Table 3) and can be further evaluated as serum markers (see also Example 2 below).

Additionally, we sought to compare the expression at the protein level with that at the mRNA level. We converted the protein IDs and MPSS signatures to Unigene IDs to compare the MPSS data with the ICAT-MS/MS data. We limited this comparison to those with common Unigene IDs and with reliable ICAT ratios (standard deviation less than 0.5) and ended up with a subset of 79 proteins. Of these, 66 genes (83.5%) were concordant in their changes in mRNA and protein levels of expression and 13 genes (16.5%) were discordant, i.e. having higher protein expression but lower mRNA expression or vice versa. There are no functional similarities among the discordant genes. As these mRNAs and proteins are expressed at relatively high levels, discordance due to measurement errors is unlikely. Clearly post-transcriptional mechanism(s) of protein expression are functioning, although the elucidation of the specific mechanism(s) awaits further studies.

Thus, these results, and those described in the Examples below, indicate a systems approach to disease will offer powerful tools for diagnostics, therapeutics, and even aid in prevention in the future.

TABLE 3

DIFFERENTIALLY EXPRESSED GENES THAT ENCODE PREDICTED SECRETED PROTEINS.

Signature  SEQ ID NO:  Accession Number  SEQ ID NOS:  Description
GATCAGCATGGGCCACG  73  NM_001928  594-595  D component of complement (adipsin)
GATCTACTACTTGGCCT  74  NM_006280  596-597  signal sequence receptor, delta (translocon associated protein delta)
GATCCTGTTGGGAAAGA  75  NM_203329  598-599  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCTGTTGGGAAAGA  76  NM_203331  600-601  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCCTGAAGTTGCCC  77  NM_203331  600-601  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCTTGGCTGTATTTA  78  NM_203331  600-601  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCCTGAAGTTGCCC  79  NM_203330  602-603  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCTGTTGGGAAAGA  80  NM_203330  602-603  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCTTGGCTGTATTTA  81  NM_203330  602-603  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCCTGAAGTTGCCC  82  NM_203329  598-599  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCTTGGCTGTATTTA  83  NM_000611  604-605  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)

GATCCCTGAAGTTGCCC  84  NM_000611  604-605  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCCTGTTGGGAAAGA  85  NM_000611  604-605  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCTTGGCTGTATTTA  86  NM_203329  598-599  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
GATCTGTGCTGACCCCA  87  NM_002982  606-607  chemokine (C-C motif) ligand 2
GATCTCTTGGAATGACA  88  NM_012242  608-609  dickkopf homolog 1 (Xenopus laevis)
GATCACCATCAAGCCAG  89  NM_012242  608-609  dickkopf homolog 1 (Xenopus laevis)
GATCAAACAGCTCTAGT  90  NM_016308  610-611  UMP-CMP kinase
GATCCCCTGTTACGACA  91  NM_014155  612-613  HSPC063 protein
GATCTCTGATTACCAGC  92  NM_025205  614-615  mediator of RNA polymerase II transcription, subunit 28 homolog (yeast)
GATCATTGAACGAGACA  93  NM_031903  616-617  mitochondrial ribosomal protein L32
GATCACAGACCACGAGT  94  NM_178507  618-619  NS5ATP13TP2 protein
GATCTGCATCAGTTGTA  95  NM_148170  620-621  cathepsin C
GATCTCTTGCTAGATTT  96  NM_005059  622-623  relaxin 2
GATCACAAGGCTGCCTG  97  NM_000405  624-625  GM2 ganglioside activator
GATCGTTTCTCATCTCT  98  NM_006432  626-627  Niemann-Pick disease, type C2
GATCCCCGCGATACTTC  99  NM_015921  628-629  chromosome 6 open reading frame 82
GATCTTTTTTTGGATAT  100  NM_181777  630-631  ubiquitin-conjugating enzyme E2A (RAD6 homolog)
GATCCGAGAGTAAGGAA  101  NM_032488  632-633  cornifelin
GATCATGTGTTTCCATG  102  NM_014435  634-635  N-acylsphingosine amidohydrolase (acid ceramidase)-like
GATCTCAGAACAACCTT  103  NM_016029  636-637  dehydrogenase/reductase (SDR family) member 7
GATCTTACCTCCTGATA  104  NM_020467  638-639  hypothetical protein from clone 643
GATCCCAGACTGGTTCT  105  NM_003782  640-641  UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 4
GATCAAGTGCATTTGAC  106  NM_173631  642-643  zinc finger protein 547
GATCAGTGCGTCATGGA  107  NM_005423  644-645  trefoil factor 2 (spasmolytic protein 1)
GATCCAAGAGGAAGAAT  108  NM_014402  646-647  low molecular mass ubiquinone-binding protein (9.5 kD)
GATCCAGCAAACAGGTT  109  NM_003851  648-649  cellular repressor of E1A-stimulated genes 1
GATCATAGAAGGCTATT  110  NM_181834  650-651  neurofibromin 2 (bilateral acoustic neuroma)
GATCCCCCTTCATTTGA  111  NM_004862  652-653  lipopolysaccharide induced TNF factor
GATCCCAAATTTGAAGT  112  NM_001685  654-655  ATP synthase, H+ transporting, mitochondrial F0 complex, subunit F6
GATCTGCTTTCTGTAAT  113  NM_002406  656-657  mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase

GATCACTCCTTATTTGC  114  NM_019021  658-659  hypothetical protein FLJ20010
GATCACCTTCGACGACT  115  NM_003130  660-661  sorcin
GATCTCTATTGTAATCT  116  NM_002489  662-663  NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4, 9 kDa
GATCTCCTGGCTGCAAA  117  NM_138429  664-665  claudin 15
GATCCCAGTCTCTGCCA  118  NM_201397  666-667  glutathione peroxidase 1
GATCTTCTTTATAATTC  119  NM_004048  668-669  beta-2-microglobulin
GATCTGTTCAAACAGCA  120  NM_024060  670-671  hypothetical protein MGC5395
GATCGTGCTCACAGGCA  121  NM_033280  672-673  SEC11-like 3 (S. cerevisiae)
GATCAATATGTAAATAT  122  NM_020199  674-675  open reading frame 15
GATCAGCTTTGCTCCTG  123  NM_207495  676-677  hypothetical protein DKFZp686I15217
GATCTCTATGGCTGTAA  124  NM_033211  678-679  hypothetical gene supported by AF038182; BC009203
GATCTCAGAACCTCTGT  125  NM_00100143  680-681  similar to RIKEN cDNA 492.1524.17
GATCCAGCCATTACTAA  126  NM_016205  682-683  platelet derived growth factor C
GATCTTTCCCAAGATTG  127  NM_001001434  684-685  syntaxin 16
GATCGATTCTGTGACAC  128  NM_181726  686-687  low density lipoprotein receptor-related protein binding protein
GATCTATTTTTTCTAAA  129  NM_004125  688-689  guanine nucleotide binding protein (G protein), gamma 10
GATCAAGAATCCTGCTC  130  NM_006332  690-691  interferon, gamma inducible protein 30
GATCGGTGGAGAACCTC  131  NM_175742  692-693  melanoma antigen, family A, 2
GATCGGTGGAGAACCTC  132  NM_175743  694-695  melanoma antigen, family A, 2
GATCGGTGGAGAACCTC  133  NM_153488  696-697  melanoma antigen, family A, 2B
GATCATGGGTGAGGGGT  134  NM_001483  698-699  glioblastoma amplified sequence
GATCCCCCTCACCATGA  135  NM_032621  700-701  brain expressed X-linked 2
GATCAACTAATAGCTCT  136  NM_181892  702-703  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  137  NM_181892  702-703  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAGGAGACCCGGA  138  NM_024540  704-705  mitochondrial ribosomal protein L24
GATCAAGGAGACCCGGA  139  NM_145729  706-707  mitochondrial ribosomal protein L24
GATCCTAAGCCATAGAC  140  NM_025075  708-709  Ngg1 interacting factor 3 like 1 binding protein 1
GATCCATTGAGCCCAGC  141  NM_181725  710-711  hypothetical protein FLJ12760
GATCTGAGGGCGTCTTC  142  NM_012153  712-713  ets homologous factor
GATCTCGGTAGTTACGT  143  NM_012153  712-713  ets homologous factor
GATCCCAAGATGATTAA  144  NM_014177  714-715  chromosome 18 open reading frame 55
GATCTCAAACTTGTCTT  145  NM_003350  716-717  ubiquitin-conjugating enzyme E2 variant 2
GATCATAGTTATTATAC  146  NM_032466  718-719  aspartate beta hydroxylase
GATCCCAACTGCTCCTG  147  NM_005947  720-721  metallothionein 1B (functional)
GATCAAAATGCTAAAAC  148  NM_016311  722-723  ATPase inhibitory factor 1
GATCTGTTTGTTCCCTG  149  NM_013411  724-725  adenylate kinase 2
GATCAACAGTGGCAATG  150  NM_001001392  726-727  CD44 antigen (homing function and Indian blood group system)

GATCAATAATAATGAGG  151  NM_001001392  726-727  CD44 antigen (homing function and Indian blood group system)
GATCAACTAATAGCTCT  152  NM_181890  728-729  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  153  NM_181891  730-731  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  154  NM_181890  728-729  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  155  NM_181889  732-733  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  156  NM_003340  734-735  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  157  NM_181888  736-737  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  158  NM_181888  736-737  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  159  NM_181891  730-731  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  160  NM_181887  738-739  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  161  NM_181887  738-739  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  162  NM_181886  740-741  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  163  NM_181886  740-741  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  164  NM_003340  734-735  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAACTAATAGCTCT  165  NM_181889  732-733  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCTGATTTTTTCCCC  166  NM_145751  742-743  TNF receptor-associated factor 4
GATCAGAAATGACTGTG  167  NM_018509  744-745  hypothetical protein PRO1855
GATCACTGAGAAAAAAT  168  NM_152407  746-747  GrpE-like 2, mitochondrial (E. coli)
GATCCAAGAGTTTAGTG  169  NM_006807  748-749  chromobox homolog 1 (HP1 beta homolog Drosophila)
GATCTTTGCTGGCAAGC  170  NM_002954  750-751  ribosomal protein S27a
GATCCACACTGAGAGAG  171  NM_145864  752-753  kallikrein 3, (prostate specific antigen)
GATCTGTATTATTAAAT  172  NM_032549  754-755  IMP2 inner mitochondrial membrane protease-like (S. cerevisiae)
GATCTGTTTGTTCCCTG  173  NM_172199  756-757  adenylate kinase 2
GATCCCCTGCCTGGTGC  174  NM_001312  758-759  cysteine-rich protein 2
GATCAACTAATAGCTCT  175  NM_181893  760-761  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCAAATAAAGTTATA  176  NM_181893  760-761  ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast)
GATCTTTTTCAAGTCTT  177  NM_012071  762-763  COMM domain containing 3
GATCATGTATGAGATAG  178  NM_012460  764-765  translocase of inner mitochondrial membrane 9 homolog (yeast)

GATCCTTCAGGCAGTAA  179  NM_176805  766-767  mitochondrial ribosomal protein S11
GATCTTTTTTTGGATAT  180  NM_003336  768-769  ubiquitin-conjugating enzyme E2A (RAD6 homolog)
GATCCCAGTCTCTGCCA  181  NM_000581  770-771  glutathione peroxidase 1
GATCAAGACGAGCCTGC  182  NM_004864  772-773  growth differentiation factor 15
GATCCCAGCTGATGTAG  183  NM_001885  774-775  crystallin, alpha B
GATCATGAAGACCTGCT  184  NM_003754  776-777  eukaryotic translation initiation factor 3, subunit 5 epsilon, 47 kDa
GATCTCAAGGTTGATAG  185  NM_003864  778-779  sin3-associated polypeptide, 30 kDa
GATCACCAGGCTGCCCA  186  NM_148571  780-781  mitochondrial ribosomal protein L27
GATCAAAATGCTAAAAC  187  NM_178190  782-783  ATPase inhibitory factor 1
GATCAAGATGACACTGA  188  NM_004483  784-785  glycine cleavage system protein H (aminomethyl carrier)
GATCGGGAACTCCTGCT  189  NM_005952  786-787  metallothionein 1X
GATCTTGTCTTTAAAAC  190  NM_015646  788-789  RAP1B, member of RAS oncogene family
GATCCACACACGTTGGT  191  NM_003255  790-791  tissue inhibitor of metalloproteinase 2
GATCATCAGTCACCGAA  192  NM_000077  792-793  cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)
GATCCAGTATTCACGTCA  193  NM_002166  794-795  inhibitor of DNA binding 2, dominant negative helix-loop-helix protein
GATCCTTGCAGGGAGCT  194  NM_015343  796-797  dullard homolog (Xenopus laevis)
GATCTCCTTGCCCCAGC  195  NM_015343  796-797  dullard homolog (Xenopus laevis)
GATCGCCTAGTATGTTC  196  NM_003897  798-799  immediate early response 3
GATCAGACTGTATTAAA  197  NM_032052  800-801  zinc finger protein 278
GATCGGCCCTACTAGAT  198  NM_032052  800-801  zinc finger protein 278
GATCTCCCACTGCGGGG  199  NM_032052  800-801  zinc finger protein 278
GATCTGTGATGGTCAGC  200  NM_000232  802-803  sarcoglycan, beta (43 kDa dystrophin-associated glycoprotein)
GATCACTGTGGTATCTA  201  NM_052822  804-805  secretory carrier membrane protein 1
GATCATCAGTCACCGAA  202  NM_058197  806-807  cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)
GATCATTTGTTTATTAA  203  NM_022334  808-809  integrin beta 1 binding protein 1
GATCAAATATGTAAAAT  204  NM_004842  810-811  A kinase (PRKA) anchor protein 7
GATCTCTTGCTAGATTT  205  NM_134441  812-813  relaxin 2
GATCACCTTCGACGACT  206  NM_198901  814-815  sorcin
GATCGGATTGATTAAAA  207  NM_020353  816-817  phospholipid scramblase 4
GATCTAGTTGGGAGATA  208  NM_153367  818-819  chromosome 10 open reading frame 56
GATCTTTTTTGGCTACT  209  NM_018424  820-821  erythrocyte membrane protein band 4.1 like 4B
GATCACATTTTCTGTTG  210  NM_201436  822-823  H2A histone family, member V
GATCACCTGGGTTTCTT  211  NM_021999  824-825  integral membrane protein 2B
GATCTATTAGATTCAAA  212  NM_021105  826-827  phospholipid scramblase 1
GATCTCTTATTTTACAA  213  NM_000546  828-829  tumor protein p53 (Li-Fraumeni syndrome)
GATCATAGAAGGCTATT  214  NM_181835  830-831  neurofibromin 2 (bilateral acoustic neuroma)

GATCTTCCTGGACAGGA  215  NM_152992  832-833  POM (POM121 homolog, rat) and ZP3 fusion
GATCAAGGACCGGCCCA  216  NM_032391  834-835  small nuclear protein PRAC
GATCGCATTTTTGTAAA  217  NM_058171  836-837  inhibitor of growth family, member 2
GATCCATCCTCATCTCC  218  NM_020188  838-839  DC13 protein
GATCGATGGTGGCGCTT  219  NM_138992  [missing]  beta-site APP-cleaving enzyme 2
GATCTTATAAAAAGAAA  220  NM_017998  840-841  chromosome 9 open reading frame 40
GATCTGAACGATGCCGT  221  NM_024579  842-843  hypothetical protein FLJ23221
GATCTCCCCGCCGCAGC  222  NM_015973  844-845  galanin
GATCGTCGTCCAGGCCA  223  NM_032920  846-847  chromosome 21 open reading frame 124
GATCGTTGGGGAACCCC  224  NM_199483  848-849  chromosome 20 open reading frame 24
GATCCTATATGTCCTGT  225  NM_152344  850-851  hypothetical protein FLJ30656
GATCGATGGTTGACAAT  226  NM_004552  852-853  NADH dehydrogenase (ubiquinone) Fe-S protein 5, 15 kDa (NADH-coenzyme Q reductase)
GATCTTGTACTAACTTA  227  NM_019059  854-855  translocase of outer mitochondrial membrane 7 homolog (yeast)
GATCCCGATGTTCTTAA  228  NM_001806  856-857  CCAAT/enhancer binding protein (C/EBP), gamma
GATCCTGTTTAACAAAG  229  NM_015469  858-859  nipsnap homolog 3A (C. elegans)
GATCACGCACACACAAT  230  NM_198337  860-861  insulin induced gene 1
GATCCAGCCAGACTTGC  231  NM_144772  862-863  apolipoprotein A-I binding protein
GATCCACACTGGAGAGA  232  NM_003450  864-865  zinc finger protein 174
GATCTCAGTTCTGCGTT  233  NM_004642  866-867  CDK2-associated protein
GATCTACACCTCTTGCC  234  NM_052845  868-869  methylmalonic aciduria (cobalamin deficiency) type B
GATCCAGCTGGAAAGCT  235  NM_006406  870-871  peroxiredoxin 4
GATCCTTCAGGCAGTAA  236  NM_022839  872-873  mitochondrial ribosomal protein S11
GATCCACACTGAGAGAG  237  NM_001648  874-875  kallikrein 3, (prostate specific antigen)
GATCACCTTATGGATGT  238  NM_003932  876-877  suppression of tumorigenicity 13 (colon carcinoma) (Hsp70 interacting protein)
GATCTAGTTATTTTAAT  239  NM_172178  878-879  mitochondrial ribosomal protein L42
GATCATTGAGAATGCAG  240  NM_206966  880-881  similar to AVLV472
GATCATGCCAAGTGGTG  241  NM_058248  882-883  deoxyribonuclease II beta
GATCACATTTTCTGTTG  242  NM_201516  884-885  H2A histone family, member V
GATCAGAAAGAAACCTT  243  NM_006744  886-887  retinol binding protein 4, plasma
GATCCGTGGCAGGGCTG  244  NM_031901  888-889  mitochondrial ribosomal protein S21
GATCCGTGGCAGGGCTG  245  NM_018997  890-891  mitochondrial ribosomal protein S21
GATCTATCACCCAAACA  246  NM_198157  892-893  ubiquitin-conjugating enzyme E2L 3
GATCAAGCGTGCTTTCC  247  NM_000995  894-895  ribosomal protein L34
GATCAAGCGTGCTTTCC  248  NM_033625  896-897  ribosomal protein L34
GATCCCTCATCCCTGAA  249  NM_014098  898-899  peroxiredoxin 3
GATCCACCTTGGCCTCC  250  NM_147187  900-901  tumor necrosis factor receptor superfamily, member 10b
GATCTTAGGGAGACAAA  251  NM_182529  902-903  TRAP domain containing 5

GATCAAGATACGGAAGA  252  NM_177924  904-905  N-acylsphingosine amidohydrolase (acid ceramidase) 1
GATCTGTTTGTTCCCTG  253  NM_001625  906-907  adenylate kinase 2
GATCAGCAAAAGCCAAA  254  NM_201263  908-909  tryptophanyl tRNA synthetase 2 (mitochondrial)
GATCGGGGGAGGGTAAA  255  NM_004544  910-911  NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 10, 42 kDa
GATCGTGGAGGAGGGAC  256  NM_016310  912-913  polymerase (RNA) III (DNA directed) polypeptide K, 12.3 kDa
GATCACTTTTGAAAGCA  257  NM_018465  914-915  chromosome 9 open reading frame 46
GATCTGATTTGCTAGTT  258  NM_015147  916-917  KIAA0582
GATCCTAGGGGGTTTTG  259  NM_015147  916-917  KIAA0582
GATCTAAGTTGCCTACC  260  NM_014176  918-919  HSPC150 protein similar to ubiquitin-conjugating enzyme
GA CCTTTGTTCTTGACC  261  NM_020531  920-921  chromosome 20 open reading frame 3
GATCTCTTAGCCAGAGG  262  NM_153333  922-923  transcription elongation factor A (SII)-like 8
GATCTCTCTCACCTACA  263  NM_003287  924-925  tumor protein D52-like 1
GATCAGAGGTGAAGGGA  264  NM_007021  926-927  chromosome 10 open reading frame 10
GATCTCATTGATGTACA  265  NM_032947  928-929  putative small membrane protein NID67
GATCTGTGCCGGCTTCC  266  NM_005656  930-931  transmembrane protease, serine 2
GATCCGTCTGTGCACAT  267  NM_005656  930-931  transmembrane protease, serine 2
GATCGGCTCTGGGAGAC  268  NM_006315  932-933  ring finger protein 3
GATCGATTAATGAAGTG  269  NM_016326  934-935  chemokine-like factor
GATCCTGGACTGGGTAC  270  NM_006830  936-937  ubiquinol-cytochrome c reductase (6.4 kD) subunit
GATCTTGGAGAATGTGA  271  NM_001216  938-939  carbonic anhydrase IX
GATCTTTTTTTGGATAT  272  NM_181762  940-941  ubiquitin-conjugating enzyme E2A (RAD6 homolog)
GATCTAGTTATTTTAAT  273  NM_014050  942-943  mitochondrial ribosomal protein L42
GATCTAGTTATTTTAAT  274  NM_172177  944-945  mitochondrial ribosomal protein L42
GATCAAGGGACGGCTGA  275  NM_000978  946-947  ribosomal protein L23
GATCAGAAGGCTCTGGT  276  NM_018442  948-949  IQ motif and WD repeats 1
GATCAATGTTGAAGAAT  277  NM_018442  948-949  IQ motif and WD repeats 1
GATCCTGCACTCTAACA  278  NM_203339  950-951  clusterin (complement lysis inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate message 2, apolipoprotein J)
GATCTGATTATTTACTT  279  NM_004708  952-953  programmed cell death 5
GATCCTTGAAGGCAGCT  280  NM_197958  954-955  acheron
GATCCCTTTTCTTACTA  281  NM_153713  956-957  hypothetical protein MGC46719
GATCTGTCCACTTCTGG  282  NM_153713  956-957  hypothetical protein MGC46719
GATCAGATACCACCAAG  283  NM_001001503  958-959  NADH dehydrogenase (ubiquinone) flavoprotein 3, 10 kDa
GATCCTTTGGATTAATC  284  NM_016138  960-961  coenzyme Q7 homolog, ubiquinone (yeast)
GATCATTATTTCTGTCT  285  NM_018184  962-963  ADP-ribosylation factor like 10C
GATCAGCCCTCAAAGAA  286  NM_018184  962-963  ADP-ribosylation factor like 10C
GATCAGCAAAAATAAAG  287  NM_016096  964-965  HSPC038 protein
GATCTCAGCGGCATTAA  288  NM_052951  966-967  deoxynucleotidyltransferase, terminal, interacting protein 1

GATCCCTGGAGTGCCTT  289  NM_003226  968-969  trefoil factor 3 (intestinal)
GATCTGTTTCTACCAAT  290  NM_183045  970-971  ring finger protein (C3H2C3 type) 6
GATCCTGCTGTGAAAGG  291  NM_153750  972-973  chromosome 21 open reading frame 81
GATCTTGAAAGTGCCTG  292  NM_022130  974-975  golgi phosphoprotein 3 (coat-protein)
GATCAATACAATAACAA  293  NM_003479  976-977  protein tyrosine phosphatase type IVA, member 2
GATCTCCTATGAGAACA  294  NM_003479  976-977  protein tyrosine phosphatase type IVA, member 2
GATCAATACAATAACAA  295  NM_080391  978-979  protein tyrosine phosphatase type IVA, member 2
GATCTCCTATGAGAACA  296  NM_080391  978-979  protein tyrosine phosphatase type IVA, member 2
GATCCAACCCTGTACTG  297  NM_177969  980-981  protein phosphatase 1B (formerly 2C), magnesium-dependent, beta isoform
GATCTCTACCATTTAAT  298  NM_001017  982-983  ribosomal protein S13
GATCCAGAAATACTTAA  299  NM_005410  984-985  selenoprotein P, plasma, 1
GATCCAATGCTAAACTC  300  NM_005410  984-985  selenoprotein P, plasma, 1
GATCAAATGAGAATAAA  301  NM_182620  986-987  family with sequence similarity 33, member A
GATCCTTGCCACAAGAA  302  NM_004034  988-989  annexin A7
GATCAGACTGTATTAAA  303  NM_032051  990-991  zinc finger protein 278
GATCTCCCACTGCGGGG  304  NM_032051  990-991  zinc finger protein 278
GATCGGCCCTACTAGAT  305  NM_032051  990-991  zinc finger protein 278
GATCAAAAAGCAAGCAG  306  NM_015972  992-993  polymerase (RNA) I polypeptide D, 16 kDa
GATCACTTCAGCTGCCT  307  NM_019007  994-995  armadillo repeat containing, X-linked 6
GATCACCGACTGAAAAT  308  NM_002165  996-997  inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
GATCAATGAAGTGAGAA  309  NM_003094  998-999  small nuclear ribonucleoprotein polypeptide E
GATCATCTCAGAAGTCT  310  NM_018683  1000-1001  zinc finger protein 313
GATCAGGAAGGACTTGT  311  NM_018683  1000-1001  zinc finger protein 313
GATCATTCCCATTTCAT  312  NM_002583  1002-1003  PRKC, apoptosis, WT1, regulator
GATCGCTTTCTACACTG  313  NM_006926  1004-1005  surfactant, pulmonary associated protein A2
GATCAGTTAGCTTTTAT  314  NM_014335  1006-1007  CREBBP, EP300 inhibitor 1
GATCAGTAGTTCAACAG  315  NM_175061  1008-1009  juxtaposed with another zinc finger gene 1
GATCCGATAAGTTATTG  316  NM_004707  1010-1011  APG12 autophagy 12 like (S. cerevisiae)
GATCAGTGGGCACAGTT  317  NM_006818  1012-1013  ALL1-fused gene from chromosome 1
GATCAGTGCCAGAAGTC  318  NM_016303  1014-1015  WW domain binding protein 5
GATCAGAGAAGTAAGTT  319  NM_004871  1016-1017  golgi SNAP receptor complex member 1
GATCTCACTTTCCCCTT  320  NM_015373  1018-1019  PKD2 interactor, golgi and endoplasmic reticulum associated 1
GATCAGGCAGTTCCTGG  321  NM_213720  1020-1021  chromosome 22 open reading frame 16
GATCCTTGCCACAAGAA  322  NM_001156  1022-1023  annexin A7
GATCAAGAAAAATAAGG  323  NM_000999  1024-1025  ribosomal protein L38
GATCGATTTCTTTCCTC  324  NM_021102  1026-1027  serine protease inhibitor, Kunitz type, 2
GATCATAGAAGGCTATT  325  NM_181826  1028-1029  neurofibromin 2 (bilateral acoustic neuroma)

GATCCGGTGCGCCATGT  326  NM_002638  1030-1031  protease inhibitor 3, skin-derived (SKALP)
GATCGCAGTTTGGAAAC  327  NM_005461  1032-1033  v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)
GATCAATTTCAAACCCT  328  NM_005461  1032-1033  v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)
GATCTCCTATGAGAACA  329  NM_080392  1034-1035  protein tyrosine phosphatase type IVA, member 2
GATCAATACAATAACAA  330  NM_080392  1034-1035  protein tyrosine phosphatase type IVA, member 2
GATCCTACCACCTACTG  331  NM_018281  1036-1037  hypothetical protein FLJ10948
GATCATTTGTTTATTAA  332  NM_004763  1038-1039  integrin beta 1 binding protein 1
GATCAAAATGCTAAAAC  333  NM_178191  1040-1041  ATPase inhibitory factor 1
GATCTGGGGTGGGAGTA  334  NM_002773  1042-1043  protease, serine, 8 (prostasin)
GATCATGCTTGTGTGAG  335  NM_018648  1044-1045  nucleolar protein family A, member 3 (H/ACA small nucleolar RNPs)
GATCAAATATGTAAAAT  336  NM_138633  1046-1047  A kinase (PRKA) anchor protein 7
GATCAGACTTCTCAGCT  337  NM_006856  1048-1049  activating transcription factor 7
GATCATAGAAGGCTATT  338  NM_181827  1050-1051  neurofibromin 2 (bilateral acoustic neuroma)
GATCCACCTTGGCCTCC  339  NM_003842  1052-1053  tumor necrosis factor receptor superfamily, member 10b
GATCTCTGGCCCCTCAG  340  NM_198527  1054-1055  similar to RIKEN cDNA 1110033O09 gene
GATCCTCATTGAGCCAC  341  NM_024866  1056-1057  adrenomedullin 2
GATCCAGTGGGGTCCGG  342  NM_002475  1058-1059  myosin light chain 1 slow a
GATCATTTTGTATTAAT  343  NM_017544  1060-1061  NF-kappa B repressing factor
GATCAGAAAAAGAAAGA  344  NM_000982  1062-1063  ribosomal protein L21
GATCCTGTTCCTGTCAC  345  NM_203413  1064-1065  S-phase 2 protein
GATCATGGTTCTCTTTG  346  NM_000202  1066-1067  iduronate 2-sulfatase (Hunter syndrome)
GATCCTCTGACCGCTGG  347  NM_022365  1068-1069  DnaJ (Hsp40) homolog, subfamily C, member 1
GATCTGCTATTGCCAGC  348  NM_016399  1070-1071  hypothetical protein HSPC132
GATCCTGGAAATTGCAG  349  NM_001233  1072-1073  caveolin 2
GATCAGTCTCAAGTGTC  350  NM_003702  1074-1075  regulator of G-protein signalling 20
GATCAGGTTAGCAAATG  351  NM_004331  1076-1077  BCL2/adenovirus E1B 19 kDa interacting protein 3-like
GATCAGTATGCTGTTTT  352  NM_004968  1078-1079  islet cell autoantigen 1, 69 kDa
GATCTGGTTTCTAGCAA  353  NM_024096  1080-1081  XTP3-transactivated protein A
GATCTAATTAAATAAAT  354  NM_000903  1082-1083  NAD(P)H dehydrogenase, quinone 1
GATCCTGGGTTTTTGTG  355  NM_017830  1084-1085  OCIA domain containing 1
GATCACCGACTGAAAAT  356  NM_181353  1086-1087  inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
GATCAGGTAACCAGAGC  357  NM_002488  1088-1089  NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 2, 8 kDa
GATCAGTGAACACTAAC  358  NM_016645  1090-1091  mesenchymal stem cell protein DSC92
GATCTCAGATGCTAGAA  359  NM_016567  1092-1093  BRCA2 and CDKN1A interacting protein

GATCGCTCTGCCCATGT  360  NM_016567  1092-1093  BRCA2 and CDKN1A interacting protein
GATCAGCTCCGTGGGGC  361  NM_152398  1094-1095  OCIA domain containing 2
GATCATTGCCCAAAGTT  362  NM_152398  1094-1095  OCIA domain containing 2
GATCTGGCACTGTGGTT  363  NM_000998  1096-1097  ribosomal protein L37a
GATCTGGCACTGTGGGT  364  NM_000998  1096-1097  ribosomal protein L37a
GATCTCAGATGCTAGAA  365  NM_078468  1098-1099  BRCA2 and CDKN1A interacting protein
GATCGCTCTGCCCATGT  366  NM_078468  1098-1099  BRCA2 and CDKN1A interacting protein
GATCTGCTGTGGAATTG  367  NM_172316  1100-1101  Meis1, myeloid ecotropic viral integration site 1 homolog 2 (mouse)
GATCGTTCTTGATTTTG  368  NM_032476  1102-1103  mitochondrial ribosomal protein S6
GATCTTGGTTTCATGTG  369  NM_032476  1102-1103  mitochondrial ribosomal protein S6
GATCATTCTTGATTTTG  370  NM_032476  1102-1103  mitochondrial ribosomal protein S6
GATCCATATGGAAAGAA  371  NM_014171  1104-1105  postsynaptic protein CRIPT
GATCTGCCCCCACTGTC  372  NM_138929  1106-1107  diablo homolog (Drosophila)
GATCGCCTAGTATGTTC  373  NM_052815  1108-1109  immediate early response 3
GATCAATGCTAATATGA  374  NM_005805  1110-1111  proteasome (prosome, macropain) 26S subunit, non-ATPase, 14
GATCAGCATCAGGCTGT  375  NM_012459  1112-1113  translocase of inner mitochondrial membrane 8 homolog B (yeast)
GATCTGGAAGTGAAACA  376  NM_134265  1114-1115  WD repeat and SOCS box-containing 1
GATCCACGTGTGAGGGA  377  NM_182640  1116-1117  mitochondrial ribosomal protein S9
GATCACAGAAAAATTAA  378  NM_182640  1116-1117  mitochondrial ribosomal protein S9
GATCTCTCTGCGTTTGA  379  NM_012445  1118-1119  spondin 2, extracellular matrix protein
GATCTCAGAAGTTTTGA  380  NM_138459  1120-1121  chromosome 6 open reading frame 68
GATCCGGACTTTTTAAA  381  NM_006339  1122-1123  high-mobility group 20B
GATCATAGTTATTATAC  382  NM_032467  1124-1125  aspartate beta hydroxylase
GATCCTGCCCTGCTCTC  383  NM_003145  1126-1127  signal sequence receptor, beta (translocon associated protein beta)
GATCGATTGAGAAGTTA  384  NM_012110  1128-1129  cysteine-rich hydrophobic domain 2
GATCCAAGTACTCTCTC  385  NM_175081  1130-1131  purinergic receptor P2X, ligand-gated ion channel, 5
GATCATACACCTGCTCA  386  NM_001009  1132-1133  ribosomal protein S5
GATCCTGGATGCCACGA  387  NM_174889  1134-1135  hypothetical protein LOC91942
GATCCCTGCCACAAGTT  388  NM_006923  1136-1137  stromal cell-derived factor 2
GATCAGACGAGGCCATG  389  NM_006107  1138-1139  cisplatin resistance associated overexpressed protein
GATCTTTCAGGAAAGAC  390  NM_033011  1140-1141  plasminogen activator, tissue
GATCTTTTAAAAATATA  391  NM_001914  1142-1143  cytochrome b-5
GATCGTTTTGTTTTGTT  392  NM_021149  1144-1145  coactosin-like 1 (Dictyostelium)
GATCTATGGCCTCTGGT  393  NM_021643  1146-1147  tribbles homolog 2 (Drosophila)
GATCCTAAATCATTTTG  394  NM_022783  1148-1149  DEP domain containing 6
GATCTAAGAAGAAACTA  395  NM_005765  1150-1151  ATPase, H+ transporting, lysosomal accessory protein 2
GATCTTGGTGTTCAAAA  396  NM_001497  1152-1153  UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 1

GATCCCTCATCCCTGAA  397  NM_006793  1154-1155  peroxiredoxin 3
GATCTGCAGTGCTTCAC  398  NM_178181  1156-1157  CUB domain-containing protein
GATCTATGCCCTTGTTA  399  NM_033167  1158-1159  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCTATGCCCTTGTTA  400  NM_033169  1160-1161  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCAGTTTATTATTGA  401  NM_033169  1160-1161  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCTATGCCCTTGTTA  402  NM_033168  1162-1163  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCAGTTTATTATTGA  403  NM_033167  1158-1159  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCTATGCCCTTGTTA  404  NM_003781  1164-1165  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCAGTTTATTATTGA  405  NM_003781  1164-1165  UDP-Gal:betaGlcNAc galactosyltransferase,
GATCAGTTTATTATTGA  406  NM_033168  1162-1163  UDP-Gal:betaGlcNAc galactosyltransferase, polypeptide 3
GATCGAGTCAAGATGAG  407  NM_013442  1166-1167  stomatin (EPB72)-like 2
GATCACCATGATGCAGA  408  NM_031905  1168-1169  SVH protein
GATCCCGTGTGTGTGTG  409  NM_031905  1168-1169  SVH protein
GATCATGGTTCTGTTTG  410  NM_006123  1170-1171  iduronate 2-sulfatase (Hunter syndrome)
GATCCGCAGGCAGAAGC  411  NM_002775  1172-1173  protease, serine, 11 (IGF binding)
GATCGATGGTGGCGCTT  412  NM_138991  1174-1175  beta-site APP-cleaving enzyme 2
GATCTGCATCAGTTGTA  413  NM_001814  1176-1177  cathepsin C
GATCTCTACTACCACAA  414  NM_001908  1178-1179  cathepsin B
GATCTCTACTACCACAA  415  NM_147780  1180-1181  cathepsin B
GATCTCTACTACCACAA  416  NM_147781  1182-1183  cathepsin B
GATCTCTACTACCACAA  417  NM_147782  1184-1185  cathepsin B
GATCTCTACTACCACAA  418  NM_147783  1186-1187  cathepsin B
GATCGATGGTGGCGCTT  419  NM_012105  1188-1189  beta-site APP-cleaving enzyme 2
GATCTTTCAGGAAAGAC  420  NM_000931  1190-1191  plasminogen activator, tissue
GA CAAATTGCAAAATA  421  NM_153705  1192-1193  KDEL (Lys-Asp-Glu-Leu) containing 2
GATCTTATTTTCTGAGA  422  NM_014584  1194-1195  ERO1-like (S. cerevisiae)
GATCCACAAGGCCTGAG  423  NM_001185  1196-1197  alpha-2-glycoprotein 1, zinc
GATCTAGGCCTCATCTT  424  NM_016352  1198-1199  carboxypeptidase A4
GATCCCTTTGAAATTTT  425  NM_001219  1200-1201  calumenin
GATCTACAACATATAAA  426  NM_020648  1202-1203  twisted gastrulation homolog 1 (Drosophila)
GATCAGTTTTTTCACCT  427  NM_001901  1204-1205  connective tissue growth factor
GATCACAGTGTCAGAGA  428  NM_007224  1206-1207  neurexophilin 4
GATCGTTACTATGTGTC  429  NM_004541  1208-1209  NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1, 7.5 kDa
GATCATTGACCTCTGTG  430  NM_006459  1210-1211  SPFH domain family, member 1
GATCTGAAGCCCAGGTT  431  NM_024514  1212-1213  cytochrome P450, family 2, subfamily R, polypeptide 1

GATCTGTTAAAAAAAAA  432  NM_147159  1214-1215  opioid receptor, sigma 1
GATCTTTCAGGAAAGAC  433  NM_000930  1216-1217  plasminogen activator, tissue
GATCATAAGACAATGGA  434  NM_001657  1218-1219  amphiregulin (schwannoma-derived growth factor)
GATCAGTCTTTATTAAT  435  NM_013995  1220-1221  lysosomal-associated membrane protein 2
GATCCAGGCTCACTGTG  436  NM_005250  1222-1223  forkhead box L1
GATCAAATAATGCGACG  437  NM_018064  1224-1225  chromosome 6 open reading frame 166
GATCTTGGTTTTCCATG  438  NM_003000  1226-1227  succinate dehydrogenase complex, subunit B, iron sulfur (Ip)
GATCTGTTAGTCAAGTG  439  NM_005313  1228-1229  glucose regulated protein, 58 kDa
GATCATTTCTGGTAAAT  440  NM_005313  1228-1229  glucose regulated protein, 58 kDa
GATCAAAGCACTCTTCC  441  NM_005313  1228-1229  glucose regulated protein, 58 kDa
GATCATGCCAAGTGGTG  442  NM_021233  1230-1231  deoxyribonuclease II beta
GATCATCGCCTCCCTGG  443  NM_006216  1232-1233  serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
GATCACCAGGCTGCCCA  444  NM_016504  1234-1235  mitochondrial ribosomal protein L27
GATCGGATGGGCAAGTC  445  NM_002178  1236-1237  insulin-like growth factor binding protein 6
GATCTCAAGACCAAAGA  446  NM_030810  1238-1239  thioredoxin domain containing 5
GATCTCACATTGTGCCC  447  NM_014254  1240-1241  transmembrane protein 5
GATCAGTCTTTATTAAT  448  NM_002294  1242-1243  lysosomal-associated membrane protein 2
GATCAGAGAAGATGATA  449  NM_000640  1244-1245  interleukin 13 receptor, alpha 2
GATCAGGTAACCAGAGC  450  NM_000591  1246-1247  CD14 antigen
GATCATCAGTAAATTTG  451  NM_031284  1248-1249  ADP-dependent glucokinase
GATCAATAAAATGTGAT  452  NM_002658  1250-1251  plasminogen activator, urokinase
GATCCCTCGGGTTTTGT  453  NM_006350  1252-1253  follistatin
GATCTTGCAACTCCATT  454  NM_006350  1252-1253  follistatin
GATCCAGCATGGAGGCC  455  NM_018664  1254-1255  Jun dimerization protein p21SNFT
GATCATTGTGAAGGCAG  456  NM_001511  1256-1257  chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha)
GATCTGCCAGCAGTGTT  457  NM_002004  1258-1259  farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase)
GATCAGAGGTTACTAGG  458  NM_006408  1260-1261  anterior gradient 2 homolog (Xenopus laevis)
GATCCACAGGGGTGGTG  459  NM_000602  1262-1263  serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1
GATCACAAGGGGGGGAT  460  NM_016588  1264-1265  neuritin 1
GATCTCTGTTTTGACTA  461  NM_004109  1266-1267  ferredoxin 1
GATCTAACCTGGCTTGT  462  NM_004109  1266-1267  ferredoxin 1
GATCAGCAAGTGTCCTT  463  NM_000935  1268-1269  procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2

DIFFERENTIALLY EXPRESSED GENES THAT ENCO DEPREDICTED SECRETED PROTEINS. SEQ Accession SEQ ID Signature ID NO: Number NOS: Description GA TCTAGTGGTTCACAC 464 NM OO3236 270 271 transforming growth factor, alpha GA CAAACAGTTTCTGG 465 NM O16139 272 273 coiled-coil-helix-coiled coil-helix domain containing 2 GA TCATCAAGAAAAAAG 466 NM 018464 274 275 chromosome 10 open reading frame 70 GA TCCCAGAGAGCAGCT 467 NMOO2421 276 277 matrix metalloproteinase 1 (interstitial collagenase) GA CCTTGTGTATTTTTG 468 NM 020440 278 279 prostaglandin F2 receptor negative regulator GA TCTATGTTCTCTCAG 469 NM 0133.63 281 procollagen C endopeptidase enhancer 2 TCAGCAAGTGTCCTT 470 NM 182943 282 283 procollagen-lysine, 2 oxoglutarate 5 dioxygenase 2 GA TCATGTGCTACTGGT 471 NM OO3172 284 285 Surfeit 1 GA TCTGTAAATAAAATC 472 NM 130781 286 287 RAB24, member RAS oncogene family GA TCAGGGCTGAGGGTA 473 NM OOO157 288 289 glucosidase, beta; acid (includes glucosylceramidase) GA TCCTCCTATGTTGTT 474 NM OO5551 290 291 kallikrein 2, prostatic GA TCAGAGATGCACCAC 475 NM OO2997 292 293 Syndecan 1 GA 476 NM 005570 294 295 ectin, mannose-binding, 1 GA TCACCATGAAAGAAG 477 NM OO3873 296 297 neuropilin 1 TCTGTTAAAAAAAAA 478 NM OO5866 298 299 opioid receptor, sigma 1 GA TCAATTCCCTTGAAT 479 NM 138322 300 301 proprotein convertase subtilisin?kexin type 6 GA TCCCAGACCAACCCT 480 NM O24642 303 UDP-N-acetyl-alpha-D- galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T12) GA TCATCACAGTTTGAG 481 NM OO2425 305 matrix metalloproteinase O (stromelysin 2) GA TCGGAACAGCTCCTT 482 NM 178154 307 lucosyltransferase 8 (alpha (1,6) lucosyltransferase) TCGGAACAGCTCCTT 483 NM 178155 309 lucosyltransferase 8 alpha (1,6) lucosyltransferase) TCGGAACAGCTCCTT 484 NM 178156 311 lucosyltransferase 8 alpha (1,6) lucosyltransferase) GA TCTGTGGGCCCAGTC 485 NM 004077 312 313 citrate synthase GA CAACCTTAAAGGAA 486 NM 000143 314 315 umarate hydratase GA CCTTCTACTTGCCTG 487 NM OOO3O2 3.16 317 procollagen-lysine 1, 2-oxoglutarate 
5-dioxygenase 1 TCACCAGCCATGTGC 488 NM 004390 318 319 cathepsin H TCACCGGAGGTCAGT 489 NM O16026 320 321 retinol dehydrogenase 11 (all-trans and 9-cis) TCTATTTTATGCATG 490 NM 020792 322 323 KIAA1363 protein TCTGTTAAAAAAAAA 491 NM 147157 324 325 opioid receptor, sigma 1 TCATTTTGGTTCGTG 492 NM O16417 326 327 chromosome 14 open reading frame 87 TCACTTGTGTACGAA 493 NM O24641 328 329 mannosidase, endo-alpha TCCCTCCACCCCCAT 494 NM OO1441 330 331 fatty acid amide hydrolase TCCAAAGTCATGTGT 495 NM 058172 332 333 anthrax toxin receptor 2 TCCATAAATATTTAT 496 NM 058172 332 333 anthrax toxin receptor 2 TCTGCCTGCATCCTG 497 NM OO3225 334 335 trefoil factor 1 (breast cancer, estrogen inducible sequence expressed in) GA TCCAGTGTCCATGGA 498 NM 007085 336 337 follistatin-like 1 GA TCAATTCCCTTGAAT 499 NM 138324 338 339 proprotein convertase Subtilisin?kexin type 6 GA TCCGTGTGCTTGGGC 500 NM 018143 340 341 kelch-like 11 (Drosophila) US 9,002,652 B1 77 78 TABLE 3-continued

DIFFERENTIALLY EXPRESSED GENES THAT ENCODE PREDICTED SECRETED PROTEINS.

Signature  SEQ ID NO:  Accession Number  SEQ ID NOS:  Description
GATCCAGGGTCCCCCAG  501  NM_004911  1342-1343  protein disulfide isomerase related protein (calcium-binding protein, intestinal-related)
GATCATGGGACCCTCTC  502  NM_003032  1344-1345  sialyltransferase 1 (beta-galactoside alpha-2,6-sialyltransferase)
GATCATGGGACCCTCTC  503  NM_173216  1346-1347  sialyltransferase 1 (beta-galactoside alpha-2,6-sialyltransferase)
GATCTCACTGTTATTAT  504  NM_007115  1348-1349  tumor necrosis factor, alpha-induced protein 6
GATCCTGTATCCAAATC  505  NM_007115  1348-1349  tumor necrosis factor, alpha-induced protein 6
GATCAGTTTTCTCTTAA  506  NM_024769  1350-1351  adipocyte-specific adhesion molecule
GATCTACCAGATAACCT  507  NM_000522  1352-1353  homeo box A13
GATCCTAGTAATTGCCT  508  NM_054034  1354-1355  fibronectin 1
GATCAATGCAACGACGT  509  NM_006833  1356-1357  COP9 constitutive photomorphogenic homolog subunit 6 (Arabidopsis)
GATCAATTCCCTTGAAT  510  NM_138325  1358-1359  proprotein convertase subtilisin/kexin type 6
GATCAATTCCCTTGAAT  511  NM_138323  1360-1361  proprotein convertase subtilisin/kexin type 6
GATCCCAGAGGGATGCA  512  NM_024040  1362-1363  CUE domain containing 2
GATCATCAAAAATGCTA  513  NM_017898  1364-1365  hypothetical protein FLJ20605
GATCCCTCGGGTTTTGT  514  NM_013409  1366-1367  follistatin
GATCTTGCAACTCCATT  515  NM_013409  1366-1367  follistatin
GATCTTGTTAATGCATT  516  NM_001873  1368-1369  carboxypeptidase E
GATCAAAGGTTTAAAGT  517  NM_001627  1370-1371  activated leukocyte cell adhesion molecule
GATCACCAAGATGCTTC  518  NM_018371  1372-1373  chondroitin beta1,4 N-acetylgalactosaminyltransferase
GATCAAATGTGCCTTAA  519  NM_014918  1374-1375  carbohydrate (chondroitin) synthase 1
GATCTTCGGCCTCATTC  520  NM_017860  1376-1377  hypothetical protein FLJ20519
GATCCCTTCTGCCCTGG  521  NM_022367  1378-1379  sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4A
GATCCAACCGACTGAAT  522  NM_006670  1380-1381  trophoblast glycoprotein
GATCTCTGCAGATGCCA  523  NM_004750  1382-1383  cytokine receptor-like factor 1
GATCACAAAATGTTGCC  524  NM_001077  1384-1385  UDP glycosyltransferase 2 family, polypeptide B17
GATCTCTCTTTCTCTCT  525  NM_031882  1386-1387  protocadherin alpha subfamily C, 1
GATCTCTCTTTCTCTCT  526  NM_031860  1388-1389  protocadherin alpha 10
GATCTCTCTTTCTCTCT  527  NM_018906  1390-1391  protocadherin alpha 3
GATCTCTCTTTCTCTCT  528  NM_031411  1392-1393  protocadherin alpha 1
GATCACAGGCGTGAGCT  529  NM_032620  1394-1395  GTP binding protein 3 (mitochondrial)
GATCAACATCTTTTCTT  530  NM_004343  1396-1397  calreticulin
GATCTCTGATTTAACCG  531  NM_002185  1398-1399  interleukin 7 receptor
GATCTCTCTTTCTCTCT  532  NM_031497  1400-1401  protocadherin alpha 3
GATCCATTTTTAATGGT  533  NM_198278  1402-1403  hypothetical protein LOC255743
GATCTTTTCTAAATGTT  534  NM_005699  1404-1405  interleukin 18 binding protein
GATCTCTCTTTCTCTCT  535  NM_031410  1406-1407  protocadherin alpha 1
GATCGGTGCGTTCTCCT  536  NM_005561  1408-1409  lysosomal-associated membrane protein 1
GATCTTTTCTAAATGTT  537  NM_173042  1410-1411  interleukin 18 binding protein
GATCTTTTCTAAATGTT  538  NM_173043  1412-1413  interleukin 18 binding protein

TABLE 3-continued

DIFFERENTIALLY EXPRESSED GENES THAT ENCO DEPREDICTED SECRETED PROTEINS. SEQ Accession SEQ ID Signature ID NO: Number NOS: Description GA CCTCTCTTTCTCTCT NM 031496 414-1415 protocadherin alpha 2 GA TCCTGTTGGATGTGA NM 080927 416-1417 discoidin, CUB and LCCL domain containing 2 GA CCTCTCTTTCTCTCT S41 NM 031864 418-1419 protocadherin alpha 12 GA CCTCTCTTTCTCTCT S42 NM 031849 420-1421 protocadherin alpha 6 GA TCCTGTGCTTCTGCA 543 NM OO6464 422-1423 trans-golgi network protein 2 GA CCTCTCTTTCTCTCT 544 NM 031865 424-1425 protocadherin alpha 13 GA TCTGATGAAGTATAT 545 NM O22746 426-1427 hypothetical protein GA TCACTTGTCTTGTGG S46 NM OO6988 428-1429 FL22390 a disintegrin-like and metalloprotease (reprolysin type) with hrombospondin type 1 motif, 1 GA CCTTTTCTAAATGTT 547 NM 173044 430 431 interleukin 18 binding protein GA CCTCTCTTTCTCTCT S48 NM 031856 432 433 protocadherin alpha 8 GA CCTCTCTTTCTCTCT S49 NM 031500 434 435 protocadherin alpha 4 GA TCAGCACTGCCAGTG 550 NM O16592 436 437 GNAS complex locus GA TCCGGAAAGATGAAT 551 NM 144640 438 439 interleukin 17 receptor E GA CCTCTCTTTCTCTCT 552 NM 031501 440 441 protocadherin alpha 5 GA CCTCTCTTTCTCTCT 553 NM 031495 442 443 protocadherin alpha 2 GA TCTAATGTAAAATCC 554 NM 002354 444 445 tumor-associated calcium signal transducer 1 GA CCTTCTTTTGTAATG 555 NM 032780 446 447 transmembrane protein 25 GA TCAATAATAATGAGG 556 NM 001001390 448 449 CD44 antigen (homing unction and Indian blood group system) GA TCA ACAGTGGCAATG 557 NM 001001390 448 449 CD44 antigen (homing unction and Indian blood group system) GA TCA ACAGTGGCAATG 558 NM 001001391 450 451 CD44 antigen (homing unction and Indian blood group system) GA TCAATAATAATGAGG 559 NM 001001391 450 451 CD44 antigen (homing unction and Indian blood group system) GA TCATTGCTCCTTCTC S60 NM 004872 452 453 chromosome 1 open reading frame 8 GA TCTCTGCATTTTATA 561 NM O2O198 454 455 GK001 protein GA TCTATGAAATCTGTG S62 NM O2O198 454 455 GK001 protein GA CCTCTCTTTCTCTCT 563 NM O18901 456 457 
protocadherin alpha 10 GA TCACTGGAGCTGTGG S64 NM 002116 458 459 major histocompatibility complex, class I, A GA TCATCCAGTTTGCTT 565 NM OO4540 460 461 neural cell adhesion molecule 2 GA CAAAATTGTTACCC 566 NM OO4540 460 461 neural cell adhesion molecule 2 GA TCA ACAGTGGCAATG 567 NM OO1 OO1389 462 463 CD44 antigen (homing unction and Indian blood group system) GA TCAATAATAATGAGG 568 NM OO1 OO1389 462 463 CD44 antigen (homing unction and Indian blood group system) GA TCA ACAGTGGCAATG 569 NM OOO610 464 465 CD44 antigen (homing unction and Indian blood group system) GA TCAATAATAATGAGG 570 NM OOO610 464 465 CD44 antigen (homing unction and Indian blood group system) GA TCCATACTGTTTGGA 571 NM OO1792 466 467 cadherin 2, type 1, N cadherin (neuronal) GA TCTGCATTTTCAGAA 572 NM O15544 468 469 DKFZPS64K1964 protein GA TCCCATTTTTTGGTA 573 NM 000574 470 471 decay accelerating factor or complement (CD55, Cromer blood group system) GA TCTGCAGTGCTTCAC 574 NM 022842 472 473 CUB domain-containing protein 1 US 9,002,652 B1 81 82 TABLE 3-continued

DIFFERENTIALLY EXPRESSED GENES THAT ENCODE PREDICTED SECRETED PROTEINS.

Signature  SEQ ID NO:  Accession Number  SEQ ID NOS:  Description
GATCTGTTAAAAAAAAA  575  NM_147160  1474-1475  opioid receptor, sigma 1
GATCATAGGTCTGGACA  576  NM_014045  1476-1477  low density lipoprotein receptor-related protein 10
GATCTAATACTACTGTC  577  NM_001110  1478-1479  a disintegrin and metalloproteinase domain 10
GATCTCTTGAGGCTGGG  578  NM_016371  1480-1481  hydroxysteroid (17-beta) dehydrogenase 7
GATCGTTCATTGCCTTT  579  NM_001746  1482-1483  calnexin
GATCTCTCTTTCTCTCT  580  NM_018900  1484-1485  protocadherin alpha 1
GATCTGACCTGGTGAGA  581  NM_004393  1486-1487  dystroglycan 1 (dystrophin-associated glycoprotein 1)
GATCATCTTTCCTGTTC  582  NM_002117  1488-1489  major histocompatibility complex, class I, C
GATCGTAAAATTTTAAG  583  NM_003816  1490-1491  a disintegrin and metalloproteinase domain 9 (meltrin gamma)
GATCTCTCTTTCTCTCT  584  NM_018904  1492-1493  protocadherin alpha 13
GATCTCTCTTTCTCTCT  585  NM_018911  1494-1495  protocadherin alpha 8
GATCTCTCTTTCTCTCT  586  NM_018905  1496-1497  protocadherin alpha 2
GATCTCTCTTTCTCTCT  587  NM_018903  1498-1499  protocadherin alpha 12
GATCTCTCTTTCTCTCT  588  NM_018907  1500-1501  protocadherin alpha 4
GATCTCTCTTTCTCTCT  589  NM_018908  1502-1503  protocadherin alpha 5
GATCCGGAAAGATGAAT  590  NM_153480  1504-1505  interleukin 17 receptor E
GATCCGGAAAGATGAAT  591  NM_153483  1506-1507  interleukin 17 receptor E
GATCTCTGTAATTTTAT  592  NM_021923  1508-1509  fibroblast growth factor receptor-like 1
GATCTAAGAGATTAATA  593  NM_004362  1510-1511  calmegin
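
By way of illustration only, the relationship between signatures and accession numbers in Table 3 can be sketched in a few lines of Python. The sketch assumes that every MPSS signature is 17 nucleotides long and begins with GATC, as all legible entries of Table 3 do, and it writes accession numbers in the underscore form (for example NM_153480). The names `is_mpss_signature`, `group_by_signature` and `ROWS` are hypothetical and are not part of the disclosure; the five sample rows are taken from the final page of Table 3.

```python
import re
from collections import defaultdict

# Assumption: a signature is exactly 17 nucleotides and starts with GATC.
SIGNATURE_RE = re.compile(r"GATC[ACGT]{13}")


def is_mpss_signature(seq: str) -> bool:
    """Return True if seq has the assumed 17-nt GATC-prefixed form."""
    return SIGNATURE_RE.fullmatch(seq) is not None


def group_by_signature(rows):
    """Map each signature to the accession numbers listed against it.

    rows is an iterable of (signature, accession) pairs. A malformed
    signature raises ValueError so that damaged entries are not
    silently grouped.
    """
    groups = defaultdict(list)
    for signature, accession in rows:
        if not is_mpss_signature(signature):
            raise ValueError(f"not a 17-nt GATC signature: {signature!r}")
        groups[signature].append(accession)
    return dict(groups)


# Sample (signature, accession) pairs from Table 3, SEQ ID NOs 575-591.
ROWS = [
    ("GATCTGTTAAAAAAAAA", "NM_147160"),
    ("GATCCGGAAAGATGAAT", "NM_153480"),
    ("GATCCGGAAAGATGAAT", "NM_153483"),
    ("GATCTCTCTTTCTCTCT", "NM_018900"),
    ("GATCTCTCTTTCTCTCT", "NM_018904"),
]

if __name__ == "__main__":
    for signature, accessions in group_by_signature(ROWS).items():
        print(signature, ", ".join(accessions))
```

Grouping in this way makes visible that one signature may be listed against several accession numbers, which is one reason the count of signatures and the count of genes in Table 3 differ.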

Example 2

Identification of Secreted Proteins by Computational Analysis of MPSS Signature Sequences

Secreted proteins can readily be exploited for blood cancer diagnosis and prognosis. As such, the differentially expressed genes identified in Example 1 were further analyzed to determine how many of the differentially expressed genes encode secreted proteins. Proteins with signal peptides (classical secretory proteins) were predicted using the same criteria described by Chen et al., Mamm Genome, 14:859-865, 2003, with the SignalP 3.0 server developed by The Center for Biological Sequence Analysis, Lyngby, Denmark (http colon double slash www dot cbs dot dtu dot dk slash services slash SignalP-3.0; see also, J. D. Bendtsen, et al., J. Mol. Biol., 340:783-795, 2004) and the TMHMM2.0 server (see for example A. Krogh, et al., Journal of Molecular Biology, 305(3):567-580, January 2001; E. L. L. Sonnhammer, et al. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, Calif., 1998. AAAI Press). Putative nonclassical secreted proteins (without signal peptides) were predicted based on the SecretomeP 1.0 server (http colon double slash www dot cbs dot dtu dot dk slash services slash SecretomeP-1.0 slash) and required an odds ratio score >3.

Five hundred and twenty one signatures belonging to 460 genes potentially encoding secreted proteins (Table 3) were identified. Among these, 287 (259 genes) and 234 (201 genes) signatures were overexpressed or underexpressed in CL1 cells compared with LNCaP cells. Thus these proteins can be used in blood diagnostics to follow prostate cancer progression.

Example 3

Prostate Cancer Diagnostics Using Multiparameter Analysis

This example describes a multiparameter diagnostic fingerprint using the WDR19 prostate-specific secreted protein in combination with PSA. The WDR19 prostate-specific protein is diagnostically superior to PSA when used alone and further improved prostate cancer detection when used in combination with PSA.

WDR19 was previously identified as relatively tissue-specific by cDNA array studies and Northern blot analysis (see e.g., U.S. Patent Application Publication No. 20020150893). This protein was selected, expressed as protein, purified and antibodies were made against it, all using standard techniques known in the art (the cDNA encoding the WDR19 protein is provided in SEQ ID NO:1, the amino acid sequence is provided in SEQ ID NO:2). The WDR19-specific antibody was shown to be an excellent tissue-specific marker of prostate cancer with staining of the specific epithelial cells being directly proportional to the progression of the cancer. In this regard it is very different from the well-established PSA marker which is not a good prostate tissue cancer marker.

The WDR19 antibodies and those for the well-established PSA prostate cancer blood marker were used to analyze 10 blood samples from normal individuals, 10 blood samples from early prostate cancer patients and 10 blood samples from late prostate cancer patients. The results showed that WDR19 reacted against no normals, against 5/10 early cancers, and against 5/10 late cancers, whereas PSA reacted against no normals, no early cancers and 7/10 late cancers. The two markers together detected all the late cancers. Thus the multiparameter analysis of blood markers (e.g. the analyses of multiple markers) for prostate cancer was far more powerful than using each marker alone.

Accordingly, the results show a molecular blood fingerprint that comprises the WDR19 and PSA proteins. This fingerprint allows superior diagnostic power to PSA alone and further improves prostate cancer detection.

WDR19 was also shown to be an effective histochemical marker for prostate cancer. Two hundred and seventy-five tissue cores that contain both stromal and epithelial cells from cancer patients, 17 from benign prostatic hyperplasia (BPH) and 12 from normal individuals were examined. The mean WDR19 protein staining intensities were 2.52 (standard error (S.E.), 0.05; 95% confidence interval (CI), 2.41-2.61) for prostate cancer; 1.03 for BPH (S.E., 0.03; 95% CI, 0.96-1.09); and 1.0 (S.E., 0; 95% CI, 1.0-1.0) for normal individuals. Pair-wise comparisons (using independent t-test) demonstrated that WDR19 staining intensity is significantly different between prostate cancer and BPH (mean difference 1.49; P<0.0001) and between prostate cancers and normal (mean difference 1.52; P<0.0001). These data suggested that WDR19, in addition to being a prostate-specific blood biomarker, is a quantitative cancer-specific marker for prostate tissues.

Example 4

Identification of Organ-Specific Secreted Proteins Using MPSS and Computational Analysis

MPSS, as described in Example 1 and in the detailed description, was used to identify more than 2 million transcripts from each of the prostate cell lines (see Example 1) and in normal prostate tissue. The MPSS signature sequences from normal prostate were compared against 29 other tissues each with about 1 million or more mRNA transcripts. This comparison revealed that about 300 of these transcripts are organ-specific and about 60 of these organ-specific transcripts are potentially secreted into the blood. (See Table 4).

TABLE 4

PROSTATE-SPECIFIC PROTEINS POTENTIALLY SECRETED INTO BLOOD

Accession No.  SEQ ID NO:  Annotations/Description
NP_001176  1512  alpha-2-glycoprotein 1, zinc; Alpha-2-glycoprotein, zinc [Homo sapiens]
NP_001719  1513  basigin isoform 1; OK blood group; collagenase stimulatory factor; M6 antigen; extracellular matrix metalloproteinase inducer [Homo sapiens]
NP_940991  1514  basigin isoform 2; OK blood group; collagenase stimulatory factor; M6 antigen; extracellular matrix metalloproteinase inducer [Homo sapiens]
NP_004039  1515  beta-2-microglobulin precursor [Homo sapiens]
NP_002434  1516  beta-microseminoprotein isoform a precursor; seminal plasma beta-inhibin; prostate secreted seminal plasma protein; immunoglobulin binding factor; prostatic secretory protein 94 [Homo sapiens]
NP_619540  1517  beta-microseminoprotein isoform b precursor; seminal plasma beta-inhibin; prostate secreted seminal plasma protein; immunoglobulin binding factor; prostatic secretory protein 94 [Homo sapiens]
NP_817089  1518  cadherin-like 26 isoform a; cadherin-like protein VR20 [Homo sapiens]
NP_068582  1519  cadherin-like 26 isoform b; cadherin-like protein VR20 [Homo sapiens]
NP_001864  1520  carboxypeptidase E precursor [Homo sapiens]
NP_004807  1521  chromosome 9 open reading frame 61; Friedreich ataxia region gene X123 [Homo sapiens]
NP_001271  1522  cold inducible RNA binding protein; Cold-inducible RNA-binding protein; cold inducible RNA-binding protein; glycine-rich RNA binding protein [Homo sapiens]
NP_008977  1523  elastin microfibril interfacer 1; TNF2 elastin microfibril interface located protein; elastin microfibril interface located protein [Homo sapiens]
NP_004104  1524  fibroblast growth factor 12 isoform 2; fibroblast growth factor 12B; fibroblast growth factor homologous factor 1; myocyte-activating factor; fibroblast growth factor FGF-12b [Homo sapiens]
NP_005962  1525  FXYD domain containing ion transport regulator 3 isoform 1 precursor; phospholemman-like protein; FXYD domain-containing ion transport regulator 3 [Homo sapiens]
NP_068710  1526  FXYD domain containing ion transport regulator 3 isoform 2 precursor; phospholemman-like protein; FXYD domain-containing ion transport regulator 3 [Homo sapiens]
NP_006352  1527  homeo box B13; homeobox protein HOX-B13 [Homo sapiens]
NP_002139  1528  homeo box D10; homeobox protein Hox-D10; homeo box 4D; Hox-4
NP_000513  1529  homeobox protein A13; homeobox protein HOXA13; homeo box 1J; transcription factor HOXA13 [Homo sapiens]
NP_060819  1530  hypothetical protein FLJ11175 [Homo sapiens]
NP_078985  1531  hypothetical protein FLJ14146 [Homo sapiens]
NP_061894  1532  hypothetical protein FLJ20010 [Homo sapiens]
NP_115617  1533  hypothetical protein FLJ23544; QM gene; DNA segment on chromosome X (unique) 648 expressed sequence; 60S ribosomal protein L10; tumor suppressor QM; Wilms tumor-related protein; laminin receptor homolog [Homo sapiens]
NP_057582  1534  hypothetical protein HSPC242 [Homo sapiens]
NP_116285  1535  hypothetical protein MGC14388 [Homo sapiens]
NP_116293  1536  hypothetical protein MGC14433 [Homo sapiens]
NP_077020  1537  hypothetical protein MGC4309 [Homo sapiens]
NP_061074  1538  hypothetical protein PRO1741 [Homo sapiens]
NP_563614  1539  hypothetical protein similar to KIAA0187 gene product [Homo sapiens]
NP_951038  1540  I-mfa domain-containing protein isoform p40 [Homo sapiens]
NP_005542  1541  kallikrein 2, prostatic isoform 1; glandular kallikrein 2 [Homo sapiens]
NP_004908  1542  kallikrein 4 preproprotein; protease, serine, 17; enamel matrix serine protease 1; kallikrein-like protein 1; prostase; androgen-regulated message 1 [Homo sapiens]
NP_002328  1543  low density lipoprotein receptor-related protein associated protein 1; lipoprotein receptor associated protein; alpha-2-MRAP; alpha-2-macroglobulin receptor-associated protein 1; low density lipoprotein-related protein-associated protein 1; low density li
NP_859077  1544  low density lipoprotein receptor-related protein binding protein [Homo sapiens]
NP_000897  1545  natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A); Natriuretic peptide receptor A/guanylate cyclase A [Homo sapiens]
NP_085048  1546  Nedd4 family interacting protein 1; Nedd4 WW domain-binding protein 5 [Homo sapiens]
NP_000896  1547  neuropeptide Y [Homo sapiens]
NP_039227  1548  olfactory receptor, family 10, subfamily H, member 2 [Homo sapiens]
NP_000599  1549  orosomucoid 2; alpha-1-acid glycoprotein, type 2 [Homo sapiens]
NP_002643  1550  prolactin-induced protein; prolactin-inducible protein [Homo sapiens]
NP_057674  1551  prostate androgen-regulated transcript 1 protein; prostate-specific and androgen-regulated cDNA 14D7 protein [Homo sapiens]
NP_001639  1552  prostate specific antigen isoform 1 preproprotein; gamma-seminoprotein; semenogelase; seminin; P-30 antigen [Homo sapiens]
NP_665863  1553  prostate specific antigen isoform 2; gamma-seminoprotein; semenogelase; seminin; P-30 antigen [Homo sapiens]
NP_001090  1554  prostatic acid phosphatase precursor [Homo sapiens]
NP_001000  1555  ribosomal protein S5; 40S ribosomal protein S5 [Homo sapiens]
NP_005658  1556  ring finger protein 103; zinc finger protein expressed in cerebellum; zinc finger protein 103 homolog (mouse) [Homo sapiens]
NP_937761  1557  ring finger protein 138 isoform 2 [Homo sapiens]
NP_002998  1558  semenogelin I isoform a preproprotein [Homo sapiens]
NP_937782  1559  semenogelin I isoform b preproprotein [Homo sapiens]
XP_353669  1560  similar to HIC protein isoform p32 [Homo sapiens]
NP_003855  1561  sin3 associated polypeptide p30 [Homo sapiens]
NP_036581  1562  six transmembrane epithelial antigen of the prostate; six transmembrane epithelial antigen of the prostate (NOTE: non-standard symbol and name) [Homo sapiens]
NP_008868  1563  SMT3 suppressor of mif two 3 homolog 2; SMT3 (suppressor of mif two 3, yeast) homolog 2 [Homo sapiens]
NP_066568  1564  solute carrier family 15 (H+/peptide transporter), member 2 [Homo sapiens]
NP_055394  1565  solute carrier family 39 (zinc transporter), member 2 [Homo sapiens]
NP_003209  1566  telomeric repeat binding factor 1 isoform 2; Telomeric repeat binding factor 1; telomeric repeat binding protein 1 [Homo sapiens]
NP_110437  1567  thioredoxin domain containing 5 isoform 1; thioredoxin related protein; endothelial protein disulphide isomerase [Homo sapiens]
NP_004863  1568  thymic dendritic cell-derived factor 1; liver membrane-bound protein [Homo sapiens]
NP_665694  1569  TNF receptor-associated factor 4 isoform 2; tumor necrosis receptor-associated factor 4A; malignant 62; cysteine-rich domain associated with ring and TRAF domain [Homo sapiens]
NP_005647  1570  transmembrane protease, serine 2; epitheliasin [Homo sapiens]
NP_008931  1571  uroplakin 1A [Homo sapiens]
NP_036609  1572  WW domain binding protein 1 [Homo sapiens]
NP_009062  1573  zinc finger protein 75 [Homo sapiens]

Example 5

Comparison of Localized Prostate Cancer and Prostate Cancer Metastases in the Liver

In an additional experiment, the transcriptome from normal prostate tissue was compared to the transcriptome of each of the LNCaP and CL-1 prostate cancer cell lines. The comparison showed that the transcriptomes were distinct for the normal tissue, the early prostate cancer and the late prostate cancer. An additional comparison was carried out between localized prostate cancer and metastases in the liver. About 6,000 genes were identified that were significantly changed between the localized prostate cancer and the metastasized cancer and again, many of the changed genes encoded secreted proteins that can be part of the blood fingerprints indicative of the more advanced disease status of metastases. The metastases-altered blood fingerprints may indicate the site of metastases.

These experiments demonstrate that there are continuous changes in the two types of networks as prostate cancer progresses, from localized to androgen independence to metastases. These graded network transitions suggest that one will be able to detect the very earliest stages of prostate cancer and, accordingly, that the organ-specific, molecular blood fingerprints approach described herein will also permit a very early diagnosis of prostate and other types of cancers.

Example 6

MPSS Analysis in a Yeast Model System

This experiment demonstrates perturbation-specific fingerprints of patterns of gene expression for nuclear, cytoplasmic, membrane-bound and secreted proteins in the yeast metabolic system that converts the sugar galactose into glucose-6-phosphate (the gal system).

The gal system includes 9 proteins. In the course of studying how this system works, 9 new strains of yeast were created, each with a different one of the 9 relevant genes destroyed (gene knockouts). Yeast is a single-celled eukaryotic organism with about 6,000 genes. The expression patterns of each of the 6,000 genes were studied in the wild-type yeast and each of the 9 knockout strains. The data from these experiments showed: 1) each of the knockout strains exhibited statistically significant changes in patterns of gene expression relative to the wild-type strain, ranging from 89 to 465 altered patterns of gene expression; 2) each of these patterns of changed gene expression was unique; and 3) on average about 15% of the genes with changed expression patterns encoded proteins that were potentially secreted (as determined by computational analysis from the sequence of the gene). These genes are as follows (listed by gene name as available through the public yeast genome database at http://www.yeastgenome.org/; the genomic DNA, cDNA and amino acid sequences corresponding to each of the listed genes are publicly available, for example, through the yeast genome database):

YGL102C, YGL069C, YLL044W, YMR321C, YKL153W, YMR195W, YHL015W, YNL096C, YGR030C, YDR123C, YKL186C, YOR234C, YKL001C, YJL188C, YDL023C, YPL143W, YEL039C, YKL006W, YGR280C, YBR285W, YKR091W, YDR064W, YBR047W, YGR243W, YOR309C, YDR461W, YHR053C, YHR055C, YGR148C, YGL187C, YIL018W, YFR003C, YPL107W, YBR185C, YNR014W, YJL067W, YDR451C, YGL031C, YHR141C, YNL162W, YBR046C, YNL036W, YDL136W, YDL191W, YLR257W, YNL057W, YGL068W, YKR057W, YLR201C, YHL001W, YDR010C, YPL138C, YOR312C, YPL276W, YML114C, YLR327C, YBR191W, YOR257W, YOR096W, YPL223C, YJL136C, YAL044C, YER079W, YMR107W, YPL079W, YDR175C, YGR035C, YDR153C, YDR337W, YOR167C, YMR194W, YOR194C, YHR090C, YGR110W, YMR242C, YHR198C, YPL177C, YLR164W, YMR143W, YDL083C, YLR325C, YOR203W, YMR193W, YLR062C, YOR383C, YLR300W, YJL079C, YJL158C, YHR139C, YGL032C, YER150W, YNL160W, YDR382W, YMR305C, YKL096W, YKR013W, YCL043C, YLR042C, YDR055W, YPL163C, YEL040W, YJL171C, YLR121C, YDR382W, YLR250W, YGR189C, YJL159W, YMR215W, YDR519W, YIL162W, YKL163W, YDR518W, YDR534C, YPR157W, YML130C, YML128C, YBR092C, YDR032C, YLR120C, YBR093C, YHR215W, YAR071W, YDL130W, YDR144C, YPR123C, YGR174C, YOR327C, YNL058C, YGR265W, YGR160W, YIL117C, YOL053W, YGR236C, YGR060W, YKL120W, YDL046W, YHR132C, YMR058W, YLR332W, YKR061W, YEL001C, YKL154W, YKL073W, YMR238W, YJR020W, YIL136W, YHL028W, YDL010W, YLR339C, YNL217W, YHR063C.

The different knockout strains can be thought of as analogous to genetic disease mutants. Accordingly, these data further support the notion that each disease has a unique expression fingerprint and that each disease generates unique collections of secreted proteins that constitute molecular fingerprints capable of identifying the corresponding disease.

Example 7

Identification of Prostate-Specific/Enriched Genes Using a 2.5 Fold Over-Expression Cut-Off

Organ specific/enriched expression can be determined by the ratio of the expression (e.g., measured in transcripts per million (tpm)) in a particular organ as compared to other organs. In this example, prostate enriched/specific expression was analyzed by comparing the expression level (tpm counts) of MPSS signature sequences identified from normal prostate tissue to their corresponding expression levels in 33 normal tissues. A particular gene that demonstrated at least a 2.5-fold increase in expression in prostate as compared to all tissues examined (each tissue evaluated individually) was considered to be prostate-specific/enriched. The tissues examined were adrenal gland, bladder, bone marrow, brain (amygdala, caudate nucleus, cerebellum, corpus callosum, hypothalamus, and thalamus), whole fetal brain, heart, kidney, liver (new cloning), lung, mammary gland, monocytes, peripheral blood lymphocytes, pituitary gland, placenta, pancreas, prostate, retina, spinal cord, salivary gland, small intestine, stomach, spleen, testis, thymus, trachea, thyroid, and uterus. This analysis identified 109 unique genes (with MPSS signature sequence belonging to class 1-4, i.e. with confirmed match to cDNAs) whose expression was at least 2.5 fold that observed in other normal tissues. The list of prostate-specific/enriched genes is provided in Tables 5A-5D with the expression level in tpm in prostate shown. This list includes KLK2, KLK3, KLK4, TMPRSS2, which are genes previously shown to be prostate-specific.

TABLE 5A

PROSTATE-ENRICHED GENES IDENTIFIED BY RATIO SCHEMA (RATIO > 2.5)*

MPSS Signature | MPSS Sig. SEQ ID NO: | Genbank Name | Genbank Accession No. | Genbank SEQ ID NOS: | Description
GATCTCAGAACAACCTT | 1688 | DHRS7 | BC000637 | 1797-1798 | Dehydrogenase/reductase (SDR family) member 7
GATCCAGCCCAGAGACA | 1689 | NPY | BC029497 | 1799-1800 | Neuropeptide Y
GATCACTCCTTATTTGC | 1690 | FLJ20010 | AW172826 | 1801 | Hypothetical protein FLJ20010
GATCCCTCTCCTCTCTG | 1691 | C9orf61 | BI771919 | 1802 | Chromosome 9 open reading frame 61
GATCTGACTTTTTACTT | 1692 | Lrp2bp | BU853306 | 1803 | Ankyrin repeat domain 37
GATCGTTAGCCTCATAT | 1693 | HOXB13 | BC007092 | 1804-1805 | Homeo box B13
GATCACAAGGAATCCTG | 1694 | CREB3L4 | BC038962 | 1806-1807 | cAMP responsive element binding protein 3-like 4
GATCTCATGGATGATTA | 1695 | LEPREL1 | BC005029 | 1808-1809 | Leprecan-like 1
GATCCAGAAATAAAGTC | 1696 | KLK4 | CB051271 | 1810 | Kallikrein 4 (prostase, enamel matrix, prostate)
GATCTCACAGAAGATGT | 1697 | MGC35558 | NM_145013 | 1811-1812 | Chromosome 11 open reading frame 45
GATCCAAAATCACCAAG | 1698 | HAX1 | BU157155 | 1813 | HCLS1 associated protein X-1
GATCCTGGGCTGGAAGG | 1699 | 0 | AW207206 | 1814 | Hypothetical gene supported by AY338954
GATCCAGATGCAGGACT | 1700 | 0 | BC013389 | 1815 | LOC440156
GATCTGTGCTCATCTGT | 1701 | TMEM16G | BC028162 | 1816-1817 | Transmembrane protein 16G
GATCATTTTATATCAAT | 1702 | MGC31963 | BX099160 | 1818 | Chromosome 1 open reading frame 85
GATCCACACTGAGAGAG | 1703 | KLK3 | BC005307 | 1819-1820 | Kallikrein 3, (prostate specific antigen)
GATCCGTCTGTGCACAT | 1704 | TMPRSS2 | NM_005656 | 1821-1822 | Transmembrane protease, serine 2
GATCATTGTAGGGTAAC | 1705 | LOC221442 | BC026923 | 1823 | Hypothetical protein LOC221442
GATCAGCCCTCAAAAAA | 1706 | ARL10C | BU159800 | 1824 | ADP-ribosylation factor-like 8B
GATCTGGATTCAGGACC | 1707 | MGC13102 | NM_032323 | 1825-1826 | Hypothetical protein MGC13102
GATCAAAAATAAAATGT | 1708 | 0 | AI954252 | 1827 | Hypothetical gene supported by AK022914; AK095211; BC016035; BC041856; BX248778
GATCCGCTCTGGTCAAC | 1709 | SEPX1 | BQ941313 | 1828 | Selenoprotein X, 1
GATCCCTCAAGACTGGT | 1710 | ACPP | BC007460 | 1829-1830 | Acid phosphatase, prostate
GATCCACAAAGACGAGG | 1711 | BIN3 | BI911790 | 1831 | Bridging integrator 3
GATCTCTCTGCGTTTGA | 1712 | SPON2 | BC002707 | 1832-1833 | Spondin 2, extracellular matrix protein
GATCTCAACCTCGCTTG | 1713 | 0 | AK026938 | 1834 | Hypothetical gene supported by AL713796
GATCAAGTTCCCGCTGG | 1714 | RPL18A | BG818587 | 1835 | Ribosomal protein L18a
GATCATAATGAGGTTTG | 1715 | ABCC4 | NM_005845 | 1836-1837 | ATP-binding cassette, sub-family C (CFTR/MRP), member 4
GATCGGTGACATCGTAA | 1716 | RPS11 | AA888242 | 1838 | Ribosomal protein S11
GATCCACCAGCTGATAA | 1717 | NSEP1 | CN353139 | 1839 | Y box binding protein 1
GATCAACACACTTTATT | 1718 | FLJ22955 | AA256381 | 1840 | Hypothetical protein FLJ22955
GATCCCTTCCTTCCTCT | 1719 | HOXD11 | AA513505 | 1841 | Homeo box D11
GATCAGGACACAGACTT | 1720 | ORM1 | BG564253 | 1842 | Orosomucoid
GATCCTGCAATCTTGTA | 1721 | HTPAP | AI572087 | 1843 | Phosphatidic acid phosphatase type 2 domain containing 1B
GATCCTCCTATGTTGTT | 1722 | KLK2 | AA259243 | 1844 | Kallikrein 2, prostatic
GATCTGTACCTTGGCTA | 1723 | SLC2A12 | AI675682 | 1845 | Solute carrier family 2 (facilitated glucose transporter), member 12
GATCGGGGCAAGAGAGG | 1724 | NDRG1 | NM_006096 | 1846-1847 | N-myc downstream regulated gene 1
GATCCCCTCCCCTCCCC | 1725 | NPR1 | NM_000906 | 1848-1849 | Natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A)
GATCCTACAAAGAAGGA | 1726 | FLJ21511 | NM_025087 | 1850-1851 | Hypothetical protein FLJ21511
GATCATTTGCAGTTAAG | 1727 | FOXA1 | NM_004496 | 1852-1853 | Forkhead box A1
GATCTGTCTCCTGCTCT | 1728 | ENPP3 | AI535878 | 1854 | Ectonucleotide pyrophosphatase/phosphodiesterase 3
GATCCTTCCCAAGGTAC | 1729 | GATA2 | NM_032638 | 1855-1856 | GATA binding protein 2
GATCTTGTTGAAGTCAA | 1730 | ARG2 | BX331427 | 1857 | Arginase, type II
GATCGCACCACTGTACA | 1731 | XPO1 | AI569484 | 1858 | Exportin 1 (CRM1 homolog, yeast)
GATCATTTTCTGCTTTA | 1732 | ASB3 | BC009569 | 1859-1860 | Ankyrin repeat and SOCS box containing 3
GATCCCCACACTTGTCC | 1733 | 0 | AK000028 | 1861 | Hypothetical LOC90024
GATCTGGAATTGTCATA | 1734 | KLF3 | BX100634 | 1862 | Kruppel-like factor 3 (basic)
GATCAATAAGCTTTAAA | 1735 | TGM4 | BC007003 | 1863-1864 | Transglutaminase 4 (prostate)
GATCAATGTTTGTAGAT | 1736 | FLJ16231 | NM_001008401 | 1865-1866 | FLJ16231 protein
GATCTACATGTCTATCA | 1737 | BLNK | BX113323 | 1867 | B-cell linker
GATCTGTTTTAAATGAG | 1738 | SLC14A1 | NM_015865 | 1868-1869 | Solute carrier family 14 (urea transporter), member 1 (Kidd blood group)
GATCAAAAAATGCTGCA | 1739 | PTPLB | AI017286 | 1870 | Protein tyrosine phosphatase-like (proline instead of catalytic arginine), member b
GATCATGTCTTCATTTT | 1740 | OR51E2 | NM_030774 | 1871-1872 | Olfactory receptor, family 51, subfamily E, member 2
GATCCCTCCACCCCCAT | 1741 | FAAH | NM_001441 | 1873-1874 | Fatty acid amide hydrolase
GATCCTAAGCCATAAAT | 1742 | STAT6 | AL044554 | 1875 | Signal transducer and activator of transcription 6, interleukin-4 induced
GATCATCGTCCTCATCG | 1743 | ANKH | CB049466 | 1876 | Ankylosis, progressive homolog (mouse)
GATCATCATTTGTCATT | 1744 | DSCR1L2 | AW575747 | 1877 | Down syndrome critical region gene 1-like 2
GATCTAATTTGAAAAAC | 1745 | TRPM8 | NM_024080 | 1878-1879 | Transient receptor potential cation channel, subfamily M, member 8
GATCTTCCTTGTATCAT | 1746 | TMC4 | AV724505 | 1880 | Transmembrane channel-like 4
GATCTCCCCCATGCCTG | 1747 | ZNF589 | BC005859 | 1881-1882 | Zinc finger protein 589
GATCAAATTTAGTATTT | 1748 | LRRK1 | BC005408 | 1883-1884 | Leucine-rich repeat kinase 1
GATCTGCCTTATAAACA | 1749 | STEAP2 | AA177004 | 1885 | Six transmembrane epithelial antigen of the prostate 2
GATCAGAAAATGAGCTC | 1750 | SAFB2 | BC001216 | 1886 | Scaffold attachment factor B2
GATCACCGTGGAGGTTA | 1751 | CPE | BG707154 | 1887 | Carboxypeptidase E
GATCCCTCTGTGCTTCT | 1752 | GNB2L1 | AA024878 | 1888 | Guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1
GATCTCATTTTTAGAGC | 1753 | LOC92689 | BU688574 | 1889 | Hypothetical protein BC001096
GATCATCACATTTCGTG | 1754 | DLG1 | BC042118 | 1890 | Discs, large homolog 1 (Drosophila)
GATCATTTTCTGCTTCA | 1755 | SEMG1 | NM_003007 | 1891-1892 | Semenogelin I
GATCAATGAAGGAGAGA | 1756 | SPATA13 | BM875598 | 1893 | Spermatogenesis associated 13
GATCCCAACTACTCGGG | 1757 | LOC157657 | NM_177965 | 1894-1895 | Chromosome 8 open reading frame 37
GATCAGTTTTTCTGTAA | 1758 | KIAA1411 | CA433208 | 1896 | KIAA1411
GATCAAAATTTTAAAAA | 1759 | MGC20781 | BM984931 | 1897 | 5'-nucleotidase, cytosolic III-like
GATCACCCTTCTCTTCC | 1760 | LOC255189 | BC035335 | 1898-1899 | Phospholipase A2, group IVF
GATCCTGGGTACTGAAA | 1761 | ERBB2 | BC080193 | 1900 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian)
GATCGTTCTAAGAGTGT | 1762 | ZFP64 | NM_199427 | 1901-1902 | Zinc finger protein 64 homolog (mouse)
GATCATCATCAAGGGCT | 1763 | SUHW2 | BC042370 | 1903 | Suppressor of hairy wing homolog 2 (Drosophila)
GATCAAAATGATTTTCA | 1764 | ELOVL7 | AL137506 | 1904-1905 | ELOVL family member 7, elongation of long chain fatty acids (yeast)
GATCTGATTTTTTTCCC | 1765 | TRAF4 | AI888175 | 1906 | TNF receptor-associated factor 4
GATCCCATTTCTCACCC | 1766 | SLC39A2 | AI669751 | 1907 | Solute carrier family 39 (zinc transporter), member 2
GATCCTCCCGCCTTGCC | 1767 | HNF4G | AI088739 | 1908 | Hepatocyte nuclear factor 4, gamma
GATCTTTCTTTTTTTGT | 1768 | SLC22A3 | BC070300 | 1909 | Solute carrier family 22 (extraneuronal monoamine transporter), member 3
GATCTTAACTGTCTCCT | 1769 | HIST2H2BE | BC005827 | 1910 | Histone 2, H2be
GATCAGTTTGATTCTGT | 1770 | AMD1 | BC041345 | 1911-1912 | Adenosylmethionine decarboxylase 1
GATCATGATGTAGAGGG | 1771 | TYMS | BX390036 | 1913 | Thymidylate synthetase
GATCGCACCACTACAGT | 1772 | PHC3 | AK022455 | 1914 | Polyhomeotic like 3 (Drosophila)
GATCTCAAAGTGCCTTC | 1773 | SARG | AL832940 | 1915-1916 | Chromosome 1 open reading frame 116
GATCAATGTCAAACTTC | 1774 | MTERF | BC000965 | 1917-1918 | Mitochondrial transcription termination factor
GATCTCCCAGAGTCTAA | 1775 | CYP4F8 | NM_007253 | 1919-1920 | Cytochrome P450, family 4, subfamily F, polypeptide 8
GATCCTGATGGCTGTGT | 1776 | PPAP2A | AK124401 | 1921 | Phosphatidic acid phosphatase type 2A
GATCACTTCCCGCAGTC | 1777 | KIAA0056 | BC011408 | 1922-1923 | KIAA0056 protein
GATCTCAAAGGAACCAA | 1778 | MSMB | AA469293 | 1924 | Microseminoprotein, beta
GATCTGTGCCAGGGTTA | 1779 | VEGF | AK056914 | 1925 | Vascular endothelial growth factor
GATCTCTTTTTATTTAA | 1780 | CDH1 | NM_004360 | 1926-1927 | Cadherin 1, type 1, E-cadherin (epithelial)
GATCTCCAGCACCAATC | 1781 | TARP | BC062761 | 1928-1929 | TCR gamma alternate reading frame protein
GATCTGGCGCTTGGGGG | 1782 | RFP2 | NM_001007278 | 1930-1931 | Ret finger protein 2
GATCCCGACGGGGGCAT | 1783 | MESP1 | NM_018670 | 1932-1933 | Mesoderm posterior 1 homolog (mouse)
GATCCCGGGCCGTTATC | 1784 | TRPM4 | AA026974 | 1934 | Transient receptor potential cation channel, subfamily M, member 4
GATCTTTCTCAAAATAT | 1785 | PAK1IP1 | AI468032 | 1935 | PAK1 interacting protein 1
GATCGTGACGCTTAATA | 1786 | HNRPA1 | CF122297 | 1936 | Heterogeneous nuclear ribonucleoprotein A1
GATCGCATAATTTTTAA | 1787 | ZNF207 | CB053869 | 1937 | Zinc finger protein 207
GATCCCAACACTGAAGG | 1788 | WNK4 | NM_032387 | 1938-1939 | WNK lysine deficient protein kinase 4
GATCTTAAAAACTGCAG | 1789 | APXL2 | BQ448015 | 1940 | Apical protein 2
GATCATTTTTTCTATCA | 1790 | MED28 | AI554477 | 1941 | Mediator of RNA polymerase II transcription, subunit 28 homolog (yeast)
GATCCCATTGTGTGTAT | 1791 | LOC285300 | AK095655 | 1942 | Hypothetical protein LOC285300
GATCTCAAAGGAAAAAA | 1792 | 0 | AW291753 | 1943 | Transcribed locus
GATCTTCTGTTATATTT | 1793 | 0 | BM023121 | 1944 | Full length insert cDNA clone ZD79H10
GATCCACAACATACAGC | 1794 | 0 | AY338953 | 1945 | Prostate-specific P712P mRNA sequence
GATCTGTGCAGTTGTAA | 1795 | 0 | AY533562 | 1946 | KLK16 mRNA, partial sequence
GATCTACTATGCCAAAT | 1796 | 0 | BC030554 | 1947 | (clone HGT25) T cell receptor gamma-chain mRNA, V region

*ratio of prostate expression in tpm to other organs greater than 2.5

TABLE 5B

PROSTATE-ENRICHED GENES IDENTIFIED BY RATIO SCHEMA (RATIO > 2.5)*

Genbank Accession No. | Genbank SEQ ID NOS: | Name | SignalP 3.0 prediction: Prediction | SignalP 3.0 prediction: Signal peptide probability
BC000637 | 1797-1798 | DHRS7 | Signal peptide | 0.999
BC029497 | 1799-1800 | NPY | Signal peptide | 0.998
AW172826 | 1801 | FLJ20010 | Non-secretory protein | 0.001
BI771919 | 1802 | C9orf61 | Signal peptide | 0.994
BU853306 | 1803 | Lrp2bp | Non-secretory protein | 0
BC007092 | 1804-1805 | HOXB13 | Non-secretory protein | 0
BC038962 | 1806-1807 | CREB3L4 | Non-secretory protein | 0
BC005029 | 1808-1809 | LEPREL1 | Signal peptide | 0.995
CB051271 | 1810 | KLK4 | Signal peptide | 0.988
NM_145013 | 1811-1812 | MGC35558 | Signal peptide | 0.935
BU157155 | 1813 | HAX1 | Non-secretory protein | 0.001
AW207206 | 1814 | 0 | Non-secretory protein | 0.001
BC013389 | 1815 | 0 | Non-secretory protein | 0
BC028162 | 1816-1817 | TMEM16G | Non-secretory protein | 0.001
BX099160 | 1818 | MGC31963 | Signal peptide | 0.994
BC005307 | 1819-1820 | KLK3 | Signal peptide | 0.992
NM_005656 | 1821-1822 | TMPRSS2 | Non-secretory protein | 0
BC026923 | 1823 | LOC221442 | Signal anchor | 0.01
BU159800 | 1824 | ARL10C | Non-secretory protein | 0
NM_032323 | 1825-1826 | MGC13102 | Non-secretory protein | 0
AI954252 | 1827 | 0 | Non-secretory protein | 0.128
BQ941313 | 1828 | SEPX1 | Non-secretory protein | 0
BC007460 | 1829-1830 | ACPP | Signal peptide | 1
BI911790 | 1831 | BIN3 | Non-secretory protein | 0
BC002707 | 1832-1833 | SPON2 | Signal peptide | 0.998
AK026938 | 1834 | 0 | Signal peptide | 0.587
BG818587 | 1835 | RPL18A | Non-secretory protein | 0
NM_005845 | 1836-1837 | ABCC4 | Non-secretory protein | 0
AA888242 | 1838 | RPS11 | Non-secretory protein | 0
CN353139 | 1839 | NSEP1 | Non-secretory protein | 0.001
AA256381 | 1840 | FLJ22955 | Non-secretory protein | 0.06
AA513505 | 1841 | HOXD11 | Non-secretory protein | 0
BG564253 | 1842 | ORM1 | Signal peptide | 1
AI572087 | 1843 | HTPAP | Non-secretory protein | 0.021
AA259243 | 1844 | KLK2 | Signal peptide | 0.985
AI675682 | 1845 | SLC2A12 | Non-secretory protein | 0
NM_006096 | 1846-1847 | NDRG1 | Non-secretory protein | 0
NM_000906 | 1848-1849 | NPR1 | Signal peptide | 0.997
NM_025087 | 1850-1851 | FLJ21511 | Non-secretory protein | 0.005
NM_004496 | 1852-1853 | FOXA1 | Non-secretory protein | 0
AI535878 | 1854 | ENPP3 | Non-secretory protein | 0.069
NM_032638 | 1855-1856 | GATA2 | Non-secretory protein | 0
BX331427 | 1857 | ARG2 | Non-secretory protein | 0.014
AI569484 | 1858 | XPO1 | Non-secretory protein | 0
BC009569 | 1859-1860 | ASB3 | Non-secretory protein | 0
AK000028 | 1861 | 0 | Non-secretory protein | 0.001
BX100634 | 1862 | KLF3 | Non-secretory protein | 0
BC007003 | 1863-1864 | TGM4 | Non-secretory protein | 0
NM_001008401 | 1865-1866 | FLJ16231 | Non-secretory protein | 0
BX113323 | 1867 | BLNK | Non-secretory protein | 0
NM_015865 | 1868-1869 | SLC14A1 | Non-secretory protein | 0
AI017286 | 1870 | PTPLB | Non-secretory protein | 0.06
NM_030774 | 1871-1872 | OR51E2 | Non-secretory protein | 0.008
NM_001441 | 1873-1874 | FAAH | Signal peptide | 0.805
AL044554 | 1875 | STAT6 | Non-secretory protein | 0
CB049466 | 1876 | ANKH | Non-secretory protein | 0.001
AW575747 | 1877 | DSCR1L2 | Non-secretory protein | 0
NM_024080 | 1878-1879 | TRPM8 | Non-secretory protein | 0
AV724505 | 1880 | TMC4 | Non-secretory protein | 0
BC005859 | 1881-1882 | ZNF589 | Non-secretory protein | 0
BC005408 | 1883-1884 | LRRK1 | Non-secretory protein | 0
AA177004 | 1885 | STEAP2 | Non-secretory protein | 0
BC001216 | 1886 | SAFB2 | Non-secretory protein | 0
BG707154 | 1887 | CPE | Signal peptide | 1
AA024878 | 1888 | GNB2L1 | Non-secretory protein | 0
BU688574 | 1889 | LOC92689 | Non-secretory protein | 0
BC042118 | 1890 | DLG1 | Non-secretory protein | 0
NM_003007 | 1891-1892 | SEMG1 | Signal peptide | 0.922
BM875598 | 1893 | SPATA13 | Non-secretory protein | 0
NM_177965 | 1894-1895 | LOC157657 | Non-secretory protein | 0
CA433208 | 1896 | KIAA1411 | Non-secretory protein | 0
BM984931 | 1897 | MGC20781 | Non-secretory protein | 0
BC035335 | 1898-1899 | LOC255189 | Non-secretory protein | 0
BC080193 | 1900 | ERBB2 | Non-secretory protein | 0
NM_199427 | 1901-1902 | ZFP64 | Non-secretory protein | 0
BC042370 | 1903 | SUHW2 | Non-secretory protein | 0
AL137506 | 1904-1905 | ELOVL7 | Non-secretory protein | 0
AI888175 | 1906 | TRAF4 | Non-secretory protein | 0
AI669751 | 1907 | SLC39A2 | Signal peptide | 0.982
AI088739 | 1908 | HNF4G | Non-secretory protein | 0.001
BC070300 | 1909 | SLC22A3 | Signal anchor | 0.097
BC005827 | 1910 | HIST2H2BE | Non-secretory protein | 0
BC041345 | 1911-1912 | AMD1 | Non-secretory protein | 0
BX390036 | 1913 | TYMS | Non-secretory protein | 0
AK022455 | 1914 | PHC3 | Non-secretory protein | 0
AL832940 | 1915-1916 | SARG | Non-secretory protein | 0
BC000965 | 1917-1918 | MTERF | Non-secretory protein | 0
NM_007253 | 1919-1920 | CYP4F8 | Signal peptide | 1
AK124401 | 1921 | PPAP2A | Non-secretory protein | 0.348
BC011408 | 1922-1923 | KIAA0056 | Non-secretory protein | 0
AA469293 | 1924 | MSMB | Signal peptide | 0.997
AK056914 | 1925 | VEGF | Non-secretory protein | 0
NM_004360 | 1926-1927 | CDH1 | Signal peptide | 0.896
BC062761 | 1928-1929 | TARP | Non-secretory protein | 0
NM_001007278 | 1930-1931 | RFP2 | Non-secretory protein | 0
NM_018670 | 1932-1933 | MESP1 | Signal anchor | 0.004
AA026974 | 1934 | TRPM4 | Non-secretory protein | 0
AI468032 | 1935 | PAK1IP1 | Non-secretory protein | 0.001
CF122297 | 1936 | HNRPA1 | Non-secretory protein | 0
CB053869 | 1937 | ZNF207 | Non-secretory protein | 0
NM_032387 | 1938-1939 | WNK4 | Non-secretory protein | 0
BQ448015 | 1940 | APXL2 | Non-secretory protein | 0
AI554477 | 1941 | MED28 | |
AK095655 | 1942 | LOC285300 | |
AW291753 | 1943 | 0 | |
BM023121 | 1944 | 0 | |
AY338953 | 1945 | 0 | |
AY533562 | 1946 | 0 | |
BC030554 | 1947 | 0 | |

*ratio of prostate expression in tpm to other organs greater than 2.5

TABLE 5C

PROSTATE-ENRICHED GENES IDENTIFIED BY RATIO SCHEMA (RATIO > 2.5)*

Genbank Accession No. | Genbank SEQ ID NOs: | Name | SignalP 3.0 prediction: Max cleavage site probability | SecretomeP 2.0 prediction: Secreted potential (Odds) | TMHMM 2.0 prediction: Pred transmembrane domains
BC000637 | 1797-1798 | DHRS7 | 0.599 between pos. 28 and 29 | 6.3 | 1
BC029497 | 1799-1800 | NPY | 0.520 between pos. 28 and 29 | 6.09 | 1
AW172826 | 1801 | FLJ20010 | 0.000 between pos. 46 and 47 | 6.06 | 0
BI771919 | 1802 | C9orf61 | 0.534 between pos. 29 and 30 | 5.9 | 2
BU853306 | 1803 | Lrp2bp | 0.000 between pos. 55 and 56 | 5.62 | 0
BC007092 | 1804-1805 | HOXB13 | 0.000 between pos. -1 and 0 | 5.14 | 0
BC038962 | 1806-1807 | CREB3L4 | 0.000 between pos. -1 and 0 | 4.72 | 0
BC005029 | 1808-1809 | LEPREL1 | 0.991 between pos. 24 and 25 | 4.59 | 0
CB051271 | 1810 | KLK4 | 0.401 between pos. 29 and 30 | 4.57 | 1
NM_145013 | 1811-1812 | MGC35558 | 0.901 between pos. 22 and 23 | 4.47 | 0
BU157155 | 1813 | HAX1 | 0.001 between pos. 18 and 19 | 4.41 | 0
AW207206 | 1814 | 0 | 0.001 between pos. 20 and 21 | 4.39 | 0
BC013389 | 1815 | 0 | 0.000 between pos. 27 and 28 | 4.3 | 0
BC028162 | 1816-1817 | TMEM16G | 0.001 between pos. 22 and 23 | 4.29 | 7
BX099160 | 1818 | MGC31963 | 0.855 between pos. 35 and 36 | 4.22 | 2
BC005307 | 1819-1820 | KLK3 | 0.525 between pos. 23 and 24 | 3.938 | 0
NM_005656 | 1821-1822 | TMPRSS2 | 0.000 between pos. -1 and 0 | 3.86 | 1
BC026923 | 1823 | LOC221442 | 0.004 between pos. 50 and 51 | 3.81 | 0
BU159800 | 1824 | ARL10C | 0.000 between pos. 35 and 36 | 3.76 | 0
NM_032323 | 1825-1826 | MGC13102 | 0.000 between pos. -1 and 0 | 3.69 | 5
AI954252 | 1827 | 0 | 0.121 between pos. 42 and 43 | 3.58 | 0
BQ941313 | 1828 | SEPX1 | 0.000 between pos. 13 and 14 | 3.49 | 0
BC007460 | 1829-1830 | ACPP | 0.975 between pos. 32 and 33 | 3.49 | 1
BI911790 | 1831 | BIN3 | 0.000 between pos. -1 and 0 | 3.41 | 0
BC002707 | 1832-1833 | SPON2 | 0.829 between pos. 26 and 27 | 3.06 | 0
AK026938 | 1834 | 0 | 0.568 between pos. 27 and 28 | 3.02 | 0
BG818587 | 1835 | RPL18A | 0.000 between pos. 24 and 25 | 2.8 | 0
NM_005845 | 1836-1837 | ABCC4 | 0.000 between pos. -1 and 0 | 2.67 | 11
AA888242 | 1838 | RPS11 | 0.000 between pos. -1 and 0 | 2.64 | 0
CN353139 | 1839 | NSEP1 | 0.000 between pos. 25 and 26 | 2.35 | 0
AA256381 | 1840 | FLJ22955 | 0.038 between pos. 15 and 16 | 2.19 | 1
AA513505 | 1841 | HOXD11 | 0.000 between pos. 20 and 21 | 2.14 | 0
BG564253 | 1842 | ORM1 | 0.923 between pos. 18 and 19 | 2.03 | 0
AI572087 | 1843 | HTPAP | 0.009 between pos. 63 and 64 | 2.01 | 4
AA259243 | 1844 | KLK2 | 0.455 between pos. 17 and 18 | 1.81 | 0
AI675682 | 1845 | SLC2A12 | 0.000 between pos. 51 and 52 | 1.79 | 12
NM_006096 | 1846-1847 | NDRG1 | 0.000 between pos. -1 and 0 | 1.76 | 0
NM_000906 | 1848-1849 | NPR1 | 0.960 between pos. 32 and 33 | 1.75 | 0
NM_025087 | 1850-1851 | FLJ21511 | 0.005 between pos. 20 and 21 | 1.75 | 10
NM_004496 | 1852-1853 | FOXA1 | 0.000 between pos. -1 and 0 | 1.71 | 0
AI535878 | 1854 | ENPP3 | 0.036 between pos. 42 and 43 | 1.69 | 1
NM_032638 | 1855-1856 | GATA2 | 0.000 between pos. 22 and 23 | 1.65 | 0
BX331427 | 1857 | ARG2 | 0.013 between pos. 36 and 37 | 1.56 | 0
AI569484 | 1858 | XPO1 | 0.000 between pos. -1 and 0 | 1.54 | 0
BC009569 | 1859-1860 | ASB3 | 0.000 between pos. -1 and 0 | 1.53 | 0
AK000028 | 1861 | 0 | 0.000 between pos. 22 and 23 | 1.46 | 0
BX100634 | 1862 | KLF3 | 0.000 between pos. -1 and 0 | 1.4 | 0

TABLE 5C-continued

BX390036 | 1913 | TYMS | 0.000 between pos. -1 and 0 | 0.57 | 0
AK022455 | 1914 | PHC3 | 0.000 between pos. -1 and 0 | 0.57 | 0
AL832940 | 1915-1916 | SARG | 0.000 between pos. 21 and 22 | 0.56 | 0
BC000965 | 1917-1918 | MTERF | 0.000 between pos. 14 and 15 | 0.56 | 0
NM_007253 | 1919-1920 | CYP4F8 | 0.781 between pos. 36 and 37 | 0.56 | 1
AK124401 | 1921 | PPAP2A | 0.226 between pos. 30 and 31 | 0.53 | 5
BC011408 | 1922-1923 | KIAA0056 | 0.000 between pos. -1 and 0 | 0.52 | 0
AA469293 | 1924 | MSMB | 0.928 between pos. 20 and 21 | 0.51 | 1
AK056914 | 1925 | VEGF | 0.000 between pos. -1 and 0 | 0.485 | 0
NM_004360 | 1926-1927 | CDH1 | 0.487 between pos. 22 and 23 | 0.36 | 1
BC062761 | 1928-1929 | TARP | 0.000 between pos. 20 and 21 | 0.35 | 1
NM_001007278 | 1930-1931 | RFP2 | 0.000 between pos. 24 and 25 | 0.32 | 1
NM_018670 | 1932-1933 | MESP1 | 0.002 between pos. 20 and 21 | 0.31 | 0
AA026974 | 1934 | TRPM4 | 0.000 between pos. -1 and 0 | 0.3 | 5
AI468032 | 1935 | PAK1IP1 | 0.000 between pos. 25 and 26 | 0.27 | 0
CF122297 | 1936 | HNRPA1 | 0.000 between pos. 32 and 33 | 0.22 | 0
CB053869 | 1937 | ZNF207 | 0.000 between pos. -1 and 0 | 0.21 | 0
NM_032387 | 1938-1939 | WNK4 | 0.000 between pos. -1 and 0 | 0.2 | 0
BQ448015 | 1940 | APXL2 | 0.000 between pos. 41 and 42 | 0.19 | 0
AI554477 | 1941 | MED28 | #NA | #NA
AK095655 | 1942 | LOC285300 | #NA | #NA
AW291753 | 1943 | 0 | #NA | #NA
BM023121 | 1944 | 0 | #NA | #NA
AY338953 | 1945 | 0 | #NA | #NA
AY533562 | 1946 | 0 | #NA | #NA
BC030554 | 1947 | 0 | #NA | #NA

*ratio of prostate expression in tpm to other organs greater than 2.5
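The predictions in Tables 5B and 5C can be read together when screening for candidate blood markers. The following is a minimal sketch of such a screen, not the procedure used to build the tables: the `Prediction` class, the `secretion_evidence` function and the choice to treat the odds cut-off of 3 (quoted in the description of the analysis) as inclusive are all assumptions made for illustration. The four rows are copied from Tables 5B and 5C.

```python
from dataclasses import dataclass

ODDS_CUTOFF = 3.0  # cut-off quoted in the description; inclusive here by assumption


@dataclass(frozen=True)
class Prediction:
    name: str
    signalp_call: str      # "Prediction" column of Table 5B
    secretion_odds: float  # "Odds" column of Table 5C
    tm_domains: int        # predicted transmembrane domains (kept for reference only)


def secretion_evidence(p: Prediction) -> list[str]:
    """Return the independent lines of computational evidence that p is secreted."""
    evidence = []
    if p.signalp_call == "Signal peptide":
        evidence.append("signal peptide")
    if p.secretion_odds >= ODDS_CUTOFF:
        evidence.append("odds at or above cut-off")
    return evidence


rows = [
    Prediction("DHRS7", "Signal peptide", 6.3, 1),
    Prediction("HOXB13", "Non-secretory protein", 5.14, 0),
    Prediction("MSMB", "Signal peptide", 0.51, 1),
    Prediction("TYMS", "Non-secretory protein", 0.57, 0),
]

for row in rows:
    print(row.name, secretion_evidence(row) or ["no computational support"])
```

Keeping the two lines of evidence separate, rather than requiring both, reflects the caveat in the description that proteins scoring well below 3 can still be secreted and that shed proteins are not captured by these predictions at all.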

TABLE 5D

PROSTATE-ENRICHED GENES IDENTIFIED BY RATIO SCHEMA (RATIO > 2.5)*

Genbank Accession No. | Genbank SEQ ID NOS: | Name | NN-score | Odds | Prostate Expression (tpm)
BC000637 | 1797-1798 | DHRS7 | 0.92 | 6.302 | 754
BC029497 | 1799-1800 | NPY | 0.911 | 6.099 | 642
AW172826 | 1801 | FLJ20010 | 0.911 | 6.061 | 92
BI771919 | 1802 | C9orf61 | 0.906 | 5.902 | 91
BU853306 | 1803 | Lrp2bp | 0.895 | 5.626 | 95
BC007092 | 1804-1805 | HOXB13 | 0.875 | 5.145 | 344
BC038962 | 1806-1807 | CREB3L4 | 0.866 | 4.721 | 334
BC005029 | 1808-1809 | LEPREL1 | 0.857 | 4.594 | 118
CB051271 | 1810 | KLK4 | 0.856 | 4.575 | 360
NM_145013 | 1811-1812 | MGC35558 | 0.86 | 4.477 | 53
BU157155 | 1813 | HAX1 | 0.854 | 4.412 | 67
AW207206 | 1814 | 0 | 0.854 | 4.391 | 279
BC013389 | 1815 | 0 | 0.85 | 4.304 | 64
BC028162 | 1816-1817 | TMEM16G | 0.843 | 4.293 | 281

TABLE 5D-continued

AK124401 | 1921 | PPAP2A | 0.211 | 0.533 | 75
BC011408 | 1922-1923 | KIAA0056 | 0.281 | 0.527 | 287
AA469293 | 1924 | MSMB | 0.27 | 0.517 | 275
AK056914 | 1925 | VEGF | 0.256 | 0.485 | 202
NM_004360 | 1926-1927 | CDH1 | 0.179 | 0.362 | 192
BC062761 | 1928-1929 | TARP | 0.174 | 0.353 | 564
NM_001007278 | 1930-1931 | RFP2 | 0.162 | 0.322 | 192
NM_018670 | 1932-1933 | MESP1 | 0.154 | 0.315 | 133
AA026974 | 1934 | TRPM4 | 0.147 | 0.305 | 290
AI468032 | 1935 | PAK1IP1 | 0.13 | 0.271 | 74
CF122297 | 1936 | HNRPA1 | 0.106 | 0.228 | 104
CB053869 | 1937 | ZNF207 | 0.099 | 0.212 | 72
NM_032387 | 1938-1939 | WNK4 | 0.089 | 0.201 | 100
BQ448015 | 1940 | APXL2 | 0.083 | 0.19 | 244
AI554477 | 1941 | MED28 | | | 700
AK095655 | 1942 | LOC285300 | | | 84
AW291753 | 1943 | 0 | | | 310
BM023121 | 1944 | 0 | | | 178
AY338953 | 1945 | 0 | | | 166
AY533562 | 1946 | 0 | | | 67
BC030554 | 1947 | 0 | | | 66

*ratio of prostate expression in tpm to other organs greater than 2.5
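The ratio schema named in the table titles can be illustrated with a short sketch. It assumes that the ratio is taken against the highest expression seen in any other organ and that the cut-off is strict (greater than 2.5, as in the footnote); the function name `enriched`, the non-prostate tpm values and the `HYPOTHETICAL_1` entry are invented for illustration. Only the prostate value of 754 tpm for DHRS7 comes from Table 5D.

```python
RATIO_CUTOFF = 2.5  # from the table footnote; strict comparison is an assumption


def enriched(expression_tpm, organ, cutoff=RATIO_CUTOFF):
    """Return {gene: ratio} for genes whose tpm in `organ` exceeds
    `cutoff` times the highest tpm observed in any other organ."""
    selected = {}
    for gene, by_organ in expression_tpm.items():
        target = by_organ.get(organ, 0.0)
        if target <= 0:
            continue  # not expressed in the organ of interest
        others = [tpm for name, tpm in by_organ.items() if name != organ]
        highest_other = max(others, default=0.0)
        # A gene absent from every other organ is treated as infinitely enriched.
        ratio = float("inf") if highest_other == 0 else target / highest_other
        if ratio > cutoff:
            selected[gene] = ratio
    return selected


# Prostate tpm for DHRS7 is from Table 5D; all other numbers are made up.
example = {
    "DHRS7": {"prostate": 754, "liver": 120, "kidney": 90},
    "HYPOTHETICAL_1": {"prostate": 100, "liver": 50, "kidney": 10},
}

result = enriched(example, "prostate")
print(result)
```

Comparing against the single highest other organ is the most conservative reading of the footnote; comparing against a mean or median of the other organs would admit more genes and is an equally plausible interpretation.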

Additional analysis was carried out to determine the secretion potential of the prostate-specific genes identified. The analysis programs used included SignalP 3.0, SecretomeP 2.0 and TMHMM 2.0 (see http colon double slash www dot cbs dot dtu dot dk/services/). The SignalP analysis identifies classical secreted proteins and was conducted using the classical secretion pathway prediction as described at http colon double slash www dot cbs dot dtu dot dk/services/SignalP/ (see Jannick Dyrlev Bendtsen, et al., J. Mol. Biol. 340:783-795, 2004; Henrik Nielsen et al., Protein Engineering, 10:1-6, 1997; Henrik Nielsen and Anders Krogh, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6), AAAI Press, Menlo Park, Calif., pp. 122-130, 1998). The SecretomeP 2.0 analysis identifies nonclassical secreted proteins (see J. Dyrlev Bendtsen, et al., Protein Eng. Des. Sel., 17(4):349-356, 2004). TMHMM uses a hidden Markov model for three-state (TM helix, inside, outside) topology prediction of transmembrane proteins (see Erik L. L. Sonnhammer, et al., Proc. of Sixth Int. Conf. on Intelligent Systems for Molecular Biology, p. 175-182, Ed. J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, Menlo Park, Calif.: AAAI Press, 1998). According to the SignalP analysis method, proteins with an odds score of 3 or higher have a high confidence of being secreted. However, it should be noted that several proteins scoring well below 3 by this method are known to be secreted proteins detected in the blood (see e.g., Table 5, KLK2). Further, these analyses do not take into account proteins that may be shed.

In summary, this example identifies prostate-specific and potentially secreted prostate-specific proteins that can be used in diagnostic panels for the detection of diseases of the prostate.

Example 8

Prostate Cancer Diagnostics Using Multiparameter Analysis

This example describes a multiparameter diagnostic fingerprint using the NDRG1 prostate-specific protein in combination with PSA. The NDRG1 prostate-specific protein further improved prostate cancer detection when used in combination with PSA.

Commercially available antibodies specific for numerous proteins encoded by prostate-specific genes as described in Table 5 were used to determine which proteins would be useful in a multiparameter diagnostic assay for prostate cancer. Most of the commercially available antibodies were not suitable (e.g., were not sensitive enough or showed non-specific binding). However, the antibody available for NDRG1 (anti-NDRG1 (C terminal) poly IgY; Cat# A22272B; GenWay Inc) was shown to specifically bind to NDRG1 from serum. NDRG1 is a member of the N-myc downregulated gene (NDRG) family that belongs to the alpha/beta hydrolase superfamily. It is classified as a tumor suppressor and heavy metal-response protein. Its expression is modulated by diverse physiological and pathological conditions including hypoxia, cellular differentiation, heavy metal, N-myc and neoplasia (Lachat P. et al.; Histochem Cell Biol. 2002 November; 118 (5):399-408).

NDRG1 protein expression was analyzed in serum samples from 18 advanced prostate cancer patients, 21 prostate cancer patients with localized cancer, and 22 normal controls. Western blot analysis was used to measure serum protein expression as follows: Serum was diluted (1:10) with lysis buffer (50 mM Hepes, pH 7.4, 4 mM EDTA, 2 mM EGTA, 2 uM PMSF, 20 µg/ml leupeptin (or 1x protease inhibitor cocktail), 1 mM Na3VO4, 10 mM NaF, 2 mM Na pyrophosphate, 1% Triton X-100). Protein concentration was determined using the Bio-Rad protein assay kit. Serum proteins (50 µg) were subjected to SDS-PAGE electrophoresis and transferred to a PVDF membrane (Hybond-P, Amersham Pharmacia Biotech, Piscataway, N.J.). The membrane was blocked with 4% non-fat milk in TBS (25 mM Tris, pH 7.4, 125 mM NaCl) for 1 h at room temperature, followed by incubation with primary antibodies against NDRG1 IgY (1:500) overnight at 4°C. The membranes were washed 3 times with TBS, and then incubated with horseradish peroxidase conjugated anti-rabbit IgY (1:16,000) for 1 h. The immunoblot was then washed five times with TBS and developed using an ECL (Amersham). The intensities of the single band corresponding to the NDRG1 protein were then scored. The results are summarized in Table 6 together with serum PSA measurements performed using a commercial ELISA kit.

TABLE 6

COMBINED ANALYSIS OF NDRG1 AND PSA SERUM EXPRESSION INCREASES PROSTATE CANCER DIAGNOSIS CONFIDENCE.

Cancer status | NDRG-1 intensity values (scores) | PSA serum (ng/ml) | Diagnosis by PSA | Serum diagnosis by NDRG1
Advance | 3 | 70.48 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 127.3 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 422.1 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 1223 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 71.28 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 2 | 133.2 | identified as cancer by PSA assay | missed by NDRG1 assay
Advance | 4 | 353.7 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 1 | 73.95 | identified as cancer by PSA assay | missed by NDRG1 assay
Advance | 3 | 454.8 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 474 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 6 | 150.1 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 0 | 1375 | identified as cancer by PSA assay | missed by NDRG1 assay
Advance | 6 | 71.28 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 6 | 4066 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 4 | 11.99 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 1 | 38.14 | identified as cancer by PSA assay | missed by NDRG1 assay
Advance | 6 | 552.6 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Advance | 5 | 321 | identified as cancer by PSA assay | identified as cancer by NDRG1 assay
Primary | -1 | 14.2 | possibly cancer |
Primary | | 6.27 | Grey zone of diagnosis by PSA |
Primary | 2 | 9.2 | Grey zone of diagnosis by PSA |
Primary | 1 | 8.57 | Grey zone of diagnosis by PSA |
Primary | 0 | 5.67 | Grey zone of diagnosis by PSA |
Primary | 2 | 11.3 | possibly cancer |
Primary | 0 | 4.58 | Grey zone of diagnosis by PSA |
Primary | 0 | 5.67 | Grey zone of diagnosis by PSA |
Primary | -1 | 6.48 | Grey zone of diagnosis by PSA |
Primary | 3 | 12.71 | possibly cancer | strong NDRG-1 expression reinforces the diagnosis of this patient as cancer
Primary | 3 | 4.93 | Grey zone of diagnosis by PSA | strong NDRG-1 expression reinforces the diagnosis of this patient as cancer
Primary | 1 | 3.16 | Grey zone of diagnosis by PSA |
Primary | 1 | 4.87 | Grey zone of diagnosis by PSA |
Primary | 1 | 4.66 | Grey zone of diagnosis by PSA |
Primary | 1 | 6.87 | Grey zone of diagnosis by PSA |
Primary | 0 | 3.91 | Grey zone of diagnosis by PSA |
Primary | 0 | 6.48 | Grey zone of diagnosis by PSA |
Primary | 2 | 13.1 | possibly cancer |
Primary | 0 | 4.58 | Grey zone of diagnosis by PSA |
Primary | 1 | 4.72 | Grey zone of diagnosis by PSA |
Primary | 4 | 12.71 | possibly cancer | strong NDRG-1 expression reinforces the diagnosis of this patient as cancer
Normal | | 0.8 | Normal | Normal
Normal | | 0.8 | Normal | Normal
Normal | 0 | 0.6 | Normal | Normal
Normal | | | Normal | Normal
Normal | | .2 | Normal | Normal
Normal | | .91 | Normal | Normal
Normal | 2 | 0.6 | Normal | Normal
Normal | | 0.3 | Normal | Normal
Normal | 0 | | Normal | Normal
Normal | | 0.4 | Normal | Normal
Normal | | 0.8 | Normal | Normal
Normal | 0 | | Normal | Normal
Normal | | 0.8 | Normal | Normal
Normal | 2 | 0.6 | Normal | Normal
Normal | | 0.5 | Normal | Normal
Normal | | | Normal | Normal
Normal | | 0.7 | Normal | Normal
Normal | | .2 | Normal | Normal
Normal | | .1 | Normal | Normal
Normal | 0 | 0.8 | Normal | Normal
Normal | 0 | 0.7 | Normal | Normal
Normal | 0 | 0.6 | Normal | Normal

scores: no expression, -1; no expression to very faint, 0; expression levels then scored from 1 to 6 by intensities

PSA was detected in 100% of the advanced prostate cancers. NDRG1 was detected in 14 out of 18 advanced cancers (78%) (see Table 6, scores of 3 or greater). Serum PSA levels below 15 ng/ml, particularly levels between 4-10 ng/ml (often referred to as the "grey zone" in the PSA assay), cannot reliably detect prostate cancer, as PSA levels in this range may be the result of other factors such as infection (prostatitis) or benign prostatic hyperplasia (BPH), a common condition in older men. Additionally, the normal range of PSA values increases with patient age. NDRG1 detection in serum reinforced the diagnosis of three prostate cancer patients with PSA levels between 4.9 ng/ml and 15 ng/ml. In these three patients, the NDRG1 scores were 3 or 4, significantly higher than the NDRG1 scores in a cohort of 22 normal individuals (average 0.09, range -1 to 2).

Thus, this example illustrates that the use of two or more prostate specific/enriched cancer markers such as NDRG1 and PSA can improve prostate cancer diagnosis to reduce false positive and false negative rates.

From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
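The two-marker reading described in Example 8 can be sketched as a small decision function. This is an illustration only and not a diagnostic rule: the function name `combined_reading` and its return labels are hypothetical, and the two thresholds (PSA of 15 ng/ml as the level below which PSA alone is unreliable, and an NDRG1 band score of 3 as "strong" expression) are assumptions drawn from the prose of Example 8 rather than cut-offs the patent formally defines.

```python
PSA_RELIABLE_NG_ML = 15.0  # below this, Example 8 treats PSA alone as unreliable
NDRG1_STRONG_SCORE = 3     # scores of 3 or more are read as strong expression


def combined_reading(psa_ng_ml: float, ndrg1_score: int) -> str:
    """Combine a serum PSA value with an NDRG1 Western blot score (-1 to 6)."""
    ndrg1_strong = ndrg1_score >= NDRG1_STRONG_SCORE
    if psa_ng_ml >= PSA_RELIABLE_NG_ML:
        # PSA is decisive on its own; NDRG1 either agrees or misses the case.
        return "PSA positive, NDRG1 agrees" if ndrg1_strong else "PSA positive, NDRG1 misses"
    if ndrg1_strong:
        return "PSA inconclusive, NDRG1 supports cancer"
    return "PSA inconclusive, no NDRG1 support"


# (PSA, NDRG1 score) pairs taken from rows of Table 6.
for psa, score in [(70.48, 3), (133.2, 2), (4.93, 3), (4.58, 0)]:
    print(psa, score, "->", combined_reading(psa, score))
```

The point of the sketch is that the second marker only changes the outcome in the PSA range where the first marker is ambiguous, which is where Example 8 reports that NDRG1 reinforced three diagnoses.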

SEQUENCE LISTING

The patent contains a lengthy "Sequence Listing" section. A copy of the "Sequence Listing" is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US09002652B1). An electronic copy of the "Sequence Listing" will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is:

1. A method for identifying a set of at least 5 organ-specific proteins that are specifically produced in a preselected organ and are shed into the blood comprising,
(a) determining, using mRNA transcripts, the sequences of sequences 17-20 nucleotides long that have been identified as signature sequences characteristic of mRNA transcripts that encode proteins from substantially all mRNA transcripts isolated from a sample of said preselected organ;
(b) comparing, using a computer, the signature sequences to a database of known RNA transcripts to identify the mRNA transcripts that encode proteins to obtain a preliminary set of mRNA transcripts;
(c) comparing, using a computer, the identified mRNA transcripts to mRNA transcripts known to be expressed in other organs;
(d) removing, using a computer, from said preliminary set any mRNA transcripts that are known to be substantially expressed in other organs thus obtaining an intermediate set of mRNA transcripts;
(e) identifying, using a computer, from the remaining mRNA transcripts in the intermediate set those mRNA transcripts that include a sequence encoding a signal peptide, thereby identifying transcripts that encode organ-specific secreted proteins from said preselected organ; and
(f) confirming, using a blood sample, the presence of the organ-specific secreted proteins by assessing the blood sample, thereby identifying a set of at least 5 organ-specific proteins from said preselected organ that are shed into blood.

2. The method of claim 1 wherein one of said at least 5 organ-specific proteins identified is carboxypeptidase E precursor (CPE) having SEQ ID NO:1520 or olfactory receptor, family 51, subfamily E, member 2 (OR51e2) having SEQ ID NO:1740.

3. A method for identifying a set of at least 5 organ-specific proteins that are specifically produced in a preselected organ and are shed into the blood comprising,
(a) determining, using mRNA transcripts, the sequences of sequences 17-20 nucleotides long that have been identified as signature sequences characteristic of mRNA transcripts that encode proteins from substantially all mRNA transcripts isolated from a sample of said preselected organ;
(b) comparing, using a computer, the signature sequences to a database of known RNA transcripts to identify the mRNA transcripts that encode proteins to obtain a preliminary set of mRNA transcripts;
(c) identifying, using a computer, from among the mRNA transcripts identified in (b) as encoding proteins, those mRNA transcripts that are expressed in the preselected organ at a level at least 2.5 times the level of expression of the same mRNA transcripts observed in other organs;
(d) identifying, using a computer, from the transcripts in (c) those transcripts that include a sequence encoding a signal peptide, thereby identifying transcripts that encode organ-specific proteins secreted or shed into the blood; and
(e) confirming, using a blood sample, the presence of the organ-specific secreted proteins by assessing the blood sample thereby identifying a set of at least 5 organ-specific proteins shed into blood.

4. The method of claim 3 wherein an identified organ-specific protein is carboxypeptidase E precursor (CPE) having SEQ ID NO:1520 or olfactory receptor, family 51, subfamily E, member 2 (OR51e2) having SEQ ID NO:1740.

* * * * *