US 2013/0157891 A1
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2013/0157891 A1
Li et al.
(43) Pub. Date: Jun. 20, 2013

(54) ORGAN SPECIFIC DIAGNOSTIC PANELS AND METHODS FOR IDENTIFICATION OF ORGAN SPECIFIC PANEL PROTEINS
(76) Inventors: Xiao-Jun Li, Bellevue, WA (US); Paul Edward Kearney, Montreal (CA)
(21) Appl. No.: 13/704,939
(22) PCT Filed: Jun. 24, 2011
(86) PCT No.: PCT/US11/41887
§ 371 (c)(1), (2), (4) Date: Feb. 28, 2013

Related U.S. Application Data
(60) Provisional application No. 61/358,372, filed on Jun. 24, 2010.

Publication Classification
(51) Int. Cl. G01N 27/62 (2006.01)
(52) U.S. Cl. CPC ...... G01N 27/62 (2013.01); USPC ...... 506/9; 506/12

(57) ABSTRACT
The present application provides novel compositions, methods, and assays for use in identification of appropriate diagnostic markers in blood. These compositions, methods, and assays are capable of distinguishing normal levels of detectable markers from changes in marker levels that are indicative of changes in health status.

Figure 1

An example of a [...]-specific panel of 5 proteins {a, b, c, d, e}.

Figure 2

[Graph: number of studies correlated with lung-specific proteins; remaining labels illegible]

Figure 3A


Figure 3B


Figure 4


Figure 5

Organ-Specificity of a Five-Protein Panel (bar graph; horizontal axis: Organ)


ORGAN SPECIFIC DIAGNOSTIC PANELS AND METHODS FOR IDENTIFICATION OF ORGAN SPECIFIC PANEL PROTEINS

RELATED APPLICATIONS

[0001] This application is a national stage application, filed under 35 U.S.C. § 371, of PCT Application No. PCT/US2011/041887, filed on Jun. 24, 2011, which claims the benefit of U.S. Provisional Application No. 61/358,372, filed Jun. 24, 2010, the contents of each of which are incorporated by reference herein in their entireties, including drawings.

BACKGROUND

[0002] One aim of modern diagnostic medicine is to better identify sensitive diagnostic methods to determine changes in health status. A variety of diagnostic assays and computational methods are used to monitor health. Improved sensitivity is an important goal of diagnostic medicine. Early diagnosis and identification of disease and changes in health status may permit earlier intervention and treatment that will produce healthier and more successful outcomes for the patient. Diagnostic markers are important for assessing susceptibility to and diagnosing of disease and changes in health status. In addition, diagnostic markers are important for predicting response to treatment, determining prognosis, selecting appropriate treatment and monitoring response to treatment.

[0003] Many diagnostic markers are identified in the blood. However, identification of appropriate diagnostic markers is challenging due to the complexity and variety of detectable markers in the blood. Distinguishing between high abundance and low abundance detectable markers requires novel methods and assays to determine the differences between normal levels of detectable markers and changes of such detectable markers that are indicative of changes in health status. The present invention provides novel compositions, methods and assays to fulfill these and other needs.

SUMMARY

[0004] According to one embodiment, a method for predicting a risk for development of a disease or change in health status is provided, the method comprising (a) obtaining a sample from a subject; (b) measuring the presence or absence of a set of sample organ specific panel proteins; (c) comparing the expression levels of the sample organ specific panel protein set to predetermined expression levels of an identical set of organ specific panel proteins from a control population; (d) determining the expression level differences between the sample organ specific panel protein set and the predetermined expression levels of the control population organ specific panel protein set; and (e) predicting a risk for development of a disease or change in health status from the expression level differences between the sample organ specific panel protein set and the control population organ specific panel protein set.

[0005] In one aspect, the sample organ specific panel proteins are measured from a target organ. In another aspect, the sample organ specific panel proteins are measured from a plurality of organs.

[0006] In one aspect, the organ specific panel protein set is selected from proteins expressed in the group of organs consisting of adrenal gland, artery, bladder, brain (amygdala), brain (nucleus caudate), breast, cervix, heart, kidney, renal cortical epithelial cells, renal proximal tubule epithelial cells, liver, hepatocytes, lung, lymph node, lymphocytes (b), lymphocytes (t), monocytes, muscle (skeletal), muscle (smooth), ovary, pancreas, pancreatic islet cells, prostate, prostate epithelial cells, skin, epidermal keratinocytes, small intestine, spleen, stomach, testes, thymus, trachea, and uterus. In another aspect, the organ specific panel protein set is selected from proteins expressed by target [...] provided in Tables 1-4.

[0007] In another aspect, the organ specific panel protein set is selected such that the expression level of at least one of the organ specific panel proteins in the sample is above or below the predetermined level. In another aspect, the expression levels of the sample organ specific panel protein set and the control population organ specific panel protein set differ by at least 10%. In another aspect, the organ specific panel protein set comprises at least five organs. In another aspect, the organ specific panel protein set comprises at least ten organs. In one aspect, the organ specific panel protein set is specific for the lung. In another aspect, the diagnostic method predicts a risk for developing lung disease.

[0008] According to another embodiment, a method for diagnosing a disease, condition or change in health status is provided, the method comprising (a) obtaining a sample of organ specific panel gene products from a subject; (b) measuring the presence or absence of a set of sample organ specific panel gene products selected from the organ specific panel genes provided in Tables 1-4; (c) comparing the levels of the set of sample organ specific panel gene products to a predetermined control range for each organ-specific gene product; and (d) diagnosing a disease, condition or change in health status based upon the difference between levels of the set of sample organ specific panel gene products and the predetermined control range for each organ specific panel gene product.

[0009] In one aspect, the biological sample is selected from the group consisting of organs, tissue, bodily fluids and cells. In another aspect, the bodily fluid is selected from the group consisting of blood, serum, plasma, urine, sputum, saliva, stool, spinal fluid, cerebral spinal fluid, lymph fluid, skin secretions, respiratory secretions, intestinal secretions, genitourinary tract secretions, tears, and milk. In another aspect, the biological sample is a blood sample.

[0010] In one aspect, the one or more organ specific panel gene products are proteins. In another aspect, the one or more organ specific panel gene products are RNA transcriptomes.

[0011] In one aspect, the disease is a lung disease. In another aspect, the lung disease is a lung cancer selected from the group consisting of small cell carcinoma, non-small cell carcinoma, squamous cell carcinoma, adenocarcinoma, broncho-alveolar carcinoma, mixed pulmonary carcinoma, malignant pleural mesothelioma and undifferentiated pulmonary carcinoma. In another aspect, the lung disease is selected from the group consisting of acute respiratory distress syndrome (ARDS), alpha-1-antitrypsin deficiency, asbestos-related lung diseases, asbestosis, asthma, bronchiectasis, bronchitis, bronchopulmonary dysplasia (BPD), chronic bronchitis, chronic obstructive pulmonary disease (COPD), congenital cystic adenomatoid malformation, cystic fibrosis, emphysema, hemothorax, idiopathic pulmonary fibrosis, infant respiratory distress syndrome, lymphangioleiomyomatosis (LAM), pleural effusion, pleurisy and other pleural disorders, pneumonia, pneumonoconiosis, pulmonary arterial hypertension, pulmonary fibrosis, respiratory distress syndrome in infants, sarcoidosis and thoracentesis.
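By way of illustration only, the comparison recited in paragraphs 0004 and 0007 (sample expression levels against predetermined control levels, with a difference of at least 10%) can be sketched in Python. The function names, the dictionary layout, the protein labels a to e, and all numerical levels below are hypothetical and are not taken from the application; only the 10% criterion comes from the text.

```python
from typing import Dict, List


def percent_difference(sample: float, control: float) -> float:
    """Signed difference of a sample level from its control level, in percent."""
    if control == 0:
        raise ValueError("control level must be non-zero")
    return 100.0 * (sample - control) / control


def flag_panel(sample_levels: Dict[str, float],
               control_levels: Dict[str, float],
               threshold_pct: float = 10.0) -> List[str]:
    """Return panel proteins whose level differs from control by at least threshold_pct."""
    flagged = []
    for protein, control in control_levels.items():
        diff = percent_difference(sample_levels[protein], control)
        if abs(diff) >= threshold_pct:  # above OR below the predetermined level
            flagged.append(protein)
    return flagged


# Hypothetical levels in arbitrary units (assumed for this sketch, not measured data).
control = {"a": 100.0, "b": 50.0, "c": 20.0, "d": 80.0, "e": 10.0}
sample = {"a": 125.0, "b": 52.0, "c": 20.0, "d": 60.0, "e": 10.5}

print(flag_panel(sample, control))  # proteins a (+25%) and d (-25%) exceed 10%
```

How the flagged proteins are turned into a risk prediction is not specified at this point in the text, so the sketch stops at the difference step.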

[0012] In one aspect, the set of sample organ specific panel gene products further comprises CLDN18, CPB2, WIF1, PPBP, and ALOX15B.

[0013] In one aspect, the levels of the set of sample organ specific panel gene products are determined by a method selected from the group consisting of mass spectrometry, an MRM assay, an immunoassay, an ELISA, RT-PCR, a Northern blot, and Fluorescent In Situ Hybridization (FISH). In another aspect, the levels of the set of sample organ specific panel gene products are determined by an MRM assay.

[0014] In one aspect, the diagnostic method further comprises a diagnostic kit comprising a plurality of detection reagents to detect the set of sample organ specific panel gene products. In one aspect, the plurality of detection reagents are selected from the group consisting of antibodies, capture agents, multi-ligand capture agents and aptamers.

[0015] According to another embodiment, a method for identifying a panel of disease-associated organ specific panel gene products is provided, the method comprising (a) obtaining a biological sample from a subject determined to have a disease affecting a selected organ; (b) detecting a first level of one or more organ specific panel gene products selected from any one or more of the organ specific panel genes provided in Tables 1-4 in the biological sample; (c) comparing the first level of the one or more organ specific panel gene products to a predetermined control range; and (d) selecting one or more gene products as a member of the panel of disease-associated organ specific panel gene products when the first level of one or more of the organ specific panel gene products in the biological sample is above or below the corresponding predetermined control range.

[0016] According to another embodiment, a method for generating a predetermined control range for one or more organ specific panel gene products is provided, the method comprising the steps of (a) identifying one or more organ specific panel gene products using sequencing by synthesis; (b) measuring the level of the one or more organ specific panel gene products in a set of specific healthy organs; and (c) determining a set of standard values for the one or more organ specific panel gene products that is the predetermined control range; wherein the predetermined control range is compared to a biological sample from a subject to determine the health status of the subject.

[0017] According to another embodiment, a method for identifying a subject at risk for the development of lung cancer is provided, the method comprising (a) obtaining a sample from a subject; (b) measuring expression levels of CLDN18, CPB2, WIF1, PPBP, and ALOX15B; and (c) predicting that the subject is at risk for development of non-small cell lung cancer based upon the presence of CLDN18, CPB2, WIF1, PPBP, and ALOX15B in the sample. According to another embodiment, a method for diagnosing lung cancer is provided, the method comprising (a) obtaining a sample from a subject; (b) measuring expression levels of CLDN18, CPB2, WIF1, PPBP, and ALOX15B; and (c) predicting that the subject is at risk for development of non-small cell lung cancer based upon the expression level of CLDN18, CPB2, WIF1, PPBP, and ALOX15B in the sample.

[0018] In one aspect, the sample is a blood sample. In another aspect, the expression levels of CLDN18, CPB2, WIF1, PPBP and ALOX15B are determined by an MRM assay.

[0019] In one embodiment, the predetermined control range is determined by analysis of a set of organs obtained from healthy tissue donors.

[0020] In one embodiment, the one or more detection reagents are specific to the first ten ranked lung cancer biomarkers in Table 4 that are in the organ of lung.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] FIG. 1 shows a panel of five organ-specific proteins measured from different organs.

[0022] FIG. 2 is a graph illustrating the number of gene expression studies that correlated lung diseases with organ specific proteins that relate to lung disease.

[0023] FIG. 3 is a set of graphs illustrating the median coefficient of variation (CV) as a function of maximum tag count, evaluated from replicate datasets of the same samples. (A) shows the different cDNA clones of the same samples. (B) shows the same cDNA clones but different sequencing runs.

[0024] FIG. 4 is a cluster dendrogram of 64 sequencing-by-synthesis (SBS) datasets of various human organs.

[0025] FIG. 5 is a bar graph illustrating the specificity of a five-protein organ-specific protein panel (CLDN18, CPB2, WIF1, PPBP and ALOX15B) and the specificities of constituent proteins.

DETAILED DESCRIPTION

[0026] The present disclosure provides novel compositions, methods, assays and kits directed to diagnostic protein markers or panels of markers that are organ-specific and correlate to changes in health status or are diagnostic of a disease. The markers identified herein are sensitive and accurate diagnostic markers and are directed toward specific panels of proteins that are identified in blood or tissue. The organ specific panels are groups or sets of organ-specific panel proteins identified from organ samples obtained from populations of normal human beings and specific patient populations using the methods described herein. The present disclosure provides computational methods to identify and correlate organ-specific panel proteins and panels with disease-associated proteins. The present disclosure identifies computational methods to select the composition of organ specific panel proteins and panels.

[0027] The organ-specific diagnostic markers of the present disclosure can be used for assessing susceptibility to and diagnosing of disease, conditions and changes in health status. In addition, the organ-specific diagnostic markers of the present disclosure are important for predicting response to and selection of treatment, monitoring treatment and determining prognosis. The organ-specific diagnostic markers may be used for staging the disease in a patient (e.g., cancer) where multiple organs are involved. The organ-specific diagnostic markers may be used for monitoring the progression of the disease (e.g., lung disease). Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the originating tissue. Also, one or more of the organ specific panel proteins and/or panels may be used in combination with one or more other disease markers (other than those described herein), such as conventionally defined organ-specific protein.

[0028] The diagnostic markers may optionally be determined to be used as “detection reagents”. Detection reagents,

as used herein, refer to any agent that associates or binds directly or indirectly to a molecule in the sample. In certain embodiments, a detection reagent may comprise antibodies (or fragments thereof) either with a secondary detection reagent attached thereto or without, nucleic acid probes, aptamers, capture agents, or glycopeptides, etc. Further, a “panel” may comprise panels, arrays, mixtures, kits, or other arrangements of proteins, antibodies or fragments thereof to organ-specific panel proteins, nucleic acid molecules encoding organ-specific panel proteins, nucleic acid probes that hybridize to organ-specific nucleic acid sequences or capture agents. Moreover, a panel may be derived from at least one organ or two or more organs. A panel may be derived from 3, 4, 5, 6, 7, 8, 9, 10 or more organs. The panels are comprised of a plurality of detection reagents each of which specifically detects a protein (or transcript). In most embodiments, the detection reagents are substantially organ-specific but may also comprise non-organ specific reagents for use as controls or other purposes. In certain aspects, the panels comprise detection reagents, each of which specifically detects an organ-specific protein (or transcript). The term “specifically” is a term of art that would be readily understood by the skilled artisan to mean, in this context, that the protein of interest is detected by the particular detection reagent but other proteins are not substantially detected. Specificity can be determined using appropriate positive and negative controls and by routinely optimizing conditions.

[0029] The organ-specific diagnostic markers of the present disclosure are unique as they are identified by computational methods that compare markers obtained from populations with specific diseases or diagnosis to a marker data set obtained from the organs of healthy cadavers. The marker data set obtained from healthy cadavers was the result of using methods described herein to identify markers from the following tissue types: adrenal gland, artery, bladder, brain (amygdala), brain (nucleus caudate), breast, cervix, heart, kidney, renal cortical epithelial cells, renal proximal tubule epithelial cells, liver, hepatocytes, lung, lymph node, lymphocytes (b), lymphocytes (t), monocytes, muscle (skeletal), muscle (smooth), ovary, pancreas, pancreatic islet cells, prostate, prostate epithelial cells, skin, epidermal keratinocytes, small intestine, spleen, stomach, testes, thymus, trachea, and uterus.

[0030] Thus, using data obtained from a normal subject population as a baseline, the disclosed methods use these data sets that include expression levels of a plurality of markers. This set of markers may include all candidate markers which may be suspected as being relevant to the detection of a particular disease, condition, or change in health status, although actual measured relevance is not required. Embodiments of the disclosed methods may be used to determine which of the candidate markers are most relevant to the diagnosis of the disease, condition or change in health status.

[0031] Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the disclosed methods can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. The organ-specific diagnostic markers are released to the bloodstream or are found in tissue under conditions of a particular disease, condition or change in health status. Depending upon the circumstances, the amount of released or expressed organ specific marker may be at a higher or lower level relative to normal. Similarly, when assessing the stage of a disease, condition, or change in health care status, the amount of released or expressed organ specific diagnostic marker may be at a higher or lower level relative to the level of organ specific diagnostic marker released or expressed in an individual or individuals afflicted with the same disease, condition or change in health care status. The measurement of these organ specific diagnostic markers in patient samples provides information that the clinician can correlate with the susceptibility a patient has to a particular disease, condition or health care status, or a probable diagnosis of a particular disease, condition or health care status.

[0032] According to the disclosed embodiments, the terms “biomarker,” “marker,” and “diagnostic marker” are interchangeable and may be an amino acid or nucleic acid sequence, including, but not limited to, DNA, RNA, microRNA, protein, peptide, or any other gene product that may be present either in blood or any other tissue or bodily fluid. The methods of the present invention may be generalized to develop diagnostic panels for any disease or health condition that utilizes DNA, RNA or protein measurements.

[0033] The “biomarkers,” “diagnostic markers,” “markers” and “biomolecular sequences” (amino acid and/or nucleic acid sequences) discovered using the disclosed methods can be efficiently utilized as tissue or pathological markers for diagnosing, treating or preventing a disease, condition or change in health status.

[0034] The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to an amino acid sequence comprising a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.

[0035] The terms “glycopeptide” or “glycoprotein” refer to a peptide that contains covalently bound carbohydrate. The carbohydrate can be a monosaccharide, oligosaccharide or polysaccharide.

[0036] The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. The term “amino acid analogs” refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. The term “amino acid mimetics” refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.

[0037] Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
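By way of illustration only, the use of a healthy baseline to decide which candidate markers are most relevant, as described in paragraphs 0029 and 0030, can be sketched in Python. The application does not specify a statistic at this point; the control range of mean ± k standard deviations, the ranking by the fraction of patient values falling outside that range, and all names and numbers below are assumptions made for this sketch.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def control_range(healthy: List[float], k: float = 2.0) -> Tuple[float, float]:
    """Assumed control range: baseline mean plus or minus k sample standard deviations."""
    m, s = mean(healthy), stdev(healthy)
    return (m - k * s, m + k * s)


def rank_candidates(healthy: Dict[str, List[float]],
                    patients: Dict[str, List[float]],
                    k: float = 2.0) -> List[Tuple[str, float]]:
    """Rank candidate markers by the fraction of patient levels outside the control range."""
    scores = []
    for marker, baseline in healthy.items():
        low, high = control_range(baseline, k)
        values = patients[marker]
        outside = sum(1 for v in values if v < low or v > high)
        scores.append((marker, outside / len(values)))
    # Most frequently out-of-range markers first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression levels in arbitrary units (not data from the application).
healthy_levels = {"m1": [10, 11, 9, 10], "m2": [5, 5.5, 4.5, 5]}
patient_levels = {"m1": [15, 14, 10, 16], "m2": [5, 5.2, 4.9, 6.0]}

print(rank_candidates(healthy_levels, patient_levels))
```

A marker that is rarely out of range in patients (here the hypothetical m2) would rank low under this assumed criterion; other statistics could equally be substituted.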

[0038] The term “nucleic acid” or “nucleic acid sequence” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, and complements thereof. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides.

[0039] Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.

[0040] A particular nucleic acid sequence also implicitly encompasses “splice variants.” Similarly, a particular protein encoded by a nucleic acid implicitly encompasses any protein encoded by a splice variant of that nucleic acid. Any products of a splicing reaction, including recombinant forms of the splice products, are included in this definition.

[0041] The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example, using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.

[0042] The term “polynucleotide,” when used in singular or plural, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.

[0043] The term “antibody” as used herein refers to a protein of the kind that is produced by activated B cells after stimulation by an antigen and can bind specifically to the antigen promoting an immune response in biological systems. Full antibodies typically consist of four subunits including two heavy chains and two light chains. The term antibody includes natural and synthetic antibodies, including but not limited to monoclonal antibodies, polyclonal antibodies or fragments thereof. Exemplary antibodies include IgA, IgD, IgG1, IgG2, IgG3, IgM and the like. Exemplary fragments include Fab, Fv, Fab', F(ab')2 and the like. A monoclonal antibody is an antibody that specifically binds to and is thereby defined as complementary to a single particular spatial and polar organization of another biomolecule which is termed an “epitope.” In some forms, monoclonal antibodies can also have the same structure. A polyclonal antibody refers to a mixture of different monoclonal antibodies. In some forms, polyclonal antibodies can be a mixture of monoclonal antibodies where at least two of the monoclonal antibodies bind to a different antigenic epitope. The different antigenic epitopes can be on the same target, different targets, or a combination. Antibodies can be prepared by techniques that are well known in the art, such as immunization of a host and collection of sera (polyclonal) or by preparing continuous hybridoma cell lines and collecting the secreted protein (monoclonal).

[0044] The term “aptamers” as used here indicates oligonucleic acid or peptide molecules that bind a specific target. In particular, nucleic acid aptamers can comprise, for example, nucleic acid species that have been engineered through repeated rounds of in vitro selection or, equivalently, SELEX (systematic evolution of ligands by exponential enrichment) to bind to various molecular targets such as small molecules, proteins, nucleic acids, and even cells, tissues and organisms. Aptamers are useful in biotechnological and therapeutic applications as they offer molecular recognition properties that rival those of antibodies.

[0045] The term “multi-ligand capture agents” used herein indicates an agent that can specifically bind to a target through the specific binding of multiple ligands comprised in the agent. For example, a multi-ligand capture agent can be a capture agent that is configured to specifically bind to a target through the specific binding of multiple ligands comprised in the capture agents. Multi-ligand capture agents can include molecules of various chemical natures (e.g., polypeptides, polynucleotides and/or small molecules) and comprise both capture agents that are formed by the ligands and capture agents that attach at least one of the ligands.

[0046] In particular, multi-ligand capture agents herein described can comprise two or more ligands each capable of binding a target. The term “ligand” as used herein indicates a compound with an affinity to bind to a target. This affinity can take any form. For example, such affinity can be described in terms of non-covalent interactions, such as the type of binding that occurs in enzymes that are specific for certain substrates and is detectable. Typically, those interactions include several weak interactions, such as hydrophobic, van der Waals, and hydrogen bonding, which typically take place simultaneously.

Exemplary ligands include molecules comprised of multiple subunits taken from the group of amino acids, non-natural amino acids, and artificial amino acids, and organic molecules, each having a measurable affinity for a specific target (e.g., a protein target). More particularly, exemplary ligands include polypeptides and peptides, or other molecules which can possibly be modified to include one or more functional groups. The disclosed ligands, for example, can have an affinity for a target, can bind to a target, can specifically bind to a target, and/or can be bindingly distinguishable from one or more other ligands in binding to a target. Generally, the disclosed multi-ligand capture agents will bind specifically to a target. It is not necessary that the individual ligands comprised in the multi-ligand capture agent be capable of specifically binding to the target individually, although this is also contemplated.

Diagnostic Assays

[0047] In some embodiments, the biomarkers are present in tissues and/or organs at normal physiological conditions, but when expressed at a higher or lower level in tissue or cells are indicative of a disease, condition or change in health status. In other embodiments, the biomarkers may be absent in tissues and/or organs under normal physiological conditions, but when expressed in tissue or cells, are indicative of a disease, condition or change in health status. In other embodiments, the biomarkers may be specifically released to the bloodstream by changes in health, or diseases, and/or are over- or under-expressed as compared to normal levels. Measurement of biomarkers in patient samples provides information that may correlate with a diagnosis of a selected disease. In one embodiment, the disease is a lung disease or lung cancer.

[0048] As used herein the phrase “diagnosing” refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term “detecting” may also optionally encompass any of the above.

[0049] Diagnosis of a disease according to the disclosed methods can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a “biological sample obtained from the subject” (patient) may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.

[0050] In some embodiments, the disclosed methods provide for obtaining a sample from a subject or a patient. As used herein, the term “subject” refers to any animal (e.g., a mammal), including but not limited to humans, non-human primates, rodents, dogs, pigs, and the like. In certain embodiments, it is contemplated that one or more cells, tissues, or organs are separated from an organism. The term “isolated” can be used to describe such biological matter. It is contemplated that the methods of the present invention may be practiced on in vivo and/or isolated biological matter.

[0051] Though tissue is composed of cells, it will be understood that the term “tissue” refers to an aggregate of similar cells forming a definite kind of structural material. Moreover, an organ is a particular type of tissue. The term “organ” refers to any anatomical part or member having a specific function in the animal. Further included within the meaning of this term are substantial portions of organs (e.g., cohesive tissues obtained from an organ). Such organs include but are not limited to kidney, liver, heart, skin, large or small intestine, pancreas, and lung. Further included in this definition are bones and blood vessels (e.g., aortic transplants).

[0052] In certain embodiments, the tissue or organ is “isolated,” meaning that it is not located within an organism.

[0053] Examples of suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, lung tissue, any human organs or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the breast ductal system), and also samples of in vivo cell culture constituents. In a preferred embodiment, the biological sample comprises lung tissue and/or sputum and/or a serum sample and/or a urine sample and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.

[0054] Numerous well known tissue or fluid collection methods can be utilized to collect a biological sample from a subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the diagnostic marker can be determined and a diagnosis can thus be made.

[0055] As used herein, the term “level” refers to expression levels of RNA and/or protein and/or DNA copy number of a marker of the present invention. Determining the level of the same marker in normal tissues of the same origin is used as a comparison to detect an elevated expression and/or amplification and/or a decreased expression of the marker compared to the normal tissues. Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same marker in a similar sample obtained from a healthy individual (examples of biological samples are described herein).

[0056] A “test sample” or “test amount” of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of a disease, condition or change in health status. In one embodiment, the disease is lung cancer. A test sample or test amount can be either in absolute amount (e.g., nanogram/mL or microgram/mL) or a relative amount (e.g., relative intensity of signals).

[0057] A “control sample” or “control amount” of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a population of patients with a specified disease (or one of the above indicative conditions) or a control population of individuals without said disease (or one of the above indicative conditions). A control amount can be either in absolute amount (e.g., nanogram/mL or microgram/mL) or a relative amount (e.g., relative intensity of signals).

[0058] An “increase or a decrease” in the level of a gene product compared to a preselected control level as used herein refers to a positive or negative change in amount from the control level. An increase is typically at least 10%, or at least 20%, or 50%, or 2-fold, or at least 2-fold, 3-fold, 4-fold,

5-fold, to at least 10-fold to at least 20-fold to at least 40-fold or higher. Similarly, a decrease is typically at a similar fold difference or at least 10%, 20%, 30%, 40%, at least 50%, or at least 80%, or at least 90%, or even as high as more than 99% in reduction from the control level.

[0059] The terms “differentially expressed gene,” “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, a condition or change in health status relative to its expression in a normal population or control population. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is at least an about two-fold, or at least 2-fold, 3-fold, 4-fold, 5-fold, to at least 10-fold to at least 20-fold to at least 40-fold or higher, difference between the expression of a given gene in normal and diseased subjects, or in various stages of disease development in a diseased subject. Differential gene expression may also be described as a percentage change when a subject is compared with the control level, typically at a similar fold difference or at least 10%, 20%, 30%, 40%, at least 50%, or at least 80%, or at least 90%, or even as high as more than 99% in reduction from the control level.

[0060] In one example described herein, the organ-specific diagnostic markers may be used for staging a lung disease or a lung cancer and/or monitoring the progression of the disease or cancer. Further, one or more of the organ-specific diagnostic markers may optionally be used in combination with one or more other lung disease or lung cancer biomarkers (other than those described herein).

[0061] The phrase “differentially present” refers to differences in the quantity of a marker present in a sample taken from patients having a disease (or one of the above indicative conditions) as compared to a comparable sample taken from patients who do not have a disease or one of the above indicative conditions. For example, a nucleic acid fragment may be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays which involve nucleic acid amplification technology, such as PCR (or variations thereof such as real-time PCR). A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.

[0062] The terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth. Examples of cancer include but are not limited to breast cancer, colon cancer, rectal cancer, lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, head and neck cancer, esophageal cancer, testicular cancer, uterine cancer, brain cancer, lymphoma, sarcomas and leukemia.

[0063] In one embodiment, the disease is a lung cancer. In another embodiment, the disease is a lung disease.

[0064] A lung cancer as described herein may include, but is not limited to, small cell carcinoma, non-small cell carcinoma, squamous cell carcinoma, adenocarcinoma, bronchoalveolar carcinoma, mixed pulmonary carcinoma, malignant pleural mesothelioma or undifferentiated pulmonary carcinoma.

[0065] A lung disease as described herein may include, but is not limited to, acute respiratory distress syndrome (ARDS), alpha-1-antitrypsin deficiency, asbestos-related lung diseases, asbestosis, asthma, bronchiectasis, bronchitis, bronchopulmonary dysplasia (BPD), chronic bronchitis, chronic obstructive pulmonary disease (COPD), congenital cystic adenomatoid malformation, cystic fibrosis, emphysema, hemothorax, idiopathic pulmonary fibrosis, infant respiratory distress syndrome, lymphangioleiomyomatosis (LAM), pleural effusion, pleurisy and other pleural disorders, pneumonia, pneumoconiosis, pulmonary arterial hypertension, pulmonary fibrosis, respiratory distress syndrome in infants, sarcoidosis or thoracentesis.

[0066] The “pathology” of (tumor) cancer includes all phenomena that compromise the well-being of the patient. This includes, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc.

Computational Methods for Diagnosis, Prognosis and Otherwise Monitoring a Disease

[0067] The embodiments provided herein are also directed to a computational method or algorithm used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of any selected disease, condition or change in health status. Such a method is based on (1) identification of organ-specific gene products and/or panels, (2) assigning a weight to the organ-specific gene products and/or panels to reflect their value in prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of a particular disease, and (3) determination of threshold values used to divide patients into groups with varying degrees of risk. Such methods are described in detail in the examples below.
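The three steps named in paragraph [0067] (a panel of markers, a weight per marker, and thresholds that split patients into risk groups) can be illustrated with a minimal Python sketch. It is not the method of the examples, which are not reproduced in this section. The marker names, the weights, the cut-off values, and the choice to combine markers as a weighted sum of log2 fold changes against a control level are all assumptions made for illustration only.

```python
from bisect import bisect_right
from math import log2

# Hypothetical weights for a three-marker panel (illustrative values only).
WEIGHTS = {"marker_a": 0.5, "marker_b": 0.3, "marker_c": 0.2}
# Hypothetical score cut-offs, sorted ascending, separating the risk groups.
THRESHOLDS = (0.5, 1.5)
LABELS = ("low", "intermediate", "high")


def panel_score(test_levels, control_levels, weights=WEIGHTS):
    """Weighted sum of log2 fold changes of each marker against its control level.

    All levels must be positive numbers in the same units per marker.
    """
    return sum(
        weight * log2(test_levels[name] / control_levels[name])
        for name, weight in weights.items()
    )


def risk_group(score, thresholds=THRESHOLDS, labels=LABELS):
    """Map a score to a risk group; a score equal to a cut-off joins the higher group."""
    return labels[bisect_right(thresholds, score)]


# Made-up control and patient levels (e.g., nanogram/mL).
controls = {"marker_a": 10.0, "marker_b": 20.0, "marker_c": 5.0}
patient = {"marker_a": 40.0, "marker_b": 40.0, "marker_c": 5.0}

score = panel_score(patient, controls)
print(score, risk_group(score))
```

Working on log2 fold changes keeps increases and decreases symmetric around zero, which matches the way the text defines both relative to a control level. Any other monotone combination of marker levels could be substituted without changing the threshold step.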

[0068] The first step in generating data to be analyzed by the algorithm is gene or protein expression profiling. In some embodiments, an assay is used to detect and measure the levels of specified genes (mRNAs) or their expression products (proteins) in a biological sample comprising cancer cells.

Identification of Organ-Specific Panel Gene Products

[0069] According to the embodiments described herein, organ-specific panel proteins and organ-specific panels are provided. Previous methods have defined a protein (or other gene product) as being organ-specific if the majority (50% or more) of its expression level across the organs and/or tissues of the human body (or some other species) is from one organ [2, 5, 6, 9]. For example, if the expression level of a protein across 25 human organs was measured and greater than 50% of that expression was in the kidney then the protein would be considered kidney-specific.

[0070] An organ-specific panel protein is a protein whose expression level across a set or group of organs and/or tissues of the human body (or some other species) is predominately (50% or more) from a fixed number (k) or fewer organs, where k is some predefined number such as 5 (FIG. 1). For example, if the expression level of a protein across 25 human organs was measured and 90% of that expression was in k or fewer organs (e.g., kidney, liver, lung, bladder and spleen), then the protein would be considered kidney, liver, lung, bladder, spleen-specific. Equivalently, it would be considered kidney-specific (and liver-specific, lung-specific, bladder-specific and spleen-specific). This generalization is motivated by the fact that diagnostics are becoming increasingly multivariate (i.e., measuring multiple analytes such as proteins or genes) so that a multivariate definition of organ-specificity is required. For purposes of this invention, k organs refers to any number of the organs from the following exemplary tissue types: adrenal gland, artery, bladder, brain (amygdala), brain (nucleus caudate), breast, cervix, heart, kidney, renal cortical epithelial cells, renal proximal tubule epithelial cells, liver, hepatocytes, lung, lymph node, lymphocytes (b), lymphocytes (t), monocytes, muscle (skeletal), muscle (smooth), ovary, pancreas, pancreatic islet cells, prostate, prostate epithelial cells, skin, epidermal keratinocytes, small intestine, spleen, stomach, testes, thymus, trachea, and uterus. Thus k may be from 1 to 5, to 10, to 20, to 25, to 30 organs or tissue types.

[0071] To evaluate whether a protein is an organ-specific panel protein, the following analysis is used. First, the protein's abundance in different organs was sorted from high to low. More specifically, the SBS tag counts of the protein were sorted such that $n_1 \ge n_2 \ge \dots \ge n_{25}$, where $n_i$ was the tag count in organ $i$. The protein is specific to the first k organs if its tag counts satisfy all three conditions listed below:

[0072] 1. Tag counts in the first k organs were at or above the noise level of SBS data while those in other organs were below the noise level, i.e., $n_k \ge 10$ and $n_{k+1} < 10$;

[0073] 2. Tag counts in the first k organs were significantly above those in other organs.

[0074] We used an exact binomial test to calculate the p value distinguishing the drawing of $n_k$ tags from a total of $S_{25}$ tags with the drawing of $n_{k+1}$ tags from $S_{25}$ tags, where $S_{25}$ was the total tag count in all organs. The difference was considered significant if the two-sided p value was no greater than 0.05;

[0075] 3. The total tag count in the first k organs was at least half of the total in all organs, i.e., $S_k/S_{25} \ge 0.5$, where $S_k$ was the total tag count in the first k organs.

[0076] A panel of n organ-specific panel proteins is organ-specific if there is an organ in which all n organ-specific panel proteins, individually, are expressed. Although the term “protein” is used to describe organ-specific panels herein, this definition applies to all suitable gene products, including nucleic acid molecules and proteins and functional fragments thereof. The term protein is used for convenience.

[0077] More generally, every protein has an expression profile across a library of organs and/or tissues. If p denotes the protein then let e(p) denote the expression profile across organs and/or tissues. Furthermore, assume e(p) is normalized so that e(p) represents a probability distribution, that is, the sum of e(p) across all organs/tissues is 1. Let S be a panel of n proteins, namely, {p1, p2, ..., pn}. The joint probability distribution of S across the organs/tissues is simply e(S) = C*e(p1)*e(p2)*...*e(pn), where C is a constant normalization factor so that the sum of e(S) across all organs/tissues is 1. Finally, let T be a percentage threshold, e.g., 80%, that defines organ-specificity for a panel. Then S is organ-specific for an organ Q if the probability of Q is T or greater in e(S) and all other organs have probability below T.

[0078] The organ-specific panel proteins and panels described herein may be associated with known disease-associated proteins. We used the NextBio database obtained from NextBio, Inc. (Cupertino, Calif.) to compare the population of markers obtained from the healthy cadaver donors with markers defined in various clinical studies related to lung disease and lung cancer. However, the computational methods of the present invention may be generalized to any disease process. As described in the examples below, 115 novel lung-specific proteins (k=5) were identified and compared to the NextBio clinical study database, which associates a list of proteins (115) to clinical studies containing a statistically significant subset of these proteins (or their gene origins) where these proteins are modulated by disease. This enables the identification of proteins that are both organ-specific and disease modulated. Such panels of proteins are then more specific to an organ (and its diseases) than non-organ-specific panels (see Table 2).

[0079] The 115 lung-specific proteins identified in Example 2 (Tables 2 and 5) were compared with disease-relevant genes in the NextBio studies. As anticipated, it was found that traditionally defined lung-specific proteins were highly indicative of lung diseases and lung cancers. Unexpectedly, we discovered that proteins that were not traditionally defined as lung-specific were also highly correlated with lung diseases and lung cancers. These proteins are organ-specific panel proteins, more specifically, lung-specific panel proteins according to the present invention. Two sets of these lung-specific proteins that had high potential to be biomarkers for lung diseases or lung cancers were also identified. In one analysis, we determined that a five-protein lung-specific panel of proteins according to the present invention were biomarkers for lung cancer as set forth in the below examples. The five-protein panel demonstrated that the panel was both lung-specific and highly indicative for lung cancers even though the proteins were not entirely lung-specific according to the traditional definition of an organ-specific protein.
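The per-protein test of paragraphs [0071] to [0075] and the panel-level definition of paragraph [0077] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the examples. In particular, the text does not spell out how the exact binomial test is set up; the sketch assumes one plausible reading, in which the count in the k-th organ and the count in the (k+1)-th organ are compared as a fair split of their combined tags. All function names and all example numbers are hypothetical.

```python
from math import exp, lgamma, log, prod

NOISE_LEVEL = 10   # tag-count noise floor given in the text
P_CUTOFF = 0.05    # two-sided significance cut-off given in the text


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p value: total mass of outcomes no likelier than the observed one."""
    def pmf(i):
        log_choose = lgamma(trials + 1) - lgamma(i + 1) - lgamma(trials - i + 1)
        return exp(log_choose + i * log(p) + (trials - i) * log(1 - p))

    masses = [pmf(i) for i in range(trials + 1)]
    observed = masses[successes]
    return min(1.0, sum(m for m in masses if m <= observed * (1 + 1e-7)))


def is_specific_to_top_k(tag_counts, k):
    """Check the three tag-count conditions for the k organs with the highest counts."""
    ranked = sorted(tag_counts.values(), reverse=True)
    if not 0 < k < len(ranked):
        return False
    n_k, n_next = ranked[k - 1], ranked[k]
    # Condition 1: top-k counts at or above the noise level, all others below it.
    if n_k < NOISE_LEVEL or n_next >= NOISE_LEVEL:
        return False
    # Condition 2 (ASSUMED reading of the test): n_k versus n_next as a fair split.
    if binomial_two_sided_p(n_k, n_k + n_next) > P_CUTOFF:
        return False
    # Condition 3: the top-k organs hold at least half of all tags.
    return sum(ranked[:k]) / sum(ranked) >= 0.5


def panel_distribution(profiles):
    """e(S): organ-wise product of the normalized profiles e(p), rescaled to sum to 1."""
    raw = {organ: prod(p[organ] for p in profiles) for organ in profiles[0]}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no organ expresses every protein in the panel")
    return {organ: value / total for organ, value in raw.items()}  # 1/total is C


def panel_specific_organ(profiles, threshold=0.8):
    """Return the single organ Q with probability >= T in e(S), or None if there is none."""
    hits = [o for o, v in panel_distribution(profiles).items() if v >= threshold]
    return hits[0] if len(hits) == 1 else None


# Made-up tag counts and normalized expression profiles for illustration.
counts = {"lung": 120, "trachea": 40, "liver": 3, "kidney": 2, "heart": 0}
p1 = {"lung": 0.6, "liver": 0.3, "kidney": 0.1}
p2 = {"lung": 0.5, "liver": 0.1, "kidney": 0.4}
print(is_specific_to_top_k(counts, 2), panel_specific_organ([p1, p2]))
```

In the made-up profiles neither protein alone reaches the 80% threshold for lung, but their joint distribution does. That is the behavior the text attributes to the five-protein panel, whose members were not lung-specific under the traditional single-protein definition.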

Methods of Measuring Protein Diagnostic Markers

[0080] There are a variety of methods used to measure protein diagnostic markers. As anyone skilled in the art will determine, typical methods that measure changes in mRNA expression may be used to determine control and test levels of proteins.

[0081] Methods of gene expression profiling directed to measuring mRNA levels can be divided into two large groups: methods based on hybridization analysis of polynucleotides, and methods based on sequencing of polynucleotides. The most commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999)); RNAse protection assays (Hood, Biotechniques 13:852-854 (1992)); and reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992)). Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).

[0082] RNA sequencing (“Whole Transcriptome Shotgun Sequencing” (“WTSS”)) will be used in transcriptomics and refers to the use of high-throughput sequencing technologies to sequence cDNA to get information about a sample's RNA content, and is used in the study of diseases like cancer.

[0083] General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42044 (1995). While the practice of the invention will be illustrated with reference to techniques developed to determine mRNA levels in a biological (e.g., tissue) sample, other techniques, such as methods of proteomics analysis, are also included within the broad definition of gene expression profiling, and are within the scope herein. In general, a preferred gene expression profiling method for use with paraffin-embedded tissue is quantitative reverse transcriptase polymerase chain reaction (qRT-PCR); however, other technology platforms, including mass spectroscopy and DNA microarrays, can also be used.

[0084] A sensitive and flexible quantitative method is reverse transcriptase PCR (RT-PCR), which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, with or without drug treatment, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. A variation of the RT-PCR technique is the real time quantitative PCR (qRT-PCR), which measures PCR product accumulation through a dual-labeled fluorigenic probe (i.e., TaqMan® probe). Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Held et al., Genome Research 6:986-994 (1996).

[0085] Differential gene expression can also be identified or confirmed using the microarray technique. In a specific embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. Preferably at least 10,000 nucleotide sequences are applied to the substrate. The microarrayed genes, immobilized on the microchip at 10,000 elements each, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip® or other suitable microarray technology.

[0086] In some embodiments, genomic sequence analysis, or genotyping, may be performed on the sample. This genotyping may take the form of mutational analysis such as single nucleotide polymorphism (SNP) analysis, insertion deletion polymorphism (InDel) analysis, variable number of tandem repeat (VNTR) analysis, copy number variation (CNV) analysis or partial or whole genome sequencing. Methods for performing genomic analyses are known to the art and may include high throughput sequencing. Methods for performing genomic analyses may also include microarray methods as described. In some cases, genomic analysis may be performed in combination with any of the other methods herein. For example, a sample may be obtained, tested for adequacy, and divided into aliquots. One or more aliquots may then be used for cytological analysis of the present invention, one or more may be used for RNA expression profiling methods of the present invention, and one or more can be used for genomic analysis. It is further understood that the present invention anticipates that one skilled in the art may wish to perform other analyses on the biological sample that are not explicitly provided herein.

[0087] Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997).

[0088] Gene expression analysis by massively parallel signature sequencing (MPSS), described by Brenner et al., Nature Biotechnology 18:630-634 (2000), is a sequencing

approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads per cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.

[0089] Immunoassays.

[0090] An “immunoassay” is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.

[0091] For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically, a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.

[0092] Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.

[0093] Immunohistochemistry.

[0094] Immunohistochemistry methods are also suitable for detecting the expression levels of the prognostic biomarkers described herein. Thus, antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.

[0095] Proteomics.

[0096] The term “proteome” is defined as the totality of the proteins present in a sample (e.g., organ, tissue, organism, or cell culture) at a certain point of time. Proteomics includes, among other things, study of the global changes of protein expression in a sample (also referred to as “expression proteomics”). Proteomics typically includes the following steps: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g., by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics. Proteomics methods are valuable supplements to other methods of gene expression profiling, and can be used, alone or in combination with other methods, to detect the products of the prognostic markers of the present invention.

[0097] Transcriptome.

[0098] The term “transcriptome” is defined as the totality of RNA transcripts present in a sample (e.g., organ, tissue, organism, population of cells or a single cell) at a certain point of time. Transcriptomics includes, among other things, study of the global changes of RNA transcripts present in a sample.

[0099] Mass Spectrometry Methods.

[0100] The use of mass spectrometry, in accordance with the disclosed methods and organ-specific panels, can provide information on not only the mass to charge ratio of ions generated from a sample, but also the relative abundance of such ions. Under standardized experimental conditions, it is therefore possible to compare the abundance of a noncovalent biomolecule-ligand complex ion with the ion abundance of the noncovalent complex formed between a biomolecule and a standard molecule, such as a known substrate or inhibitor. Through this comparison, binding affinity of the ligand for the biomolecule, relative to the known binding of a standard molecule, may be ascertained. In addition, the absolute binding affinity can also be determined.

[0101] A variety of mass spectrometry systems can be employed for identifying and/or quantifying organ-specific proteins in biological samples. Mass analyzers with high mass accuracy, high sensitivity and high resolution include, but are not limited to, ion trap, triple quadrupole, and time-of-flight, quadrupole time-of-flight mass spectrometers and Fourier transform ion cyclotron mass analyzers (FT-ICR-MS). Mass spectrometers are typically equipped with matrix-assisted laser desorption (MALDI) and electrospray ionization (ESI) sources, although other methods of peptide ionization can also be used. In ion trap MS, analytes are ionized by ESI or MALDI and then put into an ion trap. Trapped ions can then be separately analyzed by MS upon selective release from the ion trap. Organ-specific proteins can be analyzed, for example, by single stage mass spectrometry with a MALDI-TOF or ESI-TOF system.

[0102] Mass spectrometry may be used to detect proteins in a biological sample. MS relies on the discriminating power of mass analyzers to select a specific analyte and on ion current measurements for quantitation. In the field of analytical chemistry, many small molecule analytes (e.g., drug metabolites, hormones, protein degradation products and pesticides) are routinely measured using this approach at high throughput with great precision (CV<5%). Most such assays employ electrospray ionization followed by two stages of mass selection: a first stage (MS1) selecting the mass of the intact analyte (parent ion) and, after fragmentation of the parent by collision with gas atoms, a second stage (MS2) selecting a specific fragment of the parent, collectively generating a selected reaction monitoring (SRM, plural MRM) assay. The two mass filters produce a very specific and sensitive response for the selected analyte, which can be used to detect and integrate a peak in a simple one-dimensional chromatographic separation of the sample. In principle, this MS-based approach can provide absolute structural specificity for the analyte, and, in combination with appropriate stable-isotope labeled internal standards (SIS), it can provide absolute quantitation of analyte concentration. These measurements have been multiplexed to provide 30 or more specific assays in one run.
Such methods are slowly gaining acceptance in the clinical laboratory for the routine measurement of endogenous metabolites (e.g., in screening newborns for a panel of inborn errors of metabolism) and some drugs (e.g., immunosuppressants).

[0103] Thus, in some embodiments, a multiple reaction monitoring (MRM) assay may be used. An MRM approach may be applied to the measurement of specific peptides in complex mixtures such as tryptic digests of plasma. In this case, a specific tryptic peptide can be selected as a stoichiometric representative of the protein from which it is cleaved, and quantitated against a spiked internal standard (a synthetic stable-isotope labeled peptide) to yield a measure of protein concentration. In principle, such an assay requires only knowledge of the masses of the selected peptide and its fragment ions, and an ability to make the stable isotope-labeled version. C-reactive protein, apo A-I lipoprotein, human growth hormone and prostate-specific antigen (PSA) have been measured in plasma or serum using this approach. Since the sensitivity of these assays is limited by mass spectrometer dynamic range and by the capacity and resolution of the assisting chromatography separation(s), hybrid methods have also been developed coupling MRM assays with enrichment of proteins by immunodepletion and size exclusion chromatography or enrichment of peptides by antibody capture (SISCAPA). In essence, the latter approach uses the mass spectrometer as a "second antibody" that has absolute structural specificity. SISCAPA has been shown to extend the sensitivity of a peptide assay by at least two orders of magnitude and with further development appears capable of extending the MRM method to cover the full known dynamic range of plasma (i.e., to the pg/ml level).

[0104] In other embodiments, Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is another method that can be used for studying biomolecules (Hillenkamp et al., Anal. Chem., 1991, 63, 1193A-1203A). This technique ionizes high molecular weight biopolymers with minimal concomitant fragmentation of the sample material. This is typically accomplished via the incorporation of the sample to be analyzed into a matrix that absorbs radiation from an incident UV or IR laser. This energy is then transferred from the matrix to the sample resulting in desorption of the sample into the gas phase with subsequent ionization and minimal fragmentation. One of the advantages of MALDI-MS over ESI-MS is the simplicity of the spectra obtained, as MALDI spectra are generally dominated by singly charged species. Typically, the gaseous ions generated by MALDI techniques are detected and analyzed by determining the time-of-flight (TOF) of these ions. While MALDI-TOF MS is not a high resolution technique, resolution can be improved by making modifications to such systems, by the use of tandem MS techniques, or by the use of other types of analyzers, such as Fourier transform (FT) and quadrupole ion traps.

[0105] In situ hybridization (ISH) is used to visualize defined nucleic acid sequences in cellular preparations by hybridization of complementary probe sequences. Through nucleic acid hybridization, the degree of sequence identity can be determined, and specific sequences can be detected and located on a given chromosome. The method comprises three basic steps: fixation of a specimen on a microscope slide, hybridization of labeled probe to homologous fragments of genomic DNA, and enzymatic detection of the tagged target hybrids. Probe sequences can be labeled with isotopes, but nonisotopic hybridization has become increasingly popular, with fluorescent hybridization (Nature Methods 2005, 2, 237-238) now a common choice as it is considerably faster, usually has greater signal resolution, and provides many options to simultaneously visualize different targets by combining various detection methods.

Kits

[0106] In yet another aspect, the present invention provides kits for aiding a diagnosis of a disease, such as lung cancer, wherein the kits can be used to detect the markers of the present invention. For example, the kits can be used to detect any one or combination of markers described above, which markers are differentially present in samples of patients with disease or a change in health status and normal subjects.

[0107] In one embodiment, a kit comprises: (a) a substrate comprising an adsorbent thereon, wherein the adsorbent is suitable for binding a marker, and (b) a washing solution or instructions for making a washing solution, wherein the combination of the adsorbent and the washing solution allows detection of the marker as previously described.

[0108] Optionally, the kit can further comprise instructions for suitable operational parameters in the form of a label or a separate insert. For example, the kit may have standard instructions informing a consumer/kit user how to wash the probe after a sample of seminal plasma or other tissue sample is contacted on the probe.

[0109] In another embodiment, a kit comprises (a) an antibody that specifically binds to a marker; and (b) a detection reagent. Such kits can be prepared from the materials described above.

[0110] In either embodiment, the kit may optionally further comprise a standard or control information, and/or a control amount of material, so that the test sample can be compared with the control information standard and/or control amount to determine if the test amount of a marker detected in a sample is a diagnostic amount consistent with a diagnosis of lung cancer.

Statistics

[0111] The statistically meaningful difference may have p values that are statistically meaningfully higher or lower than the expression level of the patient group or control group. Preferably, the p value may be less than 0.05.

[0112] Having described the invention with reference to the embodiments and illustrative examples, those in the art may appreciate modifications to the invention as described and illustrated that do not depart from the spirit and scope of the invention as disclosed in the specification. The examples are set forth to aid in understanding the invention but are not intended to, and should not be construed to, limit its scope in any way. The examples do not include detailed descriptions of conventional methods. Such methods are well known to those of ordinary skill in the art and are described in numerous publications. All references cited above and in the examples below are hereby incorporated by reference in their entirety, as if fully set forth herein.

Example 1

Generation of Organ Datasets Using Sequencing-By-Synthesis

[0113] Data generated from transcriptomic profiling of 25 human organs was analyzed using sequencing-by-synthesis

(SBS). The analysis of organ-specific proteins as set forth herein resulted in the identification of 2,648 unique organ-specific proteins. As demonstrated by comparing lung-specific proteins with genes that were determined in transcriptomic studies on human diseases, organ-specific panel proteins were highly indicative of diseases or changes of health status.

SBS Dataset of Human Tissues

[0114] The comparative set of biomarkers comprised an analysis of the transcriptomes in specific human organs. Analysis was performed by Solexa (now Illumina, Inc., San Diego, Calif.). A total of 25 human organs were collected from a cohort of healthy donors. Most samples came from donors who died in accidents. Organs were divided and pooled by type and donor gender. Other samples were purchased from vendors.

[0115] The data included 64 datasets: some organs contained samples from multiple donors; some samples were analyzed in multiple sequencing runs. A detailed list of the datasets is summarized in Table 6.

[0116] Messenger RNA (mRNA) molecules were extracted from the samples and assessed for quality. Samples of mRNA molecules that passed quality control were sent to Solexa (now Illumina) for transcriptomic analysis under a service contract, using their then existing SBS protocol on the Genome Analyzer 1. The SBS data set from the analysis of each set of pooled organs contained a list of 20-base tags derived from transcripts in the samples and their corresponding abundance. The tags had a canonical initiation sequence of GATC due to the enzyme used in digesting cDNA molecules. The tags were also annotated under the same annotation system that was used by Solexa (now Illumina) for massive parallel signature sequencing (MPSS) tags [2, 3]. The number of SBS tags in individual datasets ranged from 164,918 tags in dataset "HCC59" to 663,447 tags in dataset "HCC20".

Analysis of the SBS Data

[0117] The SBS data obtained as described above was analyzed to identify organ-specific proteins. First, sequencing errors were subtracted from tag counts, and tags whose counts were below sequencing errors were removed. SBS tags are prone to small sequencing errors, particularly in the end portion of the tags. The following steps were used to estimate and correct sequencing errors occurring in the last bases of tags:

[0118] (i) For each dataset, SBS tags that differed in their last bases were grouped together. For example, tags "GATCAAATATCACTCTCCTA" (count 85,974), "GATCAAATATCACTCTCCTC" (count 673), "GATCAAATATCACTCTCCTT" (count 173), and "GATCAAATATCACTCTCCTG" (count 39) were grouped together in dataset "HCC01 A";

[0119] (ii) SBS tags that differed in the last bases of the sequence from any primer-dimers were removed from estimating sequencing errors. Primer-dimers used in generating the SBS data were listed in Table 7;

[0120] (iii) The most abundant tags were identified from SBS tag groups. In the above example, tag "GATCAAATATCACTCTCCTA" was identified as the most abundant tag in the group;

[0121] (iv) SBS tag groups were removed from estimating sequencing errors if their most abundant tags (1) had counts less than 1,000, (2) were not annotated to classes 1, 2, 3, or 4 under Solexa annotation, or (3) had the same counts as any other tags in the same groups. Tag "GATCAAATATCACTCTCCTA" was annotated as class 4 under Solexa annotation and thus was used for estimating sequencing errors;

[0122] (v) Unannotated tags in the remaining SBS tag groups were identified as incidences of sequencing errors, whose rates were estimated by the ratios of counts of unannotated tags to counts of the most abundant tags. In the above example, the most abundant tag was annotated. So an incidence of A->C, A->G, or A->T sequencing error was identified by each of the three unannotated tags. The corresponding error rate was estimated at 673/85,974=0.0078, 39/85,974=0.00045, or 173/85,974=0.0020, respectively;

[0123] (vi) Sequencing error rates in each dataset were estimated by the medians of corresponding incident sequencing error rates in the dataset;

[0124] (vii) The overall sequencing error rates were estimated by the medians of corresponding sequencing error rates in individual datasets and were listed in Table 8;

[0125] (viii) For each SBS dataset, contributions by sequencing errors of the most abundant tags to counts of other tags in the same SBS tag groups were estimated by multiplying the counts of the most abundant tags with the corresponding sequencing error rates listed in Table 8. Sequence errors were rounded up to integers and subtracted from the counts of other tags; and

[0126] (ix) Only SBS tags with positive tag counts after correcting for sequencing errors were kept for further analysis.

[0127] Second, sequences of primer-dimers and sequences of REPEAT were removed. SBS tags that are ubiquitous in the genome were annotated as REPEAT under Solexa annotation. These tags were not reliable for measuring transcripts in samples and were thus removed from further analysis. Similarly, SBS tags that were identical to primer-dimers listed in Table 7 were also removed from further analysis.

[0128] Third, SBS tags were annotated to RNA RefSeq sequences and unannotated tags were removed. Two files of RNA RefSeq sequences were downloaded from the National Center for Biotechnology Information (NCBI) website: (1) "human.rna.fna.gz" (43,504 sequences, from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/); and (2) "rna.fa.gz" (42,753 sequences, from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/RNA/). Sequences in the two files were combined and reconciled, which led to a list of 44,706 RNA RefSeq sequences. The sequences were then theoretically digested into 20-base tags with an initiation sequence of GATC. Both sense and antisense tags were kept. Unique tags were then annotated to RNA RefSeq accession numbers: (1) if they belonged to any sense sequences of RNAs, they were classified as "F" (for "forward") and annotated with the corresponding RefSeq accession numbers; (2) if they belonged to antisense sequences of RNAs, they were classified as "B" (for "backward") and annotated with the corresponding RefSeq accession numbers. It was common for a single SBS tag to be annotated to multiple RNAs. For example, tag "GATCAAAAAAACGTTCTTTG" was classified as "F" and annotated to RNAs "NM_001025091.1" and "NM_001090.2"; and tag "GATCAAAAAAAAATTTTTGC" was classified as "B" and annotated to RNAs "NM_001136275.1" and

"NM_024595.2". A total of 176,384 tags were classified as "F" and 168,605 as "B". SBS tags that could not be annotated to RefSeq accession numbers were removed from further analysis.

[0129] Fourth, data was normalized to transcripts per million (TPM) and all SBS data was assembled into a single file. Individual datasets were normalized by TPM, the same method used for normalizing MPSS data [2, 3]. Briefly, a global normalization factor was calculated for each dataset by dividing a million by the total count of all remaining SBS tags in the dataset. Individual tag counts were then multiplied by the normalization factor and rounded up to integers. Only SBS tags with positive tag counts were kept for further analysis. The number of remaining SBS tags in individual datasets ranged from 27,864 tags in dataset "HCCHuHep" to 68,933 tags in dataset "HCC29". All remaining SBS data were assembled into a single data file as a tag vs. dataset array. There were 192,647 unique SBS tags in the file. This file was used for downstream analysis.

[0130] Fifth, SBS tags having normalized counts that were below a cutoff of 10 in all samples were removed. To estimate the noise level in SBS data, replicate datasets generated from the same samples were compared. For each pair of replicate datasets, coefficients of variation (CVs) and maximum counts from counts of individual tags were calculated first. Tags with the same maximum counts were then grouped together and the corresponding median CVs were calculated. In the case where there were fewer than 100 tags in a group, tags with lower and higher maximum counts were added to the group until 100 or more tags were included. In the case where 100 or more tags were included, the maximum count of the group was replaced by the corresponding median.

[0131] Two types of replicate datasets resulted: (1) datasets generated from different cDNA clones of the same mRNA samples and (2) datasets generated in different sequencing runs on the same cDNA clones. FIG. 3 illustrates the median CV vs. maximum tag count for both types of replicate datasets. Median CVs remained relatively flat for most values of tag count; however, a dramatic increase is shown as the tag count approached 10, indicating SBS data were no longer reliable at that level. A cutoff of 10 was thereby selected as the noise level in SBS data. SBS tags having normalized counts that were below the cutoff in all samples were removed from further analysis. A total of 32,853 SBS tags were kept.

[0132] Sixth, SBS tags that could not be mapped to proteins were removed. Some SBS tags were annotated to non-coding RNAs. Such tags were not useful for identifying organ-specific proteins and needed to be removed from further analysis. The following steps were carried out to determine which SBS tags to remove in accordance with this step:

[0133] (i) Two files of protein RefSeq sequences were downloaded from the NCBI website: (1) "human.protein.faa.gz" (37,843 sequences, from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/); and (2) "protein.fa.gz" (37,391 sequences, from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/protein/). Sequences in the two files were combined and reconciled, which resulted in a list of 38,410 protein RefSeq sequences;

[0134] (ii) Two files ("gene2accession.gz" and "gene2refseq.gz") were downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/gene/DATA/). The files contained the mappings between Entrez genes, protein RefSeq accession numbers and RNA RefSeq accession numbers. Information in the files was parsed and reconciled along with information in the combined protein RefSeq sequence file. A total of 38,385 protein RefSeq accession numbers were assembled along with corresponding genes and RNA RefSeq accession numbers;

[0135] (iii) SBS tags were mapped to protein RefSeq accession numbers via their annotation to RNA RefSeq accession numbers and the mapping between protein and RNA RefSeq accession numbers;

[0136] (iv) SBS tags that could not be mapped to proteins were removed from further analysis. A total of 31,867 SBS tags were kept.

[0137] Seventh, the SBS tag counts were condensed to protein abundance. It was common that multiple SBS tags were mapped to the same proteins. To determine the abundance of proteins in our samples, the following steps were carried out to condense the SBS tag counts to protein abundance:

[0138] (i) For each protein, all SBS tags mapped to the protein were collected;

[0139] (ii) The most abundant SBS tag (as evaluated by the total tag count in all datasets) was identified for the protein;

[0140] (iii) Less abundant SBS tags of the protein were removed from further analysis if their abundance satisfied any of these three conditions: (1) their total tag count in all datasets was less than half of that of the most abundant tag, (2) their highest count in all datasets was less than 50, or (3) their Pearson correlation with the most abundant tag was greater than 0.5. The majority of proteins kept only their most abundant SBS tags after this step. A few proteins, however, kept two comparable but uncorrelated SBS tags, likely due to alternative splicing in the corresponding mRNAs;

[0141] (iv) SBS tags were also removed from further analysis if they (1) could be mapped to another protein and (2) would be removed from that protein under conditions listed above;

[0142] (v) Some SBS tags could be mapped to proteins of multiple genes. In such cases, predicted proteins were removed from the list of proteins that were mapped to the tags. SBS tags that were mapped to predicted proteins of multiple genes were removed from further analysis;

[0143] (vi) A total of 15,267 SBS tags were kept. Their tag counts were used for measuring protein abundance in the samples.

[0144] Eighth, the quality of the SBS data was assessed, and outlier datasets were removed. To assess the quality of SBS data in profiling human organs, unsupervised clustering was carried out on the data. The distance between two datasets was evaluated as $1-\rho$, where $\rho$ was the Spearman's rank correlation coefficient. The clustering was carried out with the R function "hclust" using the "single" method (see http://www.r-project.org/). The result was plotted in FIG. 4. Most datasets of the same organs were clustered together or nearby. The exceptions were two datasets of muscle, two datasets of thymus and five datasets of epithelial cells, which were clustered together regardless of their organ origins. The five datasets of epithelial cells and the two datasets of hepatocytes and of pancreatic islet cells were removed from further analysis.

[0145] Ninth, the different datasets were condensed into data of different organs. As listed in Table 6, some organs included multiple samples and some samples generated multiple datasets. To compare protein abundance in different

organs, the SBS data of different datasets were condensed into SBS data of different organs according to the following steps:

[0146] (i) Quantile-quantile (QQ) normalization [4] was applied to datasets of the same samples to reduce technical variations in the datasets. Protein abundance in the samples was then estimated by the corresponding median in their belonging datasets;

[0147] (ii) QQ normalization was also applied to SBS data of samples of the same organs to reduce biological variations in the samples. Protein abundance in the organs was then estimated by the corresponding median in their belonging samples;

[0148] (iii) SBS tags whose counts were less than 10 in all 25 organs were removed from further analysis;

[0149] (iv) The remaining 14,561 SBS tags were assembled in a tag vs. organ array and stored in a single file.

Example 2

Identification and Relevance of Organ-Specific Proteins

[0150] To evaluate whether a protein was organ specific, its abundance in different organs was sorted from high abundance to low abundance. More specifically, the SBS tag counts of the protein were sorted so that $n_1 \ge n_2 \ge \dots \ge n_{25}$, wherein $n_i$ was the tag count in organ $i$. The protein was specific to the first $k$ organs if its tag counts satisfied all three conditions listed below:

[0151] (i) Tag counts in the first $k$ organs were at or above the noise level of SBS data while those in other organs were below the noise level, i.e., $n_k \ge 10$ and $n_{k+1} < 10$;

[0152] (ii) Tag counts in the first $k$ organs were significantly above those in other organs. This condition was determined by application of an exact binomial test to calculate the p value of distinguishing the drawing of $n_k$ tags from a total of $S_{25}$ tags with the drawing of $n_{k+1}$ tags from $S_{25}$ tags, where $S_{25}$ was the total tag count in all organs. The difference was considered significant if the two-sided p value was no greater than 0.05; and

[0153] (iii) The total tag count in the first $k$ organs was at least half of the total in all organs, i.e., $S_k/S_{25} \ge 0.5$, where $S_k$ was the total tag count in the first $k$ organs.

[0154] Proteins were identified that were specific to up to five organs, i.e., $k \le 5$. Proteins specific to different organs were summarized in Table 5. Proteins of different RefSeq accession numbers but of the same genes were grouped together and counted as single proteins. Proteins specific to more than one organ were summarized by number of proteins that correspond to each organ. As indicated in Table 5, a total of 2,648 unique proteins were identified as organ specific and were attributed to 4,239 entries.

Example 3

Identification of Lung-Specific Panel Proteins, Lung-Specific Panels, and Relevance to Diagnosis of Lung-Related Diseases

[0155] To demonstrate the relevance of the organ-specific proteins identified above to diseases of corresponding organs, 115 lung-specific proteins ($k \le 5$) identified in Table 5 (**) were compared with genes that were identified in transcriptomic studies described above for many major human diseases. Lung-specific proteins were uploaded to the NextBio database (http://www.nextbio.com). The NextBio database is a collection of results from most publicly available transcriptomic studies. We reviewed a total of 1,421 studies on human diseases and selected those studies that indicated at least one lung-specific protein for the diseases. The studies were sorted from high to low by their correlation with lung-specific proteins. The top 50 studies were listed in Table 9.

[0156] Comparison Between Lung-Specific Proteins and Disease-Relevant Genes.

[0157] The results of the comparison of the 115 lung-specific proteins to the genes indicated in the transcriptomic studies identified by NextBio are illustrated in FIG. 2. Nine out of the top ten studies and 25 out of the top 50 studies were related to lung diseases including lung cancers. This example clearly demonstrates that organ-specific proteins are highly indicative of diseases of the corresponding organ.

[0158] To identify individual proteins that are indicative of lung diseases, we re-analyzed the data related to 115 lung-specific proteins and compared them with the proteins that appeared in the top 26 studies on lung diseases. The results are summarized in Tables 1 and 2.

[0159] Potential Biomarkers for Lung Diseases or Lung Cancers.

[0160] Further, the top 10 studies on lung diseases (including lung cancers) and the top 10 studies exclusively on lung cancers were identified and the lung-specific proteins that were indicated in the studies were collected. The two sets of lung-specific proteins were listed in Table 3 and Table 4, respectively. The proteins were sorted from high to low first by their total occurrence in the corresponding studies and then by their total weight in the studies. Since a study may contain multiple datasets and a protein may be indicated in only some datasets, each protein in each study was weighted by the fraction of datasets in which the protein was indicated. For the top 10 studies on lung diseases, SLC39A8 occurred in all studies, 12 proteins (NKX2-1, SFTPB, C4BPA, SFTPD, FAM65B, SFTPA2B, CEACAM6, CTSE, FOXA2, TREM1, LRRC36, and ETV5) occurred 9 times, and 73 proteins occurred at least 5 times. For the top 10 studies on lung cancers, 5 proteins (SFTPB, CLDN18, SFTPD, CPB2 and CEACAM6) occurred in all studies, 9 proteins (SLC39A8, WIF1, NKX2-1, PPBP, ALOX15B, CTSE, SFTPC, FOXA2, and ETV5) occurred 9 times, and 69 proteins occurred at least 5 times. These proteins have a high potential to be biomarkers for the corresponding diseases.

[0161] Definition of Organ-Specific Panels.

[0162] As described in Example 1, organ-specific panel proteins are specific to multiple organs. A panel of $n$ proteins is specific to an organ if the following two conditions are satisfied:

[0163] (i) The $n$ proteins are specific to the organ under the extended definition of organ-specific proteins, as described herein; and

[0164] (ii) The joint specificity of the panel in the organ is no less than 0.5. More specifically, assume the specificities of the $p=1,\dots,n$ proteins in the $o=1,\dots,M$ organs are $\{s_{p,o}\}$ with $s_{p,1}+s_{p,2}+\dots+s_{p,M}=1$ for all $p$. The joint specificity of the panel in an organ is then defined as $S_o = c \cdot s_{1,o} \cdot s_{2,o} \cdots s_{n,o}$, where $c$ is a constant so that $S_1+S_2+\dots+S_M=1$. The panel is specific to an organ if the corresponding $S_o \ge 0.5$. Clearly a panel can be specific to a single organ.
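The joint specificity defined in paragraph [0164] can be illustrated with a short Python sketch. It is not taken from the application: the function name `joint_specificity` and the three-protein, three-organ numbers are hypothetical, chosen only to show how a panel can reach a joint specificity of 1 in one organ even though each protein is shared with other organs.

```python
from math import prod


def joint_specificity(specificities):
    """Return the joint specificity S_o of a panel for each organ.

    specificities[p][o] is the specificity s_{p,o} of protein p in organ o;
    each row is assumed to sum to 1, as in the text.
    """
    n_organs = len(specificities[0])
    # Unnormalized product s_{1,o} * s_{2,o} * ... * s_{n,o} for each organ o.
    products = [prod(row[o] for row in specificities) for o in range(n_organs)]
    total = sum(products)
    if total == 0:
        raise ValueError("no organ is shared by every protein in the panel")
    # Dividing by the total applies the constant c, so the S_o sum to 1.
    return [value / total for value in products]


# Hypothetical panel; the columns stand for three illustrative organs.
panel = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.4, 0.3, 0.3],
]
joint = joint_specificity(panel)
# Organs in which the panel meets the S_o >= 0.5 criterion.
specific_organs = [o for o, s in enumerate(joint) if s >= 0.5]
print(joint, specific_organs)
```

In this made-up example only the first organ is shared by all three proteins, so the product is zero everywhere else and normalization assigns the whole joint specificity to that organ.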

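The three organ-specificity conditions of Example 2 (paragraphs [0151] to [0153]) can likewise be sketched in Python. The thresholds (noise level 10, p value 0.05, at most five organs, at least half of all tags) come from the text. The construction of the exact binomial test is not spelled out in the application, so this sketch assumes one common reading: because both counts are drawn from the same total, the count at rank k is compared with the count at rank k+1 by a two-sided exact binomial test with success probability 0.5. The function names are hypothetical.

```python
from math import comb

NOISE_LEVEL = 10   # noise cutoff for normalized tag counts, from the text
ALPHA = 0.05       # two-sided significance threshold, from the text
MAX_ORGANS = 5     # proteins may be specific to at most five organs (k <= 5)


def two_sided_binomial_p(a, b):
    """Exact two-sided p value for splitting a + b draws as (a, b) when p = 0.5.

    Assumption: this conditional test stands in for the exact binomial test
    named in the text, whose construction is not given there.
    """
    if a == b:
        return 1.0
    n, low = a + b, min(a, b)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    tail = sum(comb(n, i) for i in range(low + 1))
    return min(1.0, 2 * tail / 2 ** n)


def organ_specific_k(counts):
    """Return k if the protein is specific to its top k organs, else None."""
    n = sorted(counts, reverse=True)
    total = sum(n)
    # Condition (i) fixes k: the number of organs at or above the noise level.
    k = sum(1 for c in n if c >= NOISE_LEVEL)
    if k == 0 or k > MAX_ORGANS:
        return None
    n_k = n[k - 1]
    n_next = n[k] if k < len(n) else 0
    # Condition (ii): n_k must be significantly above n_{k+1}.
    if two_sided_binomial_p(n_k, n_next) > ALPHA:
        return None
    # Condition (iii): the top k organs hold at least half of all tags.
    if sum(n[:k]) / total < 0.5:
        return None
    return k


# Hypothetical tag counts of one protein across five organs.
print(organ_specific_k([120, 40, 3, 0, 2]))
```

Condition (i) determines k uniquely, which is why the sketch counts the organs above the noise level instead of looping over candidate values of k.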
[0165] A five-protein lung-specific panel was identified by selecting five top-ranked lung cancer biomarkers (as described above) that were not most abundant in the organ of lung, but were present in lung. The five proteins identified by comparison of the SBS data set with the NextBio analysis were CLDN18, CPB2, WIF1, PPBP, and ALOX15B. None of the proteins was lung-specific under the conventional definition of organ-specific proteins. As illustrated in FIG. 5, the panel was 100% lung-specific. As discussed above, all five proteins (and thus the panel) were highly indicative for lung cancers. This illustrates that a protein or a panel of proteins that is associated with an organ-associated disease does not need to be specific to that organ alone. A protein or a panel of proteins may be primarily specific to several different organs, yet be highly indicative for a disease in a completely different organ.

Example 4

Evaluation of Lung-Specific Panels as Biomarkers of Lung Cancer

[0166] Lung diseases encompass many disorders affecting the lungs, such as asthma, chronic obstructive pulmonary disease, infections like influenza, pneumonia and tuberculosis, lung cancer, and many other breathing problems. Among cancers, lung cancer is the primary cause of cancer death among both men and women in the U.S. More than 219,000 Americans will be diagnosed with lung cancer (approximately 15 percent of new cancer cases). More than 159,000 will die from the disease, according to the American Cancer Society (2009). Although lung cancer accounts for 15 percent of cancer cases in the United States, it accounts for 28 percent of cancer deaths, as lung cancer typically is not diagnosed until later and intractable stages, when efficacy of treatment is reduced.

[0167] Early detection of lung cancer is difficult since clinical symptoms are often not present until the disease has reached an advanced stage. Currently, diagnosis is aided by the use of chest X-rays, analysis of the type of cells contained in sputum and fiberoptic examination of the bronchial passages. Detection of lung cancer using low-dose computed tomography (CT) can identify many abnormalities in patients' lungs. Unfortunately, this method has proven to be inefficient as CT scans show abnormalities that are not cancerous. CT scanning produces false positive results for cancer a third of the time. The rate of false positives related to CT scanning is twice the rate of standard X-ray screening and often leads to invasive and potentially harmful follow-up tests including surgery. Treatment regimens are determined by the type and stage of the cancer, and include surgery, radiation therapy and/or chemotherapy.

[0168] Early detection of primary, metastatic, and recurrent disease can significantly impact the prognosis of individuals suffering from lung cancer. Non-small cell lung cancer diagnosed at an early stage has a significantly better outcome than when diagnosed at more advanced stages. Similarly, early diagnosis of small cell lung cancer potentially has a better prognosis. Accordingly, there is a great need for more sensitive and accurate assays and methods to measure health and detect disease and monitor treatment at earlier stages.

[0169] Using the methods of the invention, panels of lung-specific proteins will be assessed as circulating biomarkers of lung cancer. Markers will be analyzed using large scale Multiple Reaction Monitoring (MRM) assays across cohorts of lung cancer, non-cancerous lung disease and healthy control blood samples.

[0170] The panel of markers defined by the SBS data sets that correlate with each of the NextBio clinical studies listed below will be tested. The differentiation of the lung cancer groups by lung spot size is not available on the NextBio data sets, but we anticipate that marker expression levels will be significantly increased or decreased based on degree of stratification of disease.

[0171] Samples.

[0172] The table below describes the sample cohorts that will be used in a clinical study to evaluate the effectiveness of the lung-specific proteins as biomarkers of lung cancer after detection of a lung spot by imaging. The major cohorts in the study are non-small cell lung cancer (NSCLC) samples and non-cancer groups.

Major Cohort             Minor Cohort
Non-Cancer Groups        Granulomatous Lung Disease
                         Chronic Obstructive Pulmonary Disease
                         Chronic Lung Disease (includes IPF)
                         Normal - Smoker
                         Normal - Nonsmoker
Cancer Groups (NSCLC)    Lung Cancer <10 mm
                         Lung Cancer 10 mm to 14 mm
                         Lung Cancer 15 mm to 19 mm
                         Lung Cancer 20 mm and larger
                         Advanced stage lung cancer
                         Lung cancer with previous cancers
                         Lymphoma

[0173] The cancer cohort is subdivided by lung spot size (<10 mm, 10 mm to 14 mm, 15 mm to 19 mm and 20 mm or larger). Also included are advanced stage lung cancer (which can present with spots of any size), lung cancer as possible metastasis and lymphoma. It is anticipated that as tumor size gets larger so does the likelihood of detecting a blood-based tumor marker. Hence, the parsing of lung cancer samples by size of spot detected by imaging.

[0174] The non-cancer cohort includes confounding lung diseases (granulomatous lung disease, COPD, IPF) that may cause spots to appear on a CT scan or X-ray as well as healthy controls, both smokers and non-smokers.

[0175] The samples will be blood samples drawn before tissue confirmation of disease (non-disease) state.

[0176] Circulating biomarkers of lung cancer will be able to distinguish samples with lung spots above a certain size (e.g., 10 mm) from non-cancer groups.

[0177] Assay Development.

[0178] Multiple Reaction Monitoring (MRM) is a mass spectrometry-based assay that enables highly multiplexed assays to be developed rapidly [7]. Depending on assay parameters and mass spectrometric device, up to 100 protein assays can be multiplexed into a single MRM sample analysis [8]. Hundreds of protein assays can be performed on a single blood sample via aliquoting the sample.

[0179] MRM assays for all lung-specific panel proteins will be developed. Typically, two peptides and two transitions per peptide will be monitored for each protein, giving four data points per assay. Synthetic peptides will be utilized to develop the MRM assays, thereby determining peptide retention time and transition masses. Due to the number of proteins (over

100) the protein assays will be grouped into two or three proteins. As this process is computing intensive, heuristic batches for separated MRM runs. search algorithms can be used to search the space of all panels 0180. In addition to the lung-specific panel proteins of size k. included in the MRM assays, lung-nonspecific markers of 0188 It is appreciated that certain features of the inven lung-cancer and/or lung-disease will be included in the MRM tion, which are, for clarity, described in the context of separate assays. These markers will be obtained from the literature or embodiments, may also be provided in combination in a from proprietary databases. These markers are added as it single embodiment. Conversely, various features of the may be the case that a diagnostic panel for lung cancer invention, which are, for brevity, described in the context of a includes both lung specific and non-specific markers. single embodiment, may also be provided separately or in any 0181 Sample Runs. suitable sub-combination. 0182 Each sample will be divided into 2 or 3 aliquots for 0189 Although the invention has been described in con MRM runs. Samples will be spiked with peptide standards for junction with specific embodiments thereof, it is evident that normalization of quantification across sample runs. Samples many alternatives, modifications and variations will be appar from each cohort will be matched based on clinical data ent to those skilled in the art. Accordingly, it is intended to (gender, age, collection site, etc.) and matched samples will embrace all Such alternatives, modifications and variations be run sequentially through the MRM assays to minimize that fall within the spirit and broad scope of the appended analytical bias. Protein assay measurements will be obtained claims. All publications, patents and patent applications men for each protein in each sample. tioned in this specification are herein incorporated in their 0183 Panel Evaluation. 
entirety by reference into the specification, to the same extent 0184. Due to the large number of protein assays, absolute as if each individual publication, patent or patent application quantification of each protein will not be determined via was specifically and individually indicated to be incorporated labeled peptides because of cost. Instead, normalized relative herein by reference. In addition, citation or identification of protein abundance across sample cohorts will be obtained. As any reference in this application shall not be construed as an the purpose is to Verify which lung-specific proteins are blood admission that such reference is available as prior art to the biomarkers of lung cancer, relative quantification of proteins present invention. is sufficient. REFERENCES 0185. For each protein, a statistical test (such as a false 0.190 1 Marioni J C, Mason C E. Mane S M, et. al. discovery rate adjusted one-side paired t-test) will be used to RNA-seq: an assessment of technical reproducibility and determine if the protein distinguishes cancerous samples comparison with gene expression arrays. Genome Res. above a certain spot size (say, e.g., 10 mm) from non-cancer 2008; 18(9): 1509-17. ous samples. Pairing of Samples in the statistical test will be (0191) 2) Jongeneel CV. Delorenzi M, Iseli C, et. al. An determined by the matching of samples as described above. atlas of human gene expression from massively parallel As there are four data points per protein, at least three of the signature sequencing (MPSS). Genome Res. 2005; 15(7): four data points must exhibit a significant statistical differ 1007-14. CCC. (0192 (3) Stolovitzky G A Kundaje A, Held GA, et. al. 0186 To verify that a specific panel of proteins (either all Statistical analysis of MPSS measurements: application to lung-specific proteins or a particular Subset of the lung-spe the study of LPS-activated macrophage gene expression. 
cific proteins) is, collectively, a diagnostic panel that distin Proc Natl AcadSci USA. 2005; 102(5): 1402-7. guishes cancerous samples above a certain spot size (e.g., 10 (0193 (4 Bolstad BM, Irizarry RA, Astrand M, Speed T mm) from non-cancerous samples, the following analysis is P. A comparison of normalization methods for high density performed. All data points for the proteins on the panel are oligonucleotide array data based on variance and bias. treated as if data points from a single protein and Submitted to Bioinformatics. 2003: 19(2): 185-93. the paired statistical test. If the false discovery rate adjusted (0194 5) Su Al, Wiltshire T, Batalov S, et. al. A geneatlas p-value of this test is significant (e.g., below 5%) then the of the mouse and human protein-encoding transcriptomes. panel is verified as diagnostic. The false discovery rate can be Proc Natl AcadSci USA. 2004: 101(16): 6062-7. i estimated using many methods including permutation testing (0195 (6) Hood L, Heath J R, Phelps ME, Lin B. Systems where the samples from all cohorts are iteratively randomized biology and new technologies enable predictive and pre to provide an estimate of the false discovery rate. ventative medicine. Science. 2004; 306(5696): 640-3. 0187. As a final measure, a search strategy to find novel 0.196 7. High sensitivity detection of plasma proteins by panels of lung specific and/or non-specific markers of lung multiple reaction monitoring of N-glycosites, Stahl-Zeng, cancer will be employed. More specifically, let k denote the Jianru et al., Molecular and Cellular Proteomics, 6 (10), number of proteins on a proposed diagnostic panel. Let n be 2007. the total number of lung specific and non-specific proteins in 0.197 8 High-throughput generation of selected reac the MRM assay. 
For every selection of k proteins from the tion-monitoring assays for proteins and proteomes, Picotti, total number n, perform the diagnostic statistical test Paola et al., Nature Methods, 7 (1), 2010. described above to determine if that panel of k proteins is 0198 (9 WO/2008/021290 “ORGAN-SPECIFIC PRO diagnostic. This process is repeated for every selection of k TEINS AND METHODS OF THEIR USE US 2013/O157891 A1 Jun. 20, 2013 16 US 2013/O157891 A1 Jun. 20, 2013 17 US 2013/O157891 A1 Jun. 20, 2013 18 US 2013/O157891 A1 Jun. 20, 2013 19 US 2013/O157891 A1 Jun. 20, 2013 20 US 2013/O157891 A1 Jun. 20, 2013 21 US 2013/O157891 A1 Jun. 20, 2013 22 US 2013/O157891 A1 Jun. 20, 2013 23 US 2013/O157891 A1 Jun. 20, 2013 24 US 2013/O157891 A1 Jun. 20, 2013 25 US 2013/O157891 A1 Jun. 20, 2013 26 US 2013/O157891 A1 Jun. 20, 2013 27 US 2013/O157891 A1 Jun. 20, 2013 28 US 2013/O157891 A1 Jun. 20, 2013 29 US 2013/O157891 A1 Jun. 20, 2013 30 US 2013/O157891 A1 Jun. 20, 2013 31 US 2013/O157891 A1 Jun. 20, 2013 32 US 2013/O157891 A1 Jun. 20, 2013 33 US 2013/O157891 A1 Jun. 20, 2013 34 US 2013/O157891 A1 Jun. 20, 2013 35 US 2013/O157891 A1 Jun. 20, 2013 36 US 2013/O157891 A1 Jun. 20, 2013 37 US 2013/O157891 A1 Jun. 20, 2013 38 US 2013/O157891 A1 Jun. 20, 2013 39 US 2013/O157891 A1 Jun. 20, 2013 40 US 2013/O157891 A1 Jun. 20, 2013 41 US 2013/O157891 A1 Jun. 20, 2013 42 US 2013/O157891 A1 Jun. 20, 2013 43 US 2013/O157891 A1 Jun. 20, 2013 44 US 2013/O157891 A1 Jun. 20, 2013 45 US 2013/O157891 A1 Jun. 20, 2013 46 US 2013/O157891 A1 Jun. 20, 2013 47
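By way of illustration only, the panel evaluation and panel search of paragraphs 0185 to 0187 might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of the specification: a one-sided paired sign test is swapped in for the paired t-test so that the example needs only the Python standard library, the Benjamini-Hochberg procedure is assumed as one possible false discovery rate adjustment, and every function name and every number in the toy data set is hypothetical.

```python
from itertools import combinations
from math import comb


def sign_test_pvalue(diffs):
    """One-sided paired sign test (assumed stand-in for the paired t-test)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    # P(X >= positives) for X ~ Binomial(n, 1/2)
    return sum(comb(n, i) for i in range(positives, n + 1)) / 2 ** n


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def marker_proteins(data, alpha=0.05, min_points=3):
    """Proteins with at least `min_points` significant data points."""
    keys = [(p, j) for p in data for j in range(len(data[p]))]
    adjusted = benjamini_hochberg([sign_test_pvalue(data[p][j]) for p, j in keys])
    hits = {}
    for (protein, _), q in zip(keys, adjusted):
        hits[protein] = hits.get(protein, 0) + (q < alpha)
    return [p for p in data if hits.get(p, 0) >= min_points]


def diagnostic_panels(data, k, alpha=0.05):
    """Exhaustively test every k-protein panel, pooling its data points."""
    panels = list(combinations(sorted(data), k))
    pvalues = []
    for panel in panels:
        # Treat all data points of the panel as if from a single protein.
        pooled = [d for p in panel for point in data[p] for d in point]
        pvalues.append(sign_test_pvalue(pooled))
    adjusted = benjamini_hochberg(pvalues)
    return [panel for panel, q in zip(panels, adjusted) if q < alpha]


# Toy data (made up): protein -> four data points, each a list of paired
# differences (cancer sample minus its matched non-cancer sample).
up = [0.8, 1.1, 0.5, 0.9, 1.3, 0.7]
flat = [0.4, -0.3, 0.2, -0.5, 0.1, -0.2]
toy = {"P1": [up] * 4, "P2": [up] * 4, "P3": [flat] * 4, "P4": [flat] * 4}

print(marker_proteins(toy))
print(diagnostic_panels(toy, k=2))
print(comb(100, 5))  # number of 5-protein panels drawn from 100 assays
```

The exhaustive loop in `diagnostic_panels` grows combinatorially: choosing k = 5 proteins from n = 100 assays already yields 75,287,520 candidate panels, which is why a heuristic search over the space of panels may be preferred for larger k.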



TABLE 2
Gene ID | Gene Symbol | Gene Name | Synonyms | UniGene Entry

153 ADRB1 (2), beta-1-, receptor ADRB1RB1ARBETA1AR S.999.13 RHR 181 AGRP agouti related protein homolog AGRTARTASC2)P2|MGC118963 S.104633 (mouse) 247 ALCX1SB (2) 15(2)poxygenase, 15-LOX-2 S.111256 type B 344 APOC2 apolipoprotein C-II S.75615 722 C4BPA complement component 4 HS.1012 binding protein, alpha 1084 CEACAM3 (2) antigen CD66DCEACGM1MGC119875 Hs.11 related cell adhesion molecule 3 W264W282 1361 CPB2 carboxypeptidase B2 (plasma) PUPCPBTAR S.S12937 1510 CTSE cathepsin E ATE S.644082 1669 DEFA4 defensin alpha 4, corticostatin EF4HNP-4HP-4 S.S91391 P4MGC12O099MGC138296 1755 DMBT1 deleted in malignant brain P340MGC164738muclin S.2796.11 umors 1 1991 ELANE elastase, neutrophil expressed LA2GEIHLEHNEINE S.998.63 MN E 2119 ETV5 ets variant 5 RM S.43697 2266 FGG fibrinogen gamma chain S.S462SS 2295 FOXF-2 orkhead box F2 KHL6FREAC2 S.484423 2352 FOLR3 olate receptor 3 (gamma) R-GFR HS.352 ammalgamma-NFR 2525 FUT3 lucosyltransferase 3 D174FT38FucT S.169238 (galactoside 3(4)-L- IIILEILesMGC131739 lucosyltransferase, Lewis blood group 2921 CKCL3 chemokine (C-X-C motif) S.89690 igand 3 31 O1 HK3 hemokinase 3 (white cell) S.411695 3170 FOXA2 orkhead box A2 S.155651 3577 IL8RA interleukin Breceptor, alpha S.194778

3579 interleukin B receptor, beta HS.846 39.18 LAMC2 laminin, gamma 2 s S.591484 4317 MMP8 matrix metallopeptidase 8 S.161839 neutrophill collagenase) 4318 MMP9 matrix metallopeptidase 9 LG4BGELBMMP-9 S.297413 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) 4332 MNDA myeloid cell nuclear PYHIN3 S.153837 differentiation antigen 4585 MUC4 mucin 4, cell Surface associated HSA276359 S.369646 468O CEACAM6 carcinoembryonic antigen CD66cCEANCA S.460814 related cell adhesion molecule 6 (non-specific cross reacting antigen) 4778 NFE2 nuclear factor (erythroid S.75643 derived 2), 45 kDa 4821 NKK2-2 NK2 homeobox2 NKK22NKK2B S.S10922 5473 PPBP pro-platelet basic protein B-TG1 Beta-TGCTAP HS.2164 (chemokine (C-X-C IIICTAPSCTAPIIICKCL7 motif ligand 7) LA PF4LDGFMDGFINAP 2PBPSCYB7ITC1 TC2 TGBTGB2ITHBGBTHBGB1 5657 PRTN3 proteinase 3 ACPAAGP7C HS.328 ANCACANCAMBNIMBT NP4P29IPR-3PR3 5923 RASGRF1 Ras protein-specific guanine CDC25CDC25LGNRPGRF1 HS.459035 nucleotide-releasing factor 1 GRFSSH GRFSSPP131.87 6323 sodium channel, voltage-gated, FEB3|GEPSP2HSSCINAC1 HS.2266.4 type I, alpha subunit Nav1.1|SCN1|SMEI 6361 CCL17 chemokine (C-C motif) ligand 17 A-15285-3 ABCD HS.546294 2MGC138271MGC138273 US 2013/O157891 A1 Jun. 20, 2013 69

TABLE 2-continued SCYA17ITARC 6364 CCL20 chemokine (C-C motif) ligand 20 Ckb4LARCMIP S.75498 3bMIP3ASCYA2OST38 SFRPS secreted frizzled-related protein S SARP3 S.27956S 6436 SFTPA2B surfactant protein A2B ACO68139.3SFTPA2SP S.523O84 2ASP-A1SP-A2SPAII S.535295 S.71915S 6439 SFTPB surfactant protein B PSP S.512690 SFTB3SFTP3 SMDP1

6440 SFTPC Surfactant protein C PSP-CSFTP2SMDP2SP-C S.1074 6441 SFTPD surfactant protein D COLEC7PSP SFTP4SP-D 6532 solute carrier family 6 HTTS S.134.662 (neurotransmitter transporter, TTLPRISHTTHTTIOCD1 serotonir(2), member 4 sERTInSERT 6868 ADAM17 ADAM metallopeptidase D1568CSVPMGC71942 S.404914 domain 17 AC E 708O NKK2-1 NK2 homeobox 1 HBHCINK S.94367 IOC2.1NIOC2ATEBP F1 ITTF-1 ITTF1 7356 secretoglobin, family 1A, OCC16|CCSPUGB S.S23732 member 1 (uteroglobin) 8796 SCEL scC2in LJ21667MGC22531 S.S34699 8807 IL18RAP interleukin 18 receptor S.158315 accessory protein 8972 MGAM maltase-glucoarylase (alpha S.122785 glucosidase) 8999 CDKL2 cyclin-dependent kinase-like 2 S.S93698 (2) CDC2-related kinase) solute carrier family 7 (cationic S.S13147 amino acid transporter, y+ system), member 7 9173 interleukin 1 receptor-like 1 D ER4 FIT S.66 1. MGC32623ST2ST2

9476 NAPSA napsin Aaspartic peptidase S.714418

9496 TBX4 T-box 4 SPS S.143907 95O2 XAGE2 X antigen family, member 2 S.522654 9750 FAM6SB family with sequence similiarity S.S59459 65, member B 9914 ATP2C2 ATPase, Ca++ transporting, type S.6168 2C, member 2 SPCA2 10675 CSPGS chondroitin sulfate proteoglyca) MGC44084NGC S.451.27 S (neuroglycan C) 1100S SPINIS serine peptidase inhibitor, Kasal D S.331SSS type 5 FLJ97536FLJ975.96 FLJ997.94LEKTILETI3 NETSNSIVAKTI 11082 ESM1 endothelial cell-specific endocC2) S.129944 molecule 1 11197 WIF1 WNT inhibitory factor 1 WIF-1 S.284122 11254 SLC6A14 solute carrier family 6 (amino ATEGIBMIQ11 S.522109 acid transporter), member 14 23.569 PADI4 peptidylarginine(), PAD pad4|PADISPDI4) S.522969 type IV PDIS 23.584 WSIG2 V-set and immunoglobulin 2210413P1ORikCTHCTXL S.112377 domain containing 2 25975 EGFL6 EGF-like-domain, multiple 6 S.12844 W8O 26253 CLEC4E C-type lectin domain family 4, CLECSP9MINCLE S.236,516 member E 27074 LAMP3 lysosomal-associated CD2O8DC S.518448 membrane-protein 3 LAMPDCLAMPLAMPTSC4CB 29992 PILRA paired immunoglobin-like type 2 FDFO3 S.4444O7 receptor alpha SO487 PLA2G3 phopholipase A2, group III GIN-S PLA2ISPLA2IN S.1496.23 S1208 CLDN18 claudin 18 SFTASSFTPJ S.6SS324 51267 CLEC1A C-type lectin domain family 1, CLBC1 MGC34328 S.29549 member A 53905 DUOX1 dual oxidase 1 LNOX1MGC13884OMGC138841 S.272.813 NOXEF11THOX1 S4210 TREM1 triggering receptor expressed TREM-1 on myeloid cells 1 55118 CRTAC1 cartilage a acidic protein 1 S.SOO736 US 2013/O157891 A1 Jun. 20, 2013 70

TABLE 2-continued 55282 URRC36 eucine rich repeat containing FLJ11004RORBP7OXLHSRF2 S.125139 36 56948 SOR39U1 short chain S.643552 dehydrogenase/reductase S.713590 amily 39U, member 1 57126 CD177 CD177 molecule HNA2AINB1 PRV1 S.2321.65 57214 KIAA1199 KIAA1199 TMEM2L S.459088 64116 SLC39A8 solute carrier family 39 (zinc BIGM103LZT S.288O34 ransporter), member 8 HS6PP3105 ZIP8 64,581 CLECTA C-type lectin domain family 7, BGRICLECSF12DBCTIN1 S.143929 member A 8O329 ULBP1 UL16 binding protein 1 S.6532SS 81027 TUBB1 ubulin, beta 1 dS43119.4 S.3O3O23 84106 PRAM1 PML-RARA regulated aG) MGC39864PML S.465812 molecule 1 RARPRAM-1 89822 KCNK17 potassium channel, Subfamily K, K2p17.1|TALK S.162282 member 17 2TALK2TASK-4TASKA 90273 CEACAM21 carcinoembryonic antigen CEACAM3FLJ1354OMGC119874 related cell adhesion molecule R29124 1 21 GGTLC1 gamma-glutarnyltransferase light chain 1 92747 C20orf114 chromosome 20 open reading frame 114 14548 NLRP3 NLR family, pyrin domain AGTAVPRLAHAI1 AVP S.159483 containing 3 AVPC1orf7cias1|CLR1.1 FCASFCUFLJ95925 MWSNALP3IPYPAF1 1SO19 solute carrier family 26, member 9 17156 secretoglobin, family 3A, LU103PNSP1 UGRP1 S.483765 member 2 26O14 OSCAR osteoclast associated, MGC33613 PIGR3 S.3476SS immunoglobulin-like receptor 286O2 chromosome 20 open reading LLC1bA196N14.1 S.43977 frame 85 44448 TSPAN19 tetraspanin 19 FLJ44351 S.156962 46429 LOC146429 Putative solute carrier family 22 S.447544 member ENSG 00000182157 57310 PESP4 phosphatidylethanolamine S.491242 binding protein 4

95814 SDR16CS short chain dehydrogenase/reductase family 16C, member 5 2OOO10 SLCSA9 solute carfier family 5 MGC132517MGC132523 S.378.90 (sodium glucose cotransporter), SGLT4 member 9 2OOSO4 gastrokine 2 S.16757 2O3190 leucine-rich repeat LGI family, S.33470 member 3 219790 rhotelin 2 DKFZp686I10120 IPLEKHK1. S.S8.559 bAS31F24.1 21.9995 MS4A15 membrane-spanning 4 FLJ34527MGC35295 domains, Subfamily A, member 15 221472 FGD2 FYVE, RhoGEF and PH domain S.SO9664 containing 2 222487 GPR97 G protein-coupled receptor 97 S.383403 253970 SFTA3 Surfactant associated 3 S.SO916S 284.340 CKCL17 chemokine (C-X-C motif) S.445586 ligand 17 339145 FAM928 family with sequence similarity S.125713 92, member B 3531.89 solute carrier organic anion OARP-FHOATP S.12764 transporter family, member 4C1 M1|OATP4C1|OATPXPROZ176 SLC21A2O 3.87914 SHISA2 shisa homolog 2 (Menopus C13orf13 PRO286.31 TMEM46 S.433791 (2)) WGAR9166bA287O19.2 hShisa 388743 CAPN8 calpain 8 nCL-2 S.291.487 S.67O199 389376 SFTA2 Surfactant associated 2 GSLS41SFTPGUNOS41 S.211267 401546 C9Crf152 open reading MGC131682b A47012O2 S.125608 frame 152 644524 NK2 homeobox 4 S.456.662


TABLE 2-continued 4317 Spleen Lung O.88571 105 19 4318 Spleen LymphNodel O.80672 119 11 Lung 4332 Spleen Monocyte Lung 0.7991.5 234 15 0.01797385 Thymus Lymph Node 4585 Trachea Prostate(2) O.99045 1257 28 4.51E-09 Lung Lung Trachea Small O.94884 215 34 9.2OE-10 intestine 4778 Spleen Monocytes Lung 78 14 O.OO164659 48.21 Lung 112 93 8.12E-70 5473 Monocytes Spleen Lung 2O2 19 O.OOO15646 5657 LiverSpleen Lung 350 25 4.32E-09 5923 BrainLung O.88889 36 16 2.24 6323 BrainLung 0.97778 45 12 2.57 6361 Lung 0.70769 65 46 3.30 6364 Traches Moncytes. O.91441 222 16 O.O3684.798 Lung|LymphNode SmallIntestine Pancreas O.86364 154 18 7.68E-OS Cervix Lung 6436 Lung Tested Prostate O.9994.6 22049 21 1.53E-OS 6439 Lung O.99834 2413 2409 O 6440 Lung O.9995 1006S 10060 O 6441 Lung O. 9433 388 366 O 6532 SmallIntestineLung O.95.283 106 19 3.27E-08 6868 Monocyted Lung O.S 98 O.OO879528 708O Lung O.83O86 112 93 8.12E-70 7356 Lung TrachealProstate O.99744 1954 162 2.5OE-66 8796 Skin Lung O. 94118 170 50 4.48E-19 8807 Lung Spleen Lymph 0.77953 127 16 O.O1510884 Node 8972 SmallIntestineLung O.9405 437 29 Spleen 8999 BrainLung Kidney 344 0.7515.2 16S 17 O.OO686636 Testes 90S6 Lung O.S 38 19 2.43E-05 9173 Lung Kidney O.91515 330 89 7.46E-36 9476 Lung Kidney 0.97537 1137 83 1.77E-31 9496 Lung O.S 26 13 O.OOOS3352 95O2 Lung 1 12 12 O 9750 Lymphocytes Monocytes O.84.818 303 21 O.0043658.2 Spleen Lymph Node Lung 9914 Traches Prostate Lung O.91.99 412 21 O.OO476179 SkinSmall Intestine 10675 BrainLung 1OO 33 1.28E-13 11 OOS Skin Lung 88 11 O.OO666.219 11082 Lung 16 10 S.S2E-06 11197 BrainLung 51 15 7.4OE-OS 11254 Triachea Lung 147 34 111E-08 23.569 Spleen Monocytes Lung 170 12 O.O1058896 23584 Stomach Prostate Triachea 284 17 O.OOO24028 Bladder Lung 25975 Breast Lung O.86391 169 40 2.6OE-10 262S3 Monocytes Lung Spleen O.868O2 197 22 O.OOO2416 27074 Lung Testes Lymph i O.92.267 375 19 O.O175883 Node 29992 Monocytes Lung Spleen O.71895 306 30 7.27E-06 SO487 Lung Skin O.7812S 32 10 O.OO656645 S1208 
StomachLung O.9956 910 342 2.53E-177 51267 Cervix Lung Thymus 195 2O O.OO636968 Lymph Node 53905 Lung Trachea Skin 4 289 24 Prostate S4210 Monocytes Lung O.95993 658 94 7.53E-33 55118 Lung Bladder O.7O186 161 48 3.29E-15 55282 Testes Lung O.88732 142 38 3.09E-13 56948 Ovary Uterus Lung O.91.538 130 18 O.OO128318 57126 Prostate|Spleen Lung O.83784 185 24 O.OOO40478 57214 Lymph Node Thymus O. 92691 643 17 O.O48.32478 Trachea Lung Testes 64116 Lung O.629O3 62 39 4.25 64,581 Monocytes Lung O.78341 217 28 2.65E-07 8O329 Testes Lung 0.66667 45 10 O.OO192OOS 81027 Monocytes Spleen Lung O.94.631 149 11 O.OO713838 84106 Monocytes Spleen Lung O.6087 69 13 O.O1269249 US 2013/O157891 A1 Jun. 20, 2013 76

TABLE 2-continued
89822 | Muscle, Lung | 2 | 0.60976 | 123 | 25 | 0.00011395
90273 | Spleen, Lung, Lymph Node | 3 | 0.50588 | 85 | 10 | 0.04224856
92086 | Lung, Testes | 2 | 1 | 87 | 40 | 6.20E-24
92747 | Trachea, Lung | 2 | 0.98832 | 1456 | 23 | 0.00019723
114548 | Monocytes, Lung | 2 | 0.80876 | 251 | 18 | 0.00897576
115019 | Stomach, Lung, Heart | 3 | 0.88991 | 109 | 11 | 0.00130068
117156 | Lung, Trachea | 2 | 0.99814 | 537 | 121 | 9.05E-58
126014 | Monocytes, Lung | 2 | 0.68852 | 61 | 15 | 0.01656776
128602 | Trachea, Testes, Lung | 3 | 0.96796 | 437 | 18 | 0.02918739
144448 | Trachea, Lung, Testes | 3 | 0.98182 | 55 | 10 | 0.0034198
146429 | Lung | 1 | 0.74074 | 27 | 20 | 1.05E-11
157310 | Muscle, Lung, Heart | 3 | 0.85833 | 120 | 22 | 0.00128371
195814 | Lung, Skin, Trachea, Small Intestine | 4 | 0.95114 | 307 | 24 | 1.56E-05
200010 | Small Intestine, Liver, Lung | 3 | 0.84211 | 152 | 18 | 0.02294363
200504 | Stomach, Lung | 2 | 0.99573 | 1172 | 57 | 3.23E-21
203190 | Brain, Lung | 2 | 0.94937 | 79 | 20 | 7.00E-08
219790 | Lung | 1 | 0.59574 | 47 | 28 | 3.76E-12
219995 | Lung | 1 | 0.87719 | 57 | 50 | 1.27E-43
221472 | Monocytes, Lymphocytes, Spleen, Lung, Lymph Node | 5 | 0.67816 | 174 | 6 | 0.3484967
222487 | Spleen, Lung | 2 | 0.64286 | 84 | 21 | 1.11E-06
253970 | Lung, Trachea | 2 | 0.98477 | 394 | 5 | 0.00517536
284340 | Trachea, Lung, Stomach, Prostate, Pancreas | 5 | 0.98854 | 1396 | 44 | 6.60E-12
339145 | Trachea, Lung, Brain, Lymph Node, Spleen | 5 | 0.97059 | 442 | 3 | 0.227354
353189 | Kidney, Lung, Liver | 3 | 0.88525 | 61 | 0 | 0.01394215
387914 | Kidney, Trachea, Lung, Muscle | 4 | 0.74839 | 131 | 3 | 0.00500661
388743 | Stomach, Lung | 2 | 0.66234 | 77 | 6 | 0.0106598
389376 | Lung | 1 | 0.95758 | 495 | 474 | 0
401546 | Lung, Stomach, Prostate, Small Intestine, Trachea | 5 | 0.88 | 100 | 6 | 0.00031355
644524 | Lung | 1 | 0.83086 | 112 | 93 | 8.12E-70
653509 | Lung, Testes, Prostate | 3 | 0.99946 | 22049 | 21 | 1.53E-05
728242 | Lung | 1 | 1 | 12 | 2 | 0
729238 | Lung, Testes, Prostate | 3 | 0.99946 | 22049 | 21 | 1.53E-05
1E+08 | Monocytes, Lung, Spleen | 3 | 0.64655 | 116 | 0 | 0.4593421
1E+08 | Lung | 1 | 0.88 | 25 | 22 | 8.81E-18

(2) indicates text missing or illegible when filed

TABLE 3
Lung Disease genes from top 10 studies
GeneID | GeneSymbol | GeneName | Occurrence | Weight
64116 | SLC39A8 | solute carrier family 39 (zinc transporter), member 8 | 1 | 6.121078
7080 | NKX2-1 | NK2 homeobox 1 | | 6.171765
6439 | SFTPB | surfactant protein B | | 6.167255
722 | C4BPA | complement component 4 binding protein, alpha | | 5.568627
6441 | SFTPD | surfactant protein D | | 5.321471
9750 | FAM65B | family with sequence similarity 65, member B | | 5.088235
6436 | SFTPA2B | surfactant protein A2B | | 4.852451
4680 | CEACAM6 | carcinoembryonic antigen-related cell adhesion molecule 6 | | 4.818922
1510 | CTSE | cathepsin E | | 4.720784
3170 | FOXA2 | forkhead box A2 | | 4.708627
54210 | TREM1 | triggering receptor expressed on myeloid cells 1 | | 4.698922
55282 | LRRC36 | leucine rich repeat containing 36 | | 4.302745
2119 | ETV5 | ets variant 5 | | 4.141765
9476 | NAPSA | napsin A aspartic peptidase | | 5.778431
51208 | CLDN18 | claudin 18 | | 5.071078
11197 | WIF1 | WNT inhibitory factor 1 | | 4.861569
92086 | GGTLC1 | gamma-glutamyltransferase light chain 1 | | 4.848333
2266 | FGG | fibrinogen gamma chain | | 4.818824
8999 | CDKL2 | cyclin-dependent kinase-like 2 (CDC2-related kinase) | | 4.789804
200504 | GKN2 | gastrokine 2 | | 4.760588
1361 | CPB2 | carboxypeptidase B2 (plasma) | | 4.661078
6440 | SFTPC | surfactant protein C | | 4.580098
247 | ALOX15B | arachidonate 15-lipoxygenase, type B | | 4.490098
53905 | DUOX1 | dual oxidase 1 | 8 | 4.359314
1755 | DMBT1 | deleted in malignant brain tumors 1 | 8 | 4.148333
5473 | PPBP | pro-platelet basic protein (chemokine (C-X-C motif) ligand 7 | 8 | 4.098137
5923 | RASGRF1 | Ras protein-specific guanine nucleotide-releasing factor 1 | 8 | 3.906667
9914 | ATP2C2 | ATPase, Ca++ transporting, type 2C, member 2 | 8 | 3.471667
195814 | SDR16C5 | short chain dehydrogenase/reductase family 16C, member | 7 | 4.809608
253970 | SFTA3 | surfactant associated 3 | 7 | 4.627255
11254 | SLC6A14 | solute carrier family 6 (amino acid transporter), member 14 | 7 | 4.391667
23584 | VSIG2 | V-set and immunoglobulin domain containing 2 | 7 | 4.358431
153 | ADRB1 | adrenergic, beta-1-, receptor | 7 | 4.279216
27074 | LAMP3 | lysosomal-associated membrane protein 3 | 7 | 4.231667
55118 | CRTAC1 | cartilage acidic protein 1 | 7 | 4.096667
8796 | SCEL | sciellin | 7 | 4.016667
7356 | SCGB1A1 | secretoglobin, family 1A, member 1 (uteroglobin) | 7 | 3.932549
10675 | CSPG5 | chondroitin sulfate proteoglycan 5 (neuroglycan C) | 7 | 3.739608
4332 | MNDA | myeloid cell nuclear differentiation antigen | 7 | 3.658431
2295 | FOXF2 | forkhead box F2 | 7 | 3.532255
64581 | CLEC7A | C-type lectin domain family 7, member A | 7 | 3.501667
2921 | CXCL3 | chemokine (C-X-C motif) ligand 3 | 7 | 3.365784
4585 | MUC4 | mucin 4, cell surface associated | 7 | 2.400098
389376 | SFTA2 | surfactant associated 2 | 6 | 4.366667
388743 | CAPN8 | calpain 8 | 6 | 3.986667
6532 | SLC6A4 | solute carrier family 6 (neurotransmitter transporter, sero(2) | 6 | 3.748039
284340 | CXCL17 | chemokine (C-X-C motif) ligand 17 | 6 | 3.326667
157310 | PEBP4 | phosphatidylethanolamine-binding protein 4 | 6 | 3.177255
57214 | KIAA1199 | KIAA1199 | 6 | 2.881078
4318 | MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, | 6 | 2.625784
221472 | FGD2 | FYVE, RhoGEF and PH domain containing 2 | 6 | 2.279804
29992 | PILRA | paired immunoglobin-like type 2 receptor alpha | 6 | 2.279314
3918 | LAMC2 | laminin, gamma 2 | 6 | 2.066765
344 | APOC2 | apolipoprotein C-II | 6 | 1.937647
200010 | SLC5A9 | solute carrier family 5 (sodium glucose cotransporter), mer | 5 | 3.583333
146429 | LOC146429 | Putative solute carrier family 22 member ENSG0000018215(2) | 5 | 3.53
25975 | EGFL6 | EGF-like-domain, multiple 6 | 5 | 3.256667
51267 | CLEC1A | C-type lectin domain family 1, member A | 5 | 3.245098
115019 | SLC26A9 | solute carrier family 26, member 9 | 5 | 2.99
2525 | FUT3 | fucosyltransferase 3 (galactoside 3(4)-L-fucosyltransferase, | 5 | 2.88
9056 | SLC7A7 | solute carrier family 7 (cationic amino acid transporter, y + 3) | 5 | 2.846078
6868 | ADAM17 | ADAM metallopeptidase domain 17 | 5 | 2.7
401546 | C9orf152 | chromosome 9 open reading frame 152 | 5 | 2.588824
89822 | KCNK17 | potassium channel, subfamily K, member 17 | 5 | 2.54
6323 | SCN1A | sodium channel, voltage-gated, type I, alpha subunit | 5 | 2.521961
117156 | SCGB3A2 | secretoglobin, family 3A, member 2 | 5 | 2.453333
128602 | C20orf85 | chromosome 20 open reading frame 85 | 5 | 2.439412
4778 | NFE2 | nuclear factor (erythroid-derived 2), 45 kDa | 5 | 2.166471
9496 | TBX4 | T-box 4 | 5 | 2.141765
11005 | SPINK5 | serine peptidase inhibitor, Kazal type 5 | 5 | 2.100588
219995 | MS4A15 | membrane-spanning 4-domains, subfamily A, member 15 | 5 | 1.993333
353189 | SLCO4C1 | solute carrier organic anion transporter family, member 4C | 5 | 1.606667
11082 | ESM1 | endothelial cell-specific molecule 1 | 5 | 1.382157
219790 | RTKN2 | rhotekin 2 | 4 | 2.593137
3101 | HK3 | hexokinase 3 (white cell) | 4 | 2.456667
3579 | IL8RB | interleukin 8 receptor, beta | 4 | 2.376667
3577 | IL8RA | interleukin 8 receptor, alpha | 4 | 2.35
9173 | IL1RL1 | interleukin 1 receptor-like 1 | 4 | 2.041765
26253 | CLEC4E | C-type lectin domain family 4, member E | 4 | .793,333
203190 | LGI3 | leucine-rich repeat LGI family, member 3 | 4 | .7OO98
114548 | NLRP3 | NLR family, pyrin domain containing 3 | 4 | 623333
6364 | CCL20 | chemokine (C-C motif) ligand 20 | 4 | S97941
8807 | IL18RAP | interleukin 18 receptor accessory protein | 4 | S9549
6425 | SFRP5 | secreted frizzled-related protein 5 | 4 | 517647
84106 | PRAM1 | PML-RARA regulated adaptor molecule 1 | 3 | 2.416667
2352 | FOLR3 | folate receptor 3 (gamma) | 3 | 2.044118
50487 | PLA2G3 | phospholipase A2, group III | 3 | .75
81027 | TUBB1 | tubulin, beta 1 | 3 | 318627
80329 | ULBP1 | UL16 binding protein 1 | 3 | 0.983333
56948 | SDR39U1 | short chain dehydrogenase/reductase family 39U, member | 3 | 0.94
57126 | CD177 | CD177 molecule | 3 | 0.573333
126014 | OSCAR | osteoclast associated, immunoglobulin-like receptor | 2 | .75
387914 | SHISA2 | shisa homolog 2 (Xenopus laevis) | 2 | 1666.67
90273 | CEACAM21 | carcinoembryonic antigen-related cell adhesion molecule 2 | 2 | 1S
181 | AGRP | agouti related protein homolog (mouse) | 2 | O83333
339145 | FAM92B | family with sequence similarity 92, member B | 2 | O78431
23569 | PADI4 | peptidyl arginine deiminase, type IV | 2 | 1.066667
6361 | CCL17 | chemokine (C-C motif) ligand 17 | 2 | 1
92747 | C20orf114 | chromosome 20 open reading frame 114 | 2 | 0.843137
1669 | DEFA4 | defensin, alpha 4, corticostatin | 2 | 0.776471
8972 | MGAM | maltase-glucoamylase (alpha-glucosidase) | 2 | 0.735294
5657 | PRTN3 | proteinase 3 | 2 | 0.694118
4317 | MMP8 | matrix metallopeptidase 8 (neutrophil collagenase) | 2 | 0.45
653509 | SFTPA1 | surfactant protein A1 | 1 | 1
222487 | GPR97 | G protein-coupled receptor 97 | 1 | 0.75
1991 | ELANE | elastase, neutrophil expressed | 1 | 0.75
144448 | TSPAN19 | tetraspanin 19 | 1 | 0.666667
9502 | XAGE2 | X antigen family, member 2 | 1 | 0.666667
1084 | CEACAM3 | carcinoembryonic antigen-related cell adhesion molecule 3 | 1 | 0.08

(2) indicates text missing or illegible when filed

TABLE 4
Lung Cancer genes from top 10 studies
GeneID | GeneSymbol | GeneName | Occurrence | Weight
6439 | SFTPB | surfactant protein B | 10 | 6.04372549
51208 | CLDN18 | claudin 18 | 10 | 5.965196078
6441 | SFTPD | surfactant protein D | 10 | 5.197941176
1361 | CPB2 | carboxypeptidase B2 (plasma) | 10 | 4.996372549
4680 | CEACAM6 | carcinoembryonic antigen-related cell adhesion molecule 6 (non-specific cross reacting antigen) | 10 | 4.195392157
64116 | SLC39A8 | solute carrier family 39 (zinc transporter), member 8 | 9 | 5.189705882
11197 | WIF1 | WNT inhibitory factor 1 | 9 | 5.183137255
7080 | NKX2-1 | NK2 homeobox 1 | 9 | 5.107058824
5473 | PPBP | pro-platelet basic protein (chemokine (C-X-C motif) ligand 7) | 9 | 4.62754902
247 | ALOX15B | arachidonate 15-lipoxygenase, type B | 9 | 4.348921569
1510 | CTSE | cathepsin E | 9 | 4.320784314
6440 | SFTPC | surfactant protein C | 9 | 4.289901961
3170 | FOXA2 | forkhead box A2 | 9 | 4.218431373
2119 | ETV5 | ets variant 5 | 9 | 2.751568627
722 | C4BPA | complement component 4 binding protein, alpha | 8 | 4.568627451
9750 | FAM65B | family with sequence similarity 65, member B | 8 | 4.31372549
8796 | SCEL | sciellin | 8 | 4.246078431
5923 | RASGRF1 | Ras protein-specific guanine nucleotide-releasing factor 1 | 8 | 3.994901961
54210 | TREM1 | triggering receptor expressed on myeloid cells 1 | 8 | 3.698921569
1755 | DMBT1 | deleted in malignant brain tumors 1 | 8 | 3.658137255
55282 | LRRC36 | leucine rich repeat containing 36 | 8 | 3.636078431
4332 | MNDA | myeloid cell nuclear differentiation antigen | 8 | 3.49372549
6436 | SFTPA2B | surfactant protein A2B | 8 | 3.385784314
3918 | LAMC2 | laminin, gamma 2 | 8 | 2.543235294
92086 | GGTLC1 | gamma-glutamyltransferase light chain 1 | 7 | 4.348333333
153 | ADRB1 | adrenergic, beta-1-, receptor | 7 | 4.21254902
9476 | NAPSA | napsin A aspartic peptidase | 7 | 3.878431373
2266 | FGG | fibrinogen gamma chain | 7 | 3.818823529
7356 | SCGB1A1 | secretoglobin, family 1A, member 1 (uteroglobin) | 7 | 3.609019608
200504 | GKN2 | gastrokine 2 | 7 | 3.527254902
8999 | CDKL2 | cyclin-dependent kinase-like 2 (CDC2-related kinase) | 7 | 3.423137255
53905 | DUOX1 | dual oxidase 1 | 7 | 3.359313725
10675 | CSPG5 | chondroitin sulfate proteoglycan 5 (neuroglycan C) | 7 | 3.339607843
2295 | FOXF2 | forkhead box F2 | 7 | 3.20872549
4318 | MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | 7 | 3.155196078
9914 | ATP2C2 | ATPase, Ca++ transporting, type 2C, member 2 | 7 | 2.99127451
4778 | NFE2 | nuclear factor (erythroid-derived 2), 45 kDa | 7 | 2.442941176
11005 | SPINK5 | serine peptidase inhibitor, Kazal type 5 | 7 | 2.318235294
4585 | MUC4 | mucin 4, cell surface associated | 7 | 2.262843137
23584 | VSIG2 | V-set and immunoglobulin domain containing 2 | 6 | 3.858431373
6532 | SLC6A4 | solute carrier family 6 (neurotransmitter transporter, serotonin), member 4 | 6 | 3.493137255
253970 | SFTA3 | surfactant associated 3 | | 3.227254902
157310 | PEBP4 | phosphatidylethanolamine-binding protein 4 | | 3.177254902
6868 | ADAM17 | ADAM metallopeptidase domain 17 | | 3.052941176
27074 | LAMP3 | lysosomal-associated membrane protein 3 | | 2.966960784
9173 | IL1RL1 | interleukin 1 receptor-like 1 | 6 | 2.735882353
57214 | KIAA1199 | KIAA1199 | 6 | 2.626176471
9056 | SLC7A7 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 7 | 6 | 2.56372549
117156 | SCGB3A2 | secretoglobin, family 3A, member 2 | 6 | 2.553333333
64581 | CLEC7A | C-type lectin domain family 7, member A | 6 | 2.501666667
2921 | CXCL3 | chemokine (C-X-C motif) ligand 3 | 6 | 2.365784314
11082 | ESM1 | endothelial cell-specific molecule 1 | 6 | 1.884117647
221472 | FGD2 | FYVE, RhoGEF and PH domain containing 2 | 6 | 1.879803922
344 | APOC2 | apolipoprotein C-II | 6 | 1.737647059
25975 | EGFL6 | EGF-like-domain, multiple 6 | 5 | 3.256666667
3579 | IL8RB | interleukin 8 receptor, beta | 5 | 3.02372549
388743 | CAPN8 | calpain 8 | 5 | 2.92
284340 | CXCL17 | chemokine (C-X-C motif) ligand 17 | 5 | 2.826666667
195814 | SDR16C5 | short chain dehydrogenase/reductase family 16C, member 5 | 5 | 2.809607843
51267 | CLEC1A | C-type lectin domain family 1, member A | 5 | 2.778431373
11254 | SLC6A14 | solute carrier family 6 (amino acid transporter), member 14 | 5 | 2.725
2525 | FUT3 | fucosyltransferase 3 (galactoside 3(4)-L-fucosyltransferase, Lewis blood group) | 5 | 2.723137255
401546 | C9orf152 | chromosome 9 open reading frame 152 | 5 | 2.588823529
6323 | SCN1A | sodium channel, voltage-gated, type I, alpha subunit | 5 | 2.321960784
55118 | CRTAC1 | cartilage acidic protein 1 | 5 | 2.096666667
219995 | MS4A15 | membrane-spanning 4-domains, subfamily A, member 15 | 5 | 1.993333333
6364 | CCL20 | chemokine (C-C motif) ligand 20 | 5 | 1.892058824
29992 | PILRA | paired immunoglobin-like type 2 receptor alpha | 5 | 1.779313725
8807 | IL18RAP | interleukin 18 receptor accessory protein | 5 | 1.771960784
389376 | SFTA2 | surfactant associated 2 | 4 | 2.7
200010 | SLC5A9 | solute carrier family 5 (sodium glucose cotransporter), member 9 | 4 | 2.616666667
3577 | IL8RA | interleukin 8 receptor, alpha | 4 | 2.546078431
146429 | LOC146429 | Putative solute carrier family 22 member ENSG00000182157 | 4 | 2.53
115019 | SLC26A9 | solute carrier family 26, member 9 | 4 | 2.49
89822 | KCNK17 | potassium channel, subfamily K, member 17 | 4 | 2.206666667
128602 | C20orf85 | chromosome 20 open reading frame 85 | 4 | 2.106078431
3101 | HK3 | hexokinase 3 (white cell) | 4 | 1.927254902
26253 | CLEC4E | C-type lectin domain family 4, member E | 4 | 1.793333333
9496 | TBX4 | T-box 4 | 4 | 1.641764706
203190 | LGI3 | leucine-rich repeat LGI family, member 3 | 4 | 1.600980392
353189 | SLCO4C1 | solute carrier organic anion transporter family, member 4C | 4 | 1.273333333
56948 | SDR39U1 | short chain dehydrogenase/reductase family 39U, member 1 | 4 | 0.998823529
219790 | RTKN2 | rhotekin 2 | 3 | 1.593137255
8972 | MGAM | maltase-glucoamylase (alpha-glucosidase) | 3 | 1.235294118
2352 | FOLR3 | folate receptor 3 (gamma) | 3 | 1.144117647
114548 | NLRP3 | NLR family, pyrin domain containing 3 | 3 | 1.123333333
80329 | ULBP1 | UL16 binding protein 1 | 3 | 0.983333333
1669 | DEFA4 | defensin, alpha 4, corticostatin | 3 | 0.952941176
6425 | SFRP5 | secreted frizzled-related protein 5 | 3 | 0.850980392
84106 | PRAM1 | PML-RARA regulated adaptor molecule 1 | 2 | 1.416666667
50487 | PLA2G3 | phospholipase A2, group III | 2 | 1.25
90273 | CEACAM21 | carcinoembryonic antigen-related cell adhesion molecule 21 | 2 | 1.15
23569 | PADI4 | peptidyl arginine deiminase, type IV | 2 | 1.066666667
81027 | TUBB1 | tubulin, beta 1 | 2 | 0.985294118
1991 | ELANE | elastase, neutrophil expressed | 2 | 0.926470588
181 | AGRP | agouti related protein homolog (mouse) | 2 | 0.808823529
5657 | PRTN3 | proteinase 3 | 2 | 0.694117647
6361 | CCL17 | chemokine (C-C motif) ligand 17 | 2 | 0.558823529
4317 | MMP8 | matrix metallopeptidase 8
(neutrophil collagenase) 2 O.45 1084 CEACAM3 carcinoembryonic antigen-related cell adhesion 2 O.315294.118 molecule 3 57126 CD177 CD177 molecule 2 O.24 222487 GPR97 G protein-coupled receptor 97 1 O.75 126O14 OSCAR osteoclast associated, immunoglobulin-like receptor 1 O.75 387914 SHISA2 shisa homolog 2 (Xenopus laevis) 1 O.S 33914S FAM92B family with sequence similarity 92, member B 1 O.4117647O6 92747 C20orf114 chromosome 20 open reading frame 114 1 O.176470588 US 2013/O157891 A1 Jun. 20, 2013 80

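TABLE 5
Summary of organ-specific proteins in different organs.
Columns A: specific to k organs and most abundant in the organ. Columns B: specific to k organs but not most abundant in the organ.

Organ | A: k = 1 | A: k = 2 | A: k = 3 | A: k = 4 | A: k = 5 | A: 1 ≤ k ≤ 5 | B: k = 2 | B: k = 3 | B: k = 4 | B: k = 5 | B: 2 ≤ k ≤ 5 | Total
Adrenal Gland | 4 | 11 | 6 | 1 | 2 | 34 | 23 | 19 | 7 | 8 | 57 | 91
Artery | 1 | 1 | 0 | 0 | 0 | 2 | 7 | 21 | 3 | 9 | 50 | 52
Bladder | 3 | 1 | 0 | 0 | 1 | 5 | 3 | 3 | 4 | 9 | 19 | 24
Brain | 313 | 98 | 41 | 12 | 8 | 472 | 52 | 27 | 3 | 12 | 104 | 576
Breast | 6 | 6 | 1 | 1 | 2 | 16 | 11 | 10 | 9 | 8 | 38 | 54
Cervix | 2 | 6 | 4 | 1 | 0 | 13 | 9 | 7 | 1 | 3 | 30 | 43
Heart | 4 | 24 | 6 | 1 | 2 | 57 | 36 | 22 | 8 | 2 | 68 | 25
Kidney | 32 | 17 | 1 | 5 | 3 | 68 | 28 | 29 | 24 | 5 | 86 | 54
Liver | 101 | 50 | 4 | 10 | 0 | 175 | 35 | 35 | 6 | 4 | 90 | 265
Lung | 8 | 7 | 9 | 4 | 1 | 39 | 30 | 27 | 0 | 9 | 76 | 15
Lymph Node | 5 | 1 | 4 | 6 | 3 | 19 | 10 | 6 | 7 | 15 | 58 | 77
Lymphocytes | 2 | 4 | 6 | 9 | 9 | 40 | 12 | 7 | 4 | 2 | 25 | 65
Monocytes | 2 | 16 | 2 | 0 | 1 | 41 | 8 | 6 | 4 | 10 | 38 | 79
Muscle | 42 | 41 | 4 | 7 | 0 | 104 | 22 | 21 | 5 | 6 | 54 | 58
Ovary | 1 | 5 | 4 | 1 | 1 | 22 | 8 | 6 | 5 | 5 | 34 | 56
Pancreas | 3 | 8 | 4 | 6 | 3 | 44 | 18 | 3 | 7 | 7 | 55 | 99
Prostate | 6 | 9 | 3 | 1 | 3 | 32 | 24 | 34 | 29 | 17 | 104 | 36
Skin | 100 | 19 | 3 | 7 | 4 | 143 | 21 | 5 | 1 | 16 | 63 | 206
Small Intestine | 85 | 47 | 22 | 12 | 2 | 168 | 25 | 27 | 7 | 17 | 86 | 254
Spleen | 8 | 14 | 4 | 6 | 2 | 44 | 17 | 27 | 5 | 16 | 75 | 19
Stomach | 1 | 9 | 8 | 0 | 2 | 30 | 5 | 1 | 3 | 6 | 35 | 65
Testes | 814 | 123 | 9 | 5 | 3 | 964 | 69 | 37 | 21 | 15 | 142 | 1106
Thymus | 0 | 0 | 0 | 1 | 0 | 1 | 10 | 7 | 8 | 14 | 39 | 40
Trachea | 47 | 40 | 7 | 7 | 4 | 105 | 67 | 41 | 8 | 10 | 136 | 241
Uterus | 2 | 2 | 4 | 1 | 1 | 10 | 9 | 4 | 3 | 3 | 29 | 39
Total | 1682 | 559 | 246 | 104 | 57 | 2648 | 559 | 492 | 312 | 228 | 1591 | 4239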
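
The arithmetic behind the Total row of Table 5 can be checked with a short sketch. The values below are the Total row read as 1682, 559, 246, 104, 57 (most abundant in the organ, k = 1 to 5) and 559, 492, 312, 228 (not most abundant, k = 2 to 5); the variable names are illustrative only and do not come from the application.

```python
# Total row of Table 5, keyed by k (the number of organs a protein is specific to).
# Variable names are hypothetical; only the numbers come from the table.
most_abundant = {1: 1682, 2: 559, 3: 246, 4: 104, 5: 57}
not_most_abundant = {2: 559, 3: 492, 4: 312, 5: 228}

# Subtotals for the "1 <= k <= 5" and "2 <= k <= 5" columns.
subtotal_most = sum(most_abundant.values())
subtotal_not_most = sum(not_most_abundant.values())

# Grand total of organ-specific proteins across both groups.
grand_total = subtotal_most + subtotal_not_most

# Share of proteins specific to exactly one organ (k = 1).
share_k1 = most_abundant[1] / grand_total

print(subtotal_most, subtotal_not_most, grand_total)
print(f"k = 1 share: {share_k1:.1%}")
```

The subtotals reproduce the printed 2648 and 1591, and their sum reproduces the printed total of 4239, so roughly two in five of the listed proteins are specific to a single organ.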

TABLE 6
Information on sequencing-by-synthesis (SBS) datasets that were used for identifying organ-specific proteins.

Organ Label | Tissue Type | Patient ID | Sex | Sample Label | Dataset
AdrenalGland | Adrenal Gland | 23209 | M | AdrenalGland M 23209 | HCC38
Artery | Artery | 23060 | M | Artery M 23060 | HCC39
Bladder | Bladder | THEB196 | F | Bladder F THEB196 | HCC11 A
Bladder | Bladder | THEB196 | F | Bladder F THEB196 | HCC11 B
Bladder | Bladder | 23060 | M | Bladder M 23060 | HCC10
Bladder | Bladder | 21538 | M | Bladder M 21538 | HCC42
Brain | Brain (Amygdala) | BR4-8L | F | BrainAmygdala F BR4-8L | HCC26
Brain | Brain (Nucleus Caudate) | BR4-10L | F | BrainNucleusCaudate F BR4-10L | HCC27
Breast | Breast | 108046 | F | Breast F 108046 | HCC01 A
Breast | Breast | 108046 | F | Breast F 108046 | HCC01 B
Breast | Breast | 108046 | F | Breast F 108046 | HCC17 A
Breast | Breast | 108046 | F | Breast F 108046 | HCC17 B
Breast | Breast | 108034 | F | Breast F 108034 | HCC19
Breast | Breast | 108034 | F | Breast F 108034 | HCC02 A
Breast | Breast | 108034 | F | Breast F 108034 | HCC02 B
Cervix | Cervix | 1-21 | F | Cervix F 1-21 | HCC05
Heart | Heart | 19941 | F | Heart F 19941 | HCC51
Heart | Heart | 23060 | M | Heart M 23060 | HCC18
Kidney | Kidney | 301002 | F | Kidney F 301002 | HCC53
Kidney | Kidney | 301028 | M | Kidney M 301028 | HCC52
Kidney | Renal Cortical Epithelial Cells | | | RenalCorticalEpithelialCells | HCCHECReCo
Kidney | Renal Epithelial Cells | | | RenalEpithelialCells | HCCHECRena
Kidney | Renal Proximal Tubule Epithelial Cells | | | RenalProximalTubuleEpithelialCells | HCCHuECRPT
Liver | Liver | 53891 | M | Liver M 53891 | HCC54
Liver | Liver | 56310 | M | Liver M 56310 | HCC08
Liver | Hepatocytes | | F | Hepatocytes | HCCHuHep
Lung | Lung | 301008 | F | Lung F 301008 | HCC56 A
Lung | Lung | 301008 | F | Lung F 301008 | HCC56 B
Lung | Lung | 301008 | F | Lung F 301008 | HCC56 C
Lung | Lung | AST6161 | M | Lung M AST6161 | HCC55
LymphNode | Lymph Node | 20951 | F | LymphNode F 20951 | HCC46
LymphNode | Lymph Node | 19941 | F | LymphNode F 19941 | HCC57 A
LymphNode | Lymph Node | 19941 | F | LymphNode F 19941 | HCC57 B
LymphNode | Lymph Node | THEB196 | M | LymphNode M THB196 | HCC25
Lymphocytes | Lymphocytes (B) | NF11 + NF4 | F | LymphocytesB F NF11 + NF4 | HCC14
Lymphocytes | Lymphocytes (B) | NMS10 | M | LymphocytesB M NMS10 | HCC21
Lymphocytes | Lymphocytes (T) | NF11 | F | LymphocytesT F NF11 | HCC15
Monocytes | Monocytes | NF11 | F | Monocytes F NF11 | HCC16
Monocytes | Monocytes | NMS5 | M | Monocytes M NMS5 | HCC20
Muscle | Muscle (Skeletal) | 54509 | M | MuscleSkeletal M 54509 | HCC58
Muscle | Muscle (Smooth) | 20951 | F | MuscleSmooth F 20951 | HCC36
Ovary | Ovary | 23011 | F | Ovary F 23011 | HCC06
Pancreas | Pancreas | 301002 | F | Pancreas F 301002 | HCC60
Pancreas | Pancreas | 301001 | M | Pancreas M 301001 | HCC59
Pancreas | Pancreatic Islet Cells | | F | PancreaticIsletCells F Islets | HCC40b
Prostate | Prostate | 23060 | M | Prostate M 23060 | HCC03 A
Prostate | Prostate | 23060 | M | Prostate M 23060 | HCC03 B
Prostate | Prostate | 21538 | M | Prostate M 21538 | HCC04
Prostate | Prostate Epithelial Cells | | M | ProstateEpithelialCells | HCCHECPros
Skin | Skin | 20951 | F | Skin F 20951 | HCC30
Skin | Epidermal Keratinocytes | | | EpidermalKeratinocytes | HCCHEK
SmallIntestine | Small Intestine | 301003 | F | SmallIntestine F 301003 | HCC62
SmallIntestine | Small Intestine | 21538 | M | SmallIntestine M 21538 | HCC31
Spleen | Spleen | 20951 | F | Spleen F 20951 | HCC23
Spleen | Spleen | 19941 | F | Spleen F 19941 | HCC64
Spleen | Spleen | 21538 | M | Spleen M 21538 | HCC50
Stomach | Stomach | 19941 | F | Stomach F 19941 | HCC65
Stomach | Stomach | 23060 | M | Stomach M 23060 | HCC24
Stomach | Stomach | 56310 | M | Stomach M 56310 | HCC50A
Testes | Testes | 23060 | M | Testes M 23060 | HCC09
Thymus | Thymus | 20951 | F | Thymus F 20951 | HCC34
Thymus | Thymus | 23060 | M | Thymus M 23060 | HCC33
Trachea | Trachea | 20951 | F | Trachea F 20951 | HCC29
Uterus | Uterus | 23011 | F | Uterus F 23011 | HCC07

TABLE 7
List of primer-dimers used in generating sequencing-by-synthesis (SBS) data.

GATCTCGTATGCCGTCTTCT
GATCGTATGCCGTCTTCTGC
GATCCGTATGCCGTCTTCTG
GATCGTCGGACTGTAGAACT
GATCGCCGTATCATTCGTAT
GATCGCCGTATCATTTCGTA
GATCGCCGTATCATCGTATG
GATCCCCCCCCCCCCCCCCC

TABLE 8
Error rates of sequencing the last base of sequencing-by-synthesis (SBS) tags.

Error | Rate
A->C | 0.0099
A->G | 0.0010
A->T | 0.0027
C->A | 0.0059
C->G | 0.0011
C->T | 0.0022
G->A | 0.0011
G->C | 0.0021
G->T | 0.0032
T->A | 0.0018
T->C | 0.0032
T->G | 0.0017
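
As a minimal sketch of how Tables 7 and 8 might be applied, the code below assumes that the primer-dimer sequences are excluded from a list of SBS tags and that the per-substitution rates for a given true last base can be summed to give its overall miscall rate. Both uses are assumptions for illustration; the function names and the example tag are hypothetical, and only the sequences and rates come from the tables.

```python
# Table 7: primer-dimer sequences (assumed here to be artefacts to discard).
PRIMER_DIMERS = {
    "GATCTCGTATGCCGTCTTCT",
    "GATCGTATGCCGTCTTCTGC",
    "GATCCGTATGCCGTCTTCTG",
    "GATCGTCGGACTGTAGAACT",
    "GATCGCCGTATCATTCGTAT",
    "GATCGCCGTATCATTTCGTA",
    "GATCGCCGTATCATCGTATG",
    "GATCCCCCCCCCCCCCCCCC",
}

# Table 8: last-base substitution error rates, keyed by (true base, called base).
LAST_BASE_ERROR = {
    ("A", "C"): 0.0099, ("A", "G"): 0.0010, ("A", "T"): 0.0027,
    ("C", "A"): 0.0059, ("C", "G"): 0.0011, ("C", "T"): 0.0022,
    ("G", "A"): 0.0011, ("G", "C"): 0.0021, ("G", "T"): 0.0032,
    ("T", "A"): 0.0018, ("T", "C"): 0.0032, ("T", "G"): 0.0017,
}


def drop_primer_dimers(tags):
    """Return the tags that are not exact matches to a Table 7 primer-dimer."""
    return [tag for tag in tags if tag not in PRIMER_DIMERS]


def last_base_miscall_rate(true_base):
    """Sum the Table 8 rates for every substitution away from true_base."""
    return sum(rate for (src, _), rate in LAST_BASE_ERROR.items() if src == true_base)


# Hypothetical input: one primer-dimer and one made-up tag.
example_tags = ["GATCTCGTATGCCGTCTTCT", "GATCAAAACCCCGGGGTTTT"]
kept = drop_primer_dimers(example_tags)
rate_a = last_base_miscall_rate("A")
print(kept, round(rate_a, 4))
```

Under these assumptions, a tag whose true last base is A would be miscalled at a combined rate of 0.0099 + 0.0010 + 0.0027 = 0.0136.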

TABLE 9
Top 50 transcriptomic studies on human diseases that had the highest correlation with lung-specific proteins (k ≤ 5). The most significant datasets of the studies and their corresponding p values were also listed.

No. | Study Name | Public Id | Dataset | P-Value | Lung Related
1 | Pre- and post-natal Congenital Cystic Adenomatoid Malformation of Lung Samples | GSE4772 | Lung from CCAM postnatal subjects vs. CCAM fetuses | 1.80E-14 | Yes
2 | Gene expression in primary tumors and tumor derived cell lines in small cell lung cancer | GSE15240 | Small cell lung cancer primary xenograft vs normal lung | 2.40E-14 | Yes
3 | Lung tissue from idiopathic pulmonary fibrosis and usual interstitial pneumonia | GSE10667 | Lung from patients with acute exacerbations of idiopathic pulmonary fibrosis vs normal | 3.90E-14 | Yes
4 | Lung tumors with early dissemination of tumor cells into bone marrow | GSE10799 | Primary lung adenocarcinoma that metastasized to bone vs normal lung | 3.20E-12 | Yes
5 | Overcoming resistance to conventional drugs in Ewings sarcoma | GSE12102 | Ewing sarcoma cells from patients with tumor metastasis vs tumor relapse | 3.80E-12 | No
6 | Adenocarcinoma and squamous cell carcinoma in human Non Small Cell Lung Cancer | | Non-small cell lung cancer - Squamous cell carcinoma vs adenocarcinoma | 4.10E-12 | Yes
7 | Profiling of NSCLC patients for predicting recurrence free survival | | Lung cancer in females - Squamous cell carcinoma vs adenocarcinoma | 2.10E-11 | Yes
8 | Gene expression-based survival prediction in lung adenocarcinoma | | Non-small cell lung adenocarcinoma moderately differentiated vs well differentiated | 2.00E-10 | Yes
9 | expO project Lung cancer Subset | GSE2109 | Lung Cancer Pathological T2 vs T1 | 1.40E-09 | Yes
10 | Gene expression profiles in malignant pleural mesothelioma | GSE2549 | Malignant pleural mesothelioma tumors vs normal lung tissue | 2.80E-09 | Yes
11 | Bone marrow gene expression in acute myelocytic leukemia and myelodysplastic syndrome | GSE15061 | Bone marrow from patients with acute myelocytic leukemia vs non leukemia controls | 3.50E-09 | No
12 | Classification of High-grade neuroendocrine tumors of the lung | GSE1037 | LCNEC lung tumor vs normal lung tissue | 6.70E-09 | Yes
13 | Inflammatory bowel disease before and after first infliximab treatment | | Ileal mucosa Crohns ileitis no response to infliximab - before treatment vs normal control | 7.20E-09 | No
14 | A Predictive Response Signature to Infliximab Treatment in Ulcerative Colitis | | Ulcerative colitis colon before 5 mg/kg b.w. infliximab 8 wk treatment - non-responder vs responder | 1.80E-08 | No
15 | Pancreatic tumor compared to normal pancreatic tissue | GSE16515 | Pancreatic tumor vs adjacent normal pancreatic tissue | 2.20E-08 | No
16 | Diversity of gene expression in adenocarcinoma of the lung | GSE3398 | Primary lung cancer tumors vs adjacent normal tissue | 3.80E-08 | Yes
17 | APL subtype M3 expression compared to other subtypes and normal promyelocytes | GSE12662 | M3 AML vs promyelocytes from normal bone marrow | 4.00E-08 | No
18 | Neurocrine Body Atlas Hs: Relative gene expression | GSE3526 | Lung - relative gene expression | 5.70E-08 | Yes
19 | Non Small Cell Lung Cancer | GSE1987 | NSCL - Squamous cell carcinoma vs normal lung | 9.40E-08 | Yes
20 | Progenitor and Stem Cell Populations from Human Umbilical Cord Blood | GSE10438 | Hematopoietic cells - first stem cell fraction vs whole cord blood | 1.70E-07 | No
21 | Squamous cell carcinoma of cervix before and during radiotherapy or chemoradiotherapy | GSE3578 | Cervical tumor chemoradiotherapy treatment 2 d vs prior to treatment | 2.60E-07 | No
22 | Squamous Lung Cancer and adjacent normal tissue | GSE3268 | Squamous cell lung cancer tumor tissue vs adjacent tissue | 2.80E-07 | Yes
23 | Pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum | GSE13904 | Whole blood from children with septic shock at d3 vs normal | 5.20E-07 | No
24 | Gene expression profiling of CEBPA double and single mutant and CEBPA WT AML | | AML refractory anemia with excess blasts vs AML FAB class M0 | 5.50E-07 | No
25 | Human bone marrow mesenchymal stem cells | GSE9894 | Bone Marrow - CD11b+ cells vs Mesenchymal Stem Cells | 5.70E-07 | No
26 | Prognostic gene signature for normal karyotype AML | | Bone marrow nuclear cells from AML patients - FAB M4 vs FAB M1 GPL96 | 6.40E-07 | No
27 | Human primary lung adenocarcinomas | E-MEXP-231 | Primary lung adenocarcinomas vs normal tissues | 7.70E-07 | Yes
28 | Gene expression signature of cigarette smoking & its role in lung adenocarcinoma | GSE10072 | Lung Adenocarcinoma vs Normal | 8.20E-07 | Yes
29 | Differentiation of human pulmonary type 2 cells in vitro | GSE3306 | Fetal lung epithelial cells + dexamethasone + 8-Br-cAMP + isobutylxanthine 72 hr vs control | 1.10E-06 | Yes
30 | Study of Multiple Solid Cancers | GSE5364 | Lung tumor vs paired normal lung | 2.30E-06 | Yes
31 | Cell Specific Expression in Trauma-Related Human T-Cell & Monocyte | GSE5580 | Leukocytes of severe trauma patients CHGN vs Healthy | 2.30E-06 | No
32 | Lung cancer dataset | GSE3141 | Lung adenocarcinoma vs lung squamous cell carcinoma | 3.20E-06 | Yes
33 | Blood leukocytes infected with Influenza A virus, gram- and gram-bacteria | GSE6269 | PBMC + influenza virus vs. Gram- bacteria infection GPL570 | 3.50E-06 | No
34 | Adjacent normal and tumor portions of lung cancer | GSE7670 | Tumor part of lung adenocarcinoma vs normal part | 4.00E-06 | Yes
35 | Objective classification of colon biopsy specimens | GSE4183 | Colon biopsies from inflammatory bowel diseases patients vs normal colon | 4.40E-06 | No
36 | Airway epithelium of nonsmokers, normal smokers, and smokers with COPD or early COPD | GSE10006 | Small airway epithelial cells from smoker with COPD vs. smoker without COPD | 5.30E-06 | Yes
37 | Expression data from human colonic biopsy sample | GSE10714 | Colon biopsy from ulcerative colitis vs healthy control | 5.40E-06 | No
38 | Pediatric septic shock | GSE8121 | Whole blood from children with septic shock at day 1 vs normal children | 6.50E-06 | No
39 | Metastases of breast cancer | GSE14020 | Metastasis of breast cancer to lung vs liver GPL96 | 8.90E-06 | Yes
40 | Systemic inflammatory response syndrome and septic shock | GSE4607 | Whole blood from septic shock subject vs control | 1.00E-05 | No
41 | Lung squamous cell carcinoma and adenocarcinoma before and after platinum therapy | GSE6044 | Lung adenocarcinoma from patient before platinum therapy vs normal lung | 1.00E-05 | Yes
42 | Barrett's esophagus and adenocarcinoma compared to normal esophageal and duodenal tissue | GSE6059 | Esophageal adenocarcinoma vs normal esophageal tissue | 1.10E-05 | No
43 | Expression data from pulmonary metastases of clear-cell renal cell carcinoma | GSE14378 | Pulmonary metastasis of renal cell carcinoma multiple vs few | 1.20E-05 | Yes
44 | Whole blood of patients with single and double primary tumors and healthy controls | GSE11545 | Whole blood from patients with breast and gastric cancer vs breast cancer | 1.20E-05 | No
45 | Blood Leukocyte Microarrays to Diagnose Systemic Onset Juvenile Idiopathic Arthritis | GSE8650 | PBMCs from SLE patients vs Healthy individuals | 1.20E-05 | No
46 | Response to burn injury and inflammation | GSE2328 | Burn injury response vs healthy subjects | 1.30E-05 | No
47 | Lung from familial and sporadic cases of interstitial pneumonia | GSE5774 | Lung from sporadic idiopathic interstitial pneumonia patient vs familial | 1.50E-05 | Yes
48 | Progression and response in chronic myeloid leukemia | GSE4170 | CML in blast crisis vs. chronic phase | 2.00E-05 | No
49 | Microarray deconvolution for quantifying subsets of T-cells in PBMCs | GSE11057 | Effector memory T-cells fraction vs unpurified PBMC population | 2.10E-05 | No
50 | PBMC from patients with methicillin-resistant and susceptible S. aureus infections | GSE16129 | PBMC from patient + methicillin resistant Staphylococcus aureus vs healthy control GPL96 GPL97 | 2.10E-05 | No

1. A method for predicting a risk for development of a disease or change in health status comprising:
(a) obtaining a sample from a subject;
(b) measuring the presence or absence of a set of sample organ specific panel proteins;
(c) comparing the expression levels of the sample organ specific panel protein set to predetermined expression levels of an identical set of organ specific panel proteins from a control population;
(d) determining the expression level differences between the sample organ specific panel protein set and the predetermined expression levels of the control population organ specific panel protein set;

2. The diagnostic method of claim 1, wherein the sample organ specific panel proteins are measured from a target organ.

3. The diagnostic method of claim 1, wherein the sample organ specific panel proteins are measured from a plurality of organs.

4. The diagnostic method of claim 1, wherein the organ specific panel protein set is selected from proteins expressed in the group of organs consisting of adrenal gland, artery, bladder, brain (amygdala), brain (nucleus caudate), breast, cervix, heart, kidney, renal cortical epithelial cells, renal proximal tubule epithelial cells, liver, hepatocytes, lung, lymph node, lymphocytes (b), lymphocytes (t), monocytes, muscle (skeletal), muscle (smooth), ovary, pancreas, pancreatic islet cells, prostate, prostate epithelial cells, skin, epidermal keratinocytes, small intestine, spleen, stomach, testes, thymus, trachea, and uterus.

5. The diagnostic method of claim 1, wherein the organ specific panel protein set is selected from proteins expressed by target genes provided in Tables 1-4.

6. The diagnostic method of claim 5, wherein the organ specific panel protein set is selected such that the expression level of at least one of the organ specific panel in the sample is above or below the predetermined level.

7. The diagnostic method of claim 6, wherein the expression levels of the sample organ specific panel protein set and the control population organ specific panel protein set differ by at least 10%.

8. The diagnostic method of claim 7, wherein the organ specific panel protein set comprises at least five organs.

9. The diagnostic method of claim 7, wherein the organ specific panel protein set comprises at least ten organs.

10. The diagnostic method of claim 8, wherein the organ specific panel protein set is specific for the lung.

11. The diagnostic method of claim 10, wherein the method predicts a risk for developing lung disease.

12. A method for diagnosing a disease, condition or change in health status comprising:
(a) obtaining a sample of organ specific panel gene products from a subject;
(b) measuring the presence or absence of a set of sample organ specific panel gene products selected from the organ specific panel genes provided in Tables 1-4;
(c) comparing the levels of the set of sample organ specific panel gene products to a predetermined control range for each organ-specific gene product; and
(d) diagnosing a disease, condition or change in health status based upon the difference between levels of the set of sample organ specific panel gene products and the predetermined control range for each organ specific panel gene product.

13. The method of claim 12, wherein the biological sample is selected from the group consisting of organs, tissue, bodily fluids and cells.

14. The method of claim 13, wherein the bodily fluid is selected from the group consisting of blood, serum, plasma, urine, sputum, saliva, stool, spinal fluid, cerebral spinal fluid, lymph fluid, skin secretions, respiratory secretions, intestinal secretions, genitourinary tract secretions, tears, and milk.

15. The method of claim 12, wherein the biological sample is a blood sample.

16. The method of claim 12, wherein the one or more organ specific panel gene products is a protein.

17. The method of claim 12, wherein the one or more organ specific panel gene product is an RNA transcriptome.

18. The method of claim 12, wherein the disease is a lung disease.

19. The method of claim 18, wherein the lung disease is a lung cancer selected from the group consisting of small cell carcinoma, non-small cell carcinoma, squamous cell carcinoma, adenocarcinoma, broncho-alveolar carcinoma, mixed pulmonary carcinoma, malignant pleural mesothelioma and undifferentiated pulmonary carcinoma.

20. The method of claim 18, wherein the lung disease is selected from the group consisting of acute respiratory distress syndrome (ARDS), alpha-1-antitrypsin deficiency, asbestos-related lung diseases, asbestosis, asthma, bronchiectasis, bronchitis, bronchopulmonary dysplasia (BPD), chronic bronchitis, chronic obstructive pulmonary disease (COPD), congenital cystic adenomatoid malformation, cystic fibrosis, emphysema, hemothorax, idiopathic pulmonary fibrosis, infant respiratory distress syndrome, lymphangioleiomyomatosis (LAM), pleural effusion pleurisy and other pleural disorders, pneumonia, pneumonoconiosis, pulmonary arterial hypertension, pulmonary fibrosis, respiratory distress syndrome in infants, sarcoidosis and thoracentesis.

21. The method of claim 12, wherein the set of sample organ specific panel gene products further comprises CLDN18, CPB2, WIF1, PPBP, and ALOX15B.

22. The method of claim 21, wherein the levels of the set of sample organ specific panel gene products is determined by a method selected from the group consisting of mass spectrometry, an MRM assay, an immunoassay, an ELISA, RT-PCR, a Northern blot, and Fluorescent In Situ Hybridization (FISH).

23. The method of claim 21, wherein the levels of the set of sample organ specific panel gene products is determined by an MRM assay.

24. The method of claim 12, further comprising a diagnostic kit comprising a plurality of detection reagents to detect the set of sample organ specific panel gene products.

25. The method of claim 25, wherein the plurality of detection reagents are selected from the group consisting of antibodies, capture agents, multi-ligand capture agents and aptamers.

26. A method for identifying a panel of disease-associated organ specific panel gene products, comprising:
(a) obtaining a biological sample from a subject determined to have a disease affecting a selected organ;
(b) detecting a first level of one or more organ specific panel gene products selected from any one or more of the organ specific panel genes provided in Tables 1-4 in the biological sample;
(c) comparing the first level of the one or more organ specific panel gene products to a predetermined control range.
(d) selecting one or more gene products as a member of the panel of disease-associated organ specific panel gene products when the first level of one or more of the organ specific panel gene products in the biological sample is above or below the corresponding predetermined control range.

27. A method for generating a predetermined control range for one or more organ specific panel gene products comprising the steps of
(a) identifying one or more organ specific panel gene products using sequencing by synthesis;
(b) measuring the level of the one or more organ specific panel gene product in a set of specific healthy organs;
(c) determining a set of standard values for the one or more organ specific panel gene product that is the predetermined control range; wherein the predetermined control range is compared to a biological sample from a subject to determine the health status of the subject.

28. A method for identifying a subject at risk for the development of lung cancer comprising:
(a) obtaining a sample from a subject;
(b) measuring expression levels of CLDN18, CPB2, WIF1, PPBP, and ALOX15B; and
(c) predicting that the subject is at risk for development of non-small cell lung cancer based upon the presence of CLDN18, CPB2, WIF1, PPBP, and ALOX15B in the sample.

29. A method for diagnosing lung cancer comprising:
(a) obtaining a sample from a subject;
(b) measuring expression levels of CLDN18, CPB2, WIF1, PPBP, and ALOX15B; and
(c) predicting that the subject is at risk for development of non-small cell lung cancer based upon the expression level of CLDN18, CPB2, WIF1, PPBP and ALOX15B in the sample.

30. The method of claim 28, wherein the sample is a blood sample.

31. The method of claim 28, wherein the expression levels of CLDN18, CPB2, WIF1, PPBP, and ALOX15B are determined by an MRM assay.

32. The method of claim 1, wherein the predetermined control range is determined by analysis of a set of organs obtained by healthy tissue donors.

33. The method of claim 1, wherein the one or more detection reagents are specific to the first ten ranked lung cancer biomarkers in Table 4 that are in the organ of lung.

* * * * *
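
The comparison recited in claims 1, 7 and 12 can be illustrated with a minimal sketch. It assumes each marker has a single predetermined control level and flags a marker when the sample level differs from that level by at least 10%, the threshold named in claim 7. The function name, the data layout, and every numeric level below are hypothetical and chosen only for illustration; the five gene symbols are those named in claims 21 and 28.

```python
# The five-marker lung panel named in the claims.
PANEL = ("CLDN18", "CPB2", "WIF1", "PPBP", "ALOX15B")


def flag_markers(sample, control, min_rel_diff=0.10):
    """Return {marker: relative difference} for markers whose sample level
    differs from the control level by at least min_rel_diff (claim 7: 10%)."""
    flagged = {}
    for gene in PANEL:
        reference = control[gene]
        relative_diff = (sample[gene] - reference) / reference
        if abs(relative_diff) >= min_rel_diff:
            flagged[gene] = relative_diff
    return flagged


# Hypothetical levels in arbitrary units; not data from the application.
control_levels = {gene: 100.0 for gene in PANEL}
sample_levels = {
    "CLDN18": 125.0,   # 25% above control
    "CPB2": 95.0,      # 5% below control, under the threshold
    "WIF1": 80.0,      # 20% below control
    "PPBP": 100.0,     # unchanged
    "ALOX15B": 104.0,  # 4% above control, under the threshold
}

flagged = flag_markers(sample_levels, control_levels)
print(sorted(flagged))
```

With these made-up values only CLDN18 and WIF1 are flagged. A control range with separate lower and upper bounds, as recited in claims 12 and 26, would replace the single reference level with an interval test.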