
(12) United States Patent (10) Patent No.: US 9,689,041 B2 Ye et al. (45) Date of Patent: Jun. 27, 2017

(54) METHOD AND KIT FOR DETERMINING IN VITRO THE PROBABILITY FOR AN INDIVIDUAL TO SUFFER FROM COLORECTAL CANCER

(75) Inventors: Xun Ye, Shanghai (CN); Fei Wu, Shanghai (CN); Qinghua Xu, Zhejiang (CN); Fang Liu, Shanghai (CN); Xia Meng, Shanghai (CN); Bruno Mougin, Lyons (FR)

(73) Assignee: BIOMERIEUX, Marcy l'Etoile (FR)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 610 days.

(21) Appl. No.: 14/007,439

(22) PCT Filed: Mar. 23, 2012

(86) PCT No.: PCT/CN2012/072931
§ 371 (c)(1), (2), (4) Date: Oct. 18, 2013

(87) PCT Pub. No.: WO2012/130103
PCT Pub. Date: Oct. 4, 2012

(65) Prior Publication Data
US 2014/0057802 A1 Feb. 27, 2014

(30) Foreign Application Priority Data
Mar. 25, 2011 (CN) ...... PCT/CN2011/072155

(51) Int. Cl.
C12Q 1/68 (2006.01)

(52) U.S. Cl.
CPC ..... C12Q 1/6886 (2013.01); C12Q 2600/158 (2013.01)

(58) Field of Classification Search
None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
4,672,040 A 6/1987 Josephson
4,683,195 A 7/1987 Mullis et al.
4,683,202 A 7/1987 Mullis
4,800,159 A 1/1989 Mullis et al.
4,981,783 A 1/1991 Augenlicht
5,234,809 A 8/1993 Boom et al.
5,399,491 A 3/1995 Kacian et al.
5,445,934 A 8/1995 Fodor et al.
5,700,637 A 12/1997 Southern
5,744,305 A 4/1998 Fodor et al.
5,750,338 A 5/1998 Collins et al.
5,807,522 A 9/1998 Brown et al.
2004/0002082 A1 1/2004 Feinberg
2004/0033516 A1 2/2004 Mougin
2004/0265230 A1* 12/2004 Martinez ...... A61K 39/0011 424/1.49
2005/0130170 A1 6/2005 Harvey et al.
2005/0287544 A1 12/2005 Bertucci et al.
2008/0311574 A1 12/2008 Manne et al.
2011/0117083 A1 5/2011 Bais et al.

FOREIGN PATENT DOCUMENTS
CN 101068936 A 11/2007
CN 101111604 A 1/2008
EP 0 201 184 A2 12/1986
EP 2 169 078 A1 3/2010
EP 2 177 615 A1 4/2010
FR 2 780 059 A1 12/1999
FR 2 816 711 A1 5/2002
FR 2 816 958 A1 5/2002
WO WO 89/10977 A1 11/1989
WO WO 90/01069 A1 2/1990
WO WO 90/03382 A1 4/1990
WO WO 90/06995 A1 6/1990
WO WO 91/02818 A1 3/1991
WO WO 91/19812 A1 12/1991
WO WO 94/12670 A1 6/1994
WO 97/45202 A1 12/1997
WO WO 99/15321 A1 4/1999
WO 99/35500 A1 7/1999
WO WO 99/53304 A1 10/1999
WO WO 99/65926 A1 12/1999
WO WO 00/05338 A1 2/2000
WO WO 00/71750 A1 11/2000
WO WO 01/44506 A1 6/2001
(Continued)

OTHER PUBLICATIONS
Xu et al. (Cancer Biology & Therapy, 2011, vol. 11, No. 2, p. 188-195).*
Nov. 19, 2015 Office Action issued in U.S. Appl. No. 13/698,219.
Jul. 9, 2015 Office Action issued in U.S. Appl. No. 13/698,219.
Abstract of ComBat: "Combatting Batch Effects When Combining Batches of Gene Expression Microarray Data," obtained from http://www.bu.edu/lab/wp-assets/ComBat/Abstract.html on Feb. 14, 2013.
Tachibana, T., et al., "Increased Intratumor Vα24-Positive Natural Killer T Cells: A Prognostic Factor for Primary Colorectal Carcinomas," Clin. Cancer Res., Oct. 15, 2005, pp. 7322-7327, vol. 11.
Liu, J. et al., "Gene expression profiling for nitric oxide prodrug JS-K to kill HL-60 myeloid leukemia cells," Genomics, 2009, pp. 32-38, vol. 94.
(Continued)

Primary Examiner: Stephanie K Mummert
(74) Attorney, Agent, or Firm: Oliff PLC

(57) ABSTRACT
The present invention provides a method for determining in vitro, in a peripheral blood sample, the probability for an individual to suffer from a colorectal cancer, using the comparison of the amount of expression products of nucleic acids of genes of the individual to be tested with the amount of expression products of nucleic acids of the same genes obtained from a CRC group of patients constituting the positive control and with the amount of expression products of nucleic acids of the same genes obtained from a CNC group of individuals constituting the negative control; and a kit comprising specific binding partners for said expression products.

27 Claims, No Drawings
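The comparison summarized in the abstract (tested individual versus a CRC positive-control group and a CNC negative-control group) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the patent: the amounts are already normalized numbers, each group is summarized by its per-gene mean, and "close to" is measured with Euclidean distance. The function names and the example values are hypothetical.

```python
from math import dist
from statistics import mean


def mean_profile(group):
    """Average amount per gene across a reference group (one row per individual)."""
    return [mean(values) for values in zip(*group)]


def classify(sample, crc_group, cnc_group):
    """Return 'CRC' or 'CNC', whichever reference profile the sample is closer to.

    Assumption: closeness is Euclidean distance to the group mean profile.
    """
    d_crc = dist(sample, mean_profile(crc_group))
    d_cnc = dist(sample, mean_profile(cnc_group))
    return "CRC" if d_crc <= d_cnc else "CNC"


# Made-up amounts for three genes, for illustration only.
crc_group = [[8.0, 2.0, 5.0], [9.0, 1.0, 6.0]]
cnc_group = [[3.0, 6.0, 5.0], [2.0, 7.0, 4.0]]

print(classify([8.5, 1.5, 5.0], crc_group, cnc_group))
```

Any other distance or statistical classifier could be substituted for the distance rule; the sketch only shows the shape of the two-reference comparison.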

(56) References Cited

FOREIGN PATENT DOCUMENTS
WO WO 01/44507 A1 6/2001
WO WO 02/40711 A1 5/2002
WO WO 02/090319 A1 11/2002
WO WO 02/090584 A2 11/2002
WO 2005/054508 A2 6/2005
WO WO 2008063414 * 5/2008 ...... C12Q 1/68
WO 2009/049228 A2 4/2009
WO 2009/126804 A2 10/2009
WO 2010/040571 A2 4/2010
WO 2010/056374 A2 5/2010

OTHER PUBLICATIONS
Clemson, C.M. et al., "An Architectural Role for a Nuclear Noncoding RNA: NEAT1 RNA is Essential for the Structure of Paraspeckles," Molecular Cell, Mar. 27, 2009, pp. 717-726, vol. 33.
Sheu, B., et al., "Up-regulation of Inhibitory Natural Killer Receptors CD94/NKG2A with Suppressed Intracellular Perforin Expression of Tumor-Infiltrating CD8+ T Lymphocytes in Human Cervical Carcinoma," Cancer Res., Apr. 1, 2005, pp. 2921-2929, vol. 65, No. 7.
McGilvray, R.W., et al., "NKG2D Ligand Expression in Human Colorectal Cancer Reveals Associations with Prognosis and Evidence for Immunoediting," Clin. Cancer Res., Nov. 15, 2009, pp. 6993-7002, vol. 15, No. 22.
Apr. 1, 2011 International Search Report issued in International Application No. PCT/EP2010/057843.
Apr. 1, 2011 Written Opinion of the International Searching Authority issued in International Application No. PCT/EP2010/057843.
Dec. 19, 2014 Office Action issued in U.S. Appl. No. 13/698,219.
Apr. 16, 2014 Office Action issued in U.S. Appl. No. 13/698,219.
U.S. Appl. No. 13/698,219, filed Nov. 15, 2012 in the name of Ye et al.
Killer Cell Lectin-Like Receptor Subfamily K, Member 1. Datasheet [online]. Weizmann Institute of Science, Oct. 23, 2013 [retrieved on Apr. 4, 2014]. Retrieved from the Internet: genecards.org/cgi-bin/carddisp.pl?gene=KLRK1
Killer Cell Lectin-Like Receptor Subfamily B, Member 1. Datasheet [online]. Weizmann Institute of Science, Oct. 23, 2013 [retrieved on Apr. 4, 2014]. Retrieved from the Internet: .
Related RAS Viral (R-Ras) Oncogene Homolog 2. Datasheet [online]. Weizmann Institute of Science, Oct. 23, 2013 [retrieved on Apr. 4, 2014]. Retrieved from the Internet: .
Nielsen et al., "Sequence-Selective Recognition of DNA by Strand Displacement with a Thymine-Substituted Polyamide," Science, Dec. 6, 1991, vol. 254, pp. 1497-1500.
Tyagi et al., "Molecular Beacons: Probes that Fluoresce Upon Hybridization," Nature Biotechnology, Mar. 1996, vol. 14, pp. 303-308.
Kricka, "Nucleic Acid Detection Technologies - Labels, Strategies, and Formats," Clinical Chemistry, 1999, vol. 45, No. 4, pp. 453-458.
Keller et al., "Non-Radioactive Labeling Procedures," and "Hybridization Formats and Detection Procedures," DNA Probes, 1993, pp. v and 173-253.
Chee et al., "Accessing Genetic Information with High-Density DNA Arrays," Science, Oct. 5, 1996, vol. 274, No. 5287, pp. 610-614.
Pease et al., "Light-Generated Oligonucleotide Arrays for Rapid DNA Sequence Analysis," Proc. Natl. Acad. Sci. USA, May 1994, vol. 91, pp. 5022-5026.
Cheng et al., "Preparation and Hybridization Analysis of DNA/RNA from E. coli on Microfabricated Bioelectronic Chips," Nature Biotechnology, Jun. 1998, vol. 16, pp. 541-546.
Livache et al., "Preparation of a DNA Matrix Via an Electrochemically Directed Copolymerization of Pyrrole and Oligonucleotides Bearing a Pyrrole Group," Nucleic Acids Research, 1994, vol. 22, No. 15, pp. 2915-2921.
Ginot, "Oligonucleotide Micro-Arrays for Identification of Unknown Mutations: How Far From Reality?" Human Mutation, 1997, vol. 10, pp. 1-10.
Cheng et al., "Microchip-Based Devices for Molecular Diagnosis of Genetic Diseases," Molecular Diagnosis, 1996, vol. 1, No. 3, pp. 183-200.
Bustin, "Quantification of mRNA Using Real-Time Reverse Transcription PCR (RT-PCR): Trends and Problems," Journal of Molecular Endocrinology, 2002, vol. 29, pp. 23-39.
Giulietti et al., "An Overview of Real-Time Quantitative PCR: Applications to Quantify Cytokine Gene Expression," Methods, 2001, vol. 25, pp. 386-401.
Irizarry et al., "Exploration, Normalization and Summaries of High Density Oligonucleotide Array Probe Level Data," Biostatistics, 2003, vol. 4, No. 2, pp. 249-264.
Maglott et al., "Entrez Gene: Gene-Centered Information at NCBI."
Johnson et al., "Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods," Biostatistics, 2007, vol. 8, No. 1, pp. 118-127.
Genbank, Accession No. NM_016613, 2000.
Genbank, Accession No. NM_001128424, 2000.
Jul. 5, 2012 International Search Report issued in International
Oct. 1, 2013 International Preliminary Report on Patentability issued in International Application No. PCT/CN2012/072931.

* cited by examiner

METHOD AND KIT FOR DETERMINING IN VITRO THE PROBABILITY FOR AN INDIVIDUAL TO SUFFER FROM COLORECTAL CANCER

FIELD OF THE INVENTION

The present invention relates to the detection of a colorectal cancer, especially to a method and kit for determining the probability to suffer from such a cancer.

BACKGROUND

Colorectal cancer (CRC), also called colon cancer or large bowel cancer, is the fifth most common form of cancer in the United States, the fourth most common cancer in China and the third leading cause of cancer-related death in Europe. The early detection of CRC is the key to successful treatment and patient survival and represents a major public health challenge. Indeed, CRC is often curable, particularly when diagnosed at early stages. Several screening strategies are already in place in various countries. Conventional CRC screening tests include fecal occult blood test (FOBT), sigmoidoscopy, colonoscopy, double contrast barium enema, or digital rectal examination. All of them have advantages and limitations, but compliance remains lower than expected, mainly due to logistics or discomfort for the patients.

The search for peripheral blood biomarkers aimed at early detection of CRC has been a focus for several years, especially for its convenience. Meanwhile, the feasibility of a blood-based test was supported by very few studies, which have shown that gene biomarkers in blood could differentiate CRC patients from controls. These studies were based on flow cytometry, which is a technique for counting and examining microscopic particles, such as cells, by suspending them in a stream of fluid and analyzing them by using an electronic detection apparatus.

The present inventors have found that differentially expressed genes represented important biomarkers in peripheral blood samples. They did not use the classical technique of flow cytometry but the determination of differential expression of genes from whole blood. It is unusual to determine an expression level of genes via the analysis of transcripts in whole blood, because it is commonly admitted by the persons skilled in the art that it is very difficult to retrieve a specific information when it is diluted in a complex mixture of RNAs (total RNA) without a step of specific purification. An advantage of the present method is also to avoid this step of purification of RNA.

Accordingly, the present invention relates to a method for determining in vitro, in a peripheral blood sample, the probability for an individual to suffer from a colorectal cancer, the method comprising the steps of
a) determining, in the peripheral blood sample, the amount of at least one expression product from at least one nucleic acid sequence and no more than 7 nucleic acid sequences, said nucleic acid sequence being selected from the sequences identified in SEQ ID NOs: 1 to 11,
b) comparing the amount of said expression product determined in step a) with a reference amount of the expression product for a group of individuals previously diagnosed as colorectal cancer patients and with a reference amount of the expression product for a group of individuals previously verified as non colorectal cancer individuals,
c) performing analysis of results of step b), wherein
if the result for the tested individual is close to or equal to the result obtained from the group of individuals previously diagnosed as colorectal cancer patients, then the tested individual is classified as a colorectal cancer patient, and
if the result for the tested individual is close to or equal to the result obtained from the group of individuals previously verified as non colorectal cancer individuals, then the tested individual is classified as a non colorectal individual.

The amount of the expression product is directly linked to the expression level of a gene defined by its nucleic acid sequence.

The expression level of at least one of the above nucleic acids is a sufficient information for determining if the individual is a CRC patient or not. But, in a preferred embodiment of the invention, in the step a), it is determined the amount of the expression products from nucleic acid sequences selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2 or SEQ ID NO: 3 or SEQ ID NO: 4, SEQ ID NO: 5 or SEQ ID NO: 6, SEQ ID NO: 7 or SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10 and SEQ ID NO: 11.

The amount of expression product(s) from the nucleic acid(s) is determined by bringing the expression product(s) into contact with at least one binding partner specific for each expression product.

Expression product(s) means RNA transcript(s) or polypeptide(s). Accordingly, in the method of the invention, it is determined the amount of at least one RNA transcript or at least one polypeptide.

The term RNA transcripts is intended to mean total RNA, i.e. coding or non coding RNA directly obtained from the peripheral blood sample or indirectly obtained from the blood sample after cell lysis. Especially, total RNA comprises transfer RNAs (tRNA), messenger RNAs (mRNAs), such as the mRNAs transcribed from the target gene, but also transcribed from any other gene, and ribosomal RNAs.

By way of indication, when the RNA is intracellular RNA, it can be extracted from the cells present in the blood sample by a lysis step, in order to release the nucleic acids contained in the cells of the individual to be tested. By way of example, use may be made of the methods of lysis as described in patent applications: WO 00/05338 regarding mixed magnetic and mechanical lysis, WO 99/53304 regarding electrical lysis, WO 99/15321 regarding mechanical lysis. Those skilled in the art may use other well-known methods of lysis, such as thermal or osmotic shocks or chemical lyses using chaotropic agents such as guanidinium salts (U.S. Pat. No. 5,234,809). It is also possible to provide an additional step for separating the nucleic acids from the other cellular constituents released in the lysis step. This generally makes it possible to concentrate the nucleic acids.

In the method of the invention the RNA transcript can be detected and quantified by hybridization, amplification or sequencing. Especially, to be detected and quantified, the RNA transcript is brought into contact with at least one probe or at least one primer under predetermined conditions which enable hybridization of said probe and/or said primer to the RNA transcript. But in another embodiment of the invention, DNA copies of the RNA transcript are prepared and said DNA copies are determined by bringing them into contact with at least one probe or at least one primer under predetermined conditions which enable hybridization of said probe and/or said primer to the DNA copies.

More precisely, in the methods described above RNA transcript or DNA copies are brought into contact with at least one hybridization probe and at least one primer and more particularly at least one hybridization probe and two primers.

The term "hybridization" is intended to mean the process during which, under appropriate conditions, two nucleotide fragments bind with stable and specific hydrogen bonds so as to form a double-stranded complex. These hydrogen bonds form between the complementary adenine (A) and thymine (T) (or uracil (U)) bases (this is referred to as an A-T bond) or between the complementary guanine (G) and cytosine (C) bases (this is referred to as a G-C bond). The hybridization of two nucleotide fragments may be complete (reference is then made to complementary nucleotide fragments or sequences), i.e. the double-stranded complex obtained during this hybridization comprises only A-T bonds and C-G bonds. This hybridization may be partial (reference is then made to sufficiently complementary nucleotide fragments or sequences), i.e. the double-stranded complex obtained comprises A-T bonds and C-G bonds that make it possible to form the double-stranded complex, but also bases not bound to a complementary base. The hybridization between two nucleotide fragments depends on the working conditions that are used, and in particular on the stringency. The stringency is defined in particular as a function of the base composition of the two nucleotide fragments, and also by the degree of mismatching between two nucleotide fragments. The stringency can also depend on the reaction parameters, such as the concentration and the type of ionic species present in the hybridization solution, the nature and the concentration of denaturing agents and/or the hybridization temperature. All these data are well known and the appropriate conditions can be determined by those skilled in the art. In general, depending on the length of the nucleotide fragments that it is intended to hybridize, the hybridization temperature is between approximately 20 and 70° C., in particular between 35 and 65° C., in a saline solution at a concentration of approximately 0.5 to 1 M. A sequence, or nucleotide fragment, or oligonucleotide, or polynucleotide, is a series of nucleotide motifs assembled together by phosphoric ester bonds, characterized by the informational sequence of the natural nucleic acids, capable of hybridizing to a nucleotide fragment, it being possible for the series to contain monomers having different structures and to be obtained from a natural nucleic acid molecule and/or by genetic recombination and/or by chemical synthesis. A motif is a derivative of a monomer which may be a natural nucleotide of nucleic acid, the constitutive elements of which are a sugar, a phosphate group and a nitrogenous base; in DNA, the sugar is deoxy-2-ribose, in RNA, the sugar is ribose; depending on whether DNA or RNA is involved, the nitrogenous base is selected from adenine, guanine, uracil, cytosine and thymine; alternatively the monomer is a nucleotide that is modified in at least one of the three constitutive elements; by way of example, the modification may occur either at the level of the bases, with modified bases such as inosine, methyl-5-deoxycytidine, deoxyuridine, dimethylamino-5-deoxyuridine, diamino-2,6-purine, bromo-5-deoxyuridine or any other modified base capable of hybridization, or at the level of the sugar, for example the replacement of at least one deoxyribose with a polyamide (P. E. Nielsen et al. Science, 254, 1497-1500 (1991)), or else at the level of the phosphate group, for example its replacement with esters in particular selected from diphosphates, alkyl- and arylphosphonates and phosphorothioates.

For the purpose of the present invention, the term "amplification primer" is intended to mean a nucleotide fragment comprising from 5 to 100 nucleotides, preferably from 15 to 30 nucleotides that allow the initiation of an enzymatic polymerization, for instance an enzymatic amplification reaction. The term "enzymatic amplification reaction" is intended to mean a process which generates multiple copies of a nucleotide fragment through the action of at least one enzyme. Such amplification reactions are well known to those skilled in the art and mention may in particular be made of the following techniques: PCR (polymerase chain reaction), as described in U.S. Pat. No. 4,683,195, U.S. Pat. No. 4,683,202 and U.S. Pat. No. 4,800,159, LCR (ligase chain reaction), disclosed, for example, in patent application EP 0 201 184, RCR (repair chain reaction), described in patent application WO 90/01069, 3SR (self sustained sequence replication) with patent application WO 90/06995, NASBA (nucleic acid sequence-based amplification) with patent application WO 91/02818, TMA (transcription mediated amplification) with U.S. Pat. No. 5,399,491 and RT-PCR.

When the enzymatic amplification is a PCR, use is made of at least two amplification primers, specific for a target gene, that allow the amplification of the material specific for the target gene. The material specific for the target gene then preferably comprises a complementary DNA obtained by reverse transcription of messenger RNA derived from the target gene (reference is then made to target-gene-specific cDNA) or a complementary RNA obtained by transcription of the cDNAs specific for a target gene (reference is then made to target-gene-specific cRNA). When the enzymatic amplification is a PCR carried out after a reverse transcription reaction, reference is made to RT-PCR.

The term "hybridization probe" is intended to mean a nucleotide fragment comprising at least 5 nucleotides, such as from 5 to 100 nucleotides, in particular from 10 to 75 nucleotides, such as 15-35 nucleotides and 60-70 nucleotides, having a hybridization specificity under given conditions so as to form a hybridization complex with the material specific for a target gene. In the present invention, the material specific for the target gene may be a nucleotide sequence included in a messenger RNA derived from the target gene (reference is then made to target-gene-specific mRNA), a nucleotide sequence included in a complementary DNA obtained by reverse transcription of said messenger RNA (reference is then made to target-gene-specific cDNA), or else a nucleotide sequence included in a complementary RNA obtained by transcription of said cDNA as described above (reference will then be made to target-gene-specific cRNA). The hybridization probe may include a label for its detection. The term "detection" is intended to mean either a direct detection such as a counting method, or an indirect detection by a method of detection using a label. Many methods of detection exist for detecting nucleic acids (see, for example, Kricka et al., Clinical Chemistry, 1999, no 45 (4), p. 453-458 or Keller G. H. et al., DNA Probes, 2nd Ed., Stockton Press, 1993, sections 5 and 6, p. 173-249). The term "label" is intended to mean a tracer capable of generating a signal that can be detected. A non limiting list of these tracers includes enzymes which produce a signal that can be detected, for example, by colorimetry, fluorescence or luminescence, such as horseradish peroxidase, alkaline phosphatase, beta-galactosidase, glucose-6-phosphate dehydrogenase; chromophores such as fluorescent, luminescent or dye compounds; electron dense groups detectable by electron microscopy or by virtue of their electrical properties such as conductivity, by amperometry or voltammetry methods, or by impedance measurement; groups that can be detected by optical methods such as diffraction, surface plasmon resonance, or contact angle variation, or by physical methods such as atomic force spectroscopy, tunnel effect, etc.; radioactive molecules such as ³²P, ³⁵S or ¹²⁵I.

For the purpose of the present invention, the hybridization probe may be a "detection" probe. In this case, the "detection" probe is labeled by means of a label. The detection probe may in particular be a "molecular beacon" detection probe as described by Tyagi & Kramer (Nature biotech, 1996, 14:303-308). These "molecular beacons" become fluorescent during the hybridization. They have a stem-loop type structure and contain a fluorophore and a "quencher" group. The binding of the specific loop sequence with its complementary target nucleic acid sequence causes the stem to unroll and the emission of a fluorescent signal during excitation at the appropriate wavelength. The detection probe in particular may be a "reporter probe" comprising a "color-coded barcode" according to NanoString™'s technology.

For the detection of the hybridization reaction, use may be made of target sequences that have been labeled, directly (in particular by the incorporation of a label within the target sequence) or indirectly (in particular using a detection probe as defined above). It is in particular possible to carry out, before the hybridization step, a step consisting in labeling and/or cleaving the target sequence, for example using a labeled deoxyribonucleotide triphosphate during the enzymatic amplification reaction. The cleavage may be carried out in particular by the action of imidazole or of manganese chloride. The target sequence may also be labeled after the amplification step, for example by hybridizing a detection probe according to the sandwich hybridization technique described in document WO 91/19812. Another specific preferred method of labeling nucleic acids is described in application FR 2780059.

According to a preferred embodiment of the invention, the detection probe comprises a fluorophore and a quencher. According to an even more preferred embodiment of the invention, the hybridization probe comprises an FAM (6-carboxy-fluorescein) or ROX (6-carboxy-X-rhodamine) fluorophore at its 5' end and a quencher (Dabsyl) at its 3' end.

The hybridization probe may also be a "capture" probe. In this case, the "capture" probe is immobilized or can be immobilized on a solid substrate by any appropriate means, i.e. directly or indirectly, for example by covalence or adsorption. As solid substrate, use may be made of synthetic materials or natural materials, optionally chemically modified, in particular polysaccharides such as cellulose-based materials, for example paper, cellulose derivatives such as cellulose acetate and nitrocellulose or dextran, polymers, copolymers, in particular based on styrene-type monomers, natural fibers such as cotton, and synthetic fibers such as nylon; inorganic materials such as silica, quartz, glasses or ceramics; latices; magnetic particles; metal derivatives, gels, etc. The solid substrate may be in the form of a microtitration plate, of a membrane as described in application WO-A-94/12670 or of a particle. It is also possible to immobilize on the substrate several different capture probes, each being specific for a target gene. In particular, a biochip on which a large number of probes can be immobilized may be used as substrate. The term "biochip" is intended to mean a solid substrate that is small in size, to which a multitude of capture probes are attached at predetermined positions. The biochip, or DNA chip, concept dates from the beginning of the 1990s. It is based on a multidisciplinary technology that integrates microelectronics, nucleic acid chemistry, image analysis and information technology. The operating principle is based on a foundation of molecular biology: the hybridization phenomenon, i.e. the pairing, by complementarity, of the bases of two DNA and/or RNA sequences. The biochip method is based on the use of capture probes attached to a solid substrate, on which probes a sample of target nucleotide fragments directly or indirectly labeled with fluorochromes is made to act. The capture probes are positioned specifically on the substrate or chip and each hybridization gives a specific piece of information, in relation to the target nucleotide fragment. The pieces of information obtained are cumulative, and make it possible, for example, to quantify the level of expression of one or more target genes. In order to analyze the expression of a target gene, a substrate comprising a multitude of probes, which correspond to all or part of the target gene, which is transcribed to mRNA, can then be prepared. For the purpose of the present invention, the term "low-density substrate" is intended to mean a substrate comprising fewer than 50 probes. For the purpose of the present invention, the term "medium-density substrate" is intended to mean a substrate comprising from 50 probes to 10 000 probes. For the purpose of the present invention, the term "high-density substrate" is intended to mean a substrate comprising more than 10 000 probes.

The cRNAs or cDNAs specific for a nucleic acid of a target gene that it is desired to analyze are then hybridized, for example, to specific capture probes. After hybridization, the substrate or chip is washed and the labeled cDNA or cRNA/capture probe complexes are revealed by means of a high-affinity ligand bound, for example, to a fluorochrome type label. The fluorescence is read, for example, with a scanner and the analysis of the fluorescence is processed by information technology. By way of indication, mention may be made of the DNA chips developed by the company Affymetrix ("Accessing Genetic Information with High-Density DNA arrays", M. Chee et al., Science, 1996, 274, 610-614; "Light-generated oligonucleotide arrays for rapid DNA sequence analysis", A. Caviani Pease et al., Proc. Natl. Acad. Sci. USA, 1994, 91, 5022-5026), for molecular diagnoses. In this technology, the capture probes are generally small in size, around 25 nucleotides. Other examples of biochips are given in the publications by G. Ramsay, Nature Biotechnology, 1998, No. 16, p. 40-44; F. Ginot, Human Mutation, 1997, No. 10, p. 1-10; J. Cheng et al., Molecular Diagnosis, 1996, No. 1 (3), p. 183-200; T. Livache et al., Nucleic Acids Research, 1994, No. 22 (15), p. 2915-2921; J. Cheng et al., Nature Biotechnology, 1998, No. 16, p. 541-546 or in U.S. Pat. No. 4,981,783, U.S. Pat. No. 5,700,637, U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,744,305 and U.S. Pat. No. 5,807,522. The main characteristic of the solid substrate should be to conserve the hybridization characteristics of the capture probes on the target nucleotide fragments while at the same time generating a minimum background noise for the method of detection. Three main types of fabrication can be distinguished for immobilizing the probes on the substrate.

First of all, there is a first technique which consists in depositing pre-synthesized probes. The attachment of the probes is carried out by direct transfer, by means of micropipettes or of microdots or by means of an inkjet device. This technique allows the attachment of probes having a size ranging from a few bases (5 to 10) up to relatively large sizes of 60 bases (printing) to a few hundred bases (microdeposition).

Printing is an adaptation of the method used by inkjet printers. It is based on the propulsion of very small spheres of fluid (volume <1 nl) at a rate that may reach 4000 drops/second. The printing does not involve any contact between the system releasing the fluid and the surface on which it is deposited.

Microdeposition consists in attaching long probes of a few tens to several hundred bases to the surface of a glass slide. These probes are generally extracted from databases and are in the form of amplified and purified products. This technique makes it possible to produce chips called microarrays that carry approximately ten thousand spots, called recognition zones, of DNA on a surface area of a little less than 4 cm². The use of nylon membranes, referred to as "macroarrays", which carry products that have been amplified, generally by PCR, with a diameter of 0.5 to 1 mm and the maximum density of which is 25 spots/cm², should not however be forgotten. This very flexible technique is used by many laboratories. In the present invention, the latter technique is considered to be included among biochips. A certain volume of sample can, however, be deposited at the bottom of a microtitration plate, in each well, as in the case in patent applications WO-A-00/71750 and FR 00/14896, or a certain number of drops that are separate from one another can be deposited at the bottom of one and the same Petri dish, according to another patent application, FR 00/14691.

The second technique for attaching the probes to the substrate or chip is called in situ synthesis. This technique results in the production of short probes directly at the surface of the chip. It is based on in situ oligonucleotide synthesis (see, in particular, patent applications WO 89/10977 and WO 90/03382) and is based on the oligonucleotide synthesizer process. It consists in moving a reaction chamber, in which the oligonucleotide extension reaction takes place, along the glass surface.

Finally, the third technique is called photolithography, which is a process that is responsible for the biochips developed by Affymetrix. It is also an in situ synthesis. Photolithography is derived from microprocessor techniques. The surface of the chip is modified by the attachment of photolabile chemical groups that can be light-activated. Once illuminated, these groups are capable of reacting with the 3' end of an oligonucleotide. By protecting this surface with masks of defined shapes, it is possible to selectively illuminate and therefore activate areas of the chip where it is desired to attach one or other of the four nucleotides. The successive use of different masks makes it possible to alternate cycles of protection/reaction and therefore to produce the oligonucleotide probes on spots of approximately a few tens of square micrometers (μm²). This resolution makes it possible to create up to several hundred thousand spots on a surface area of a few square centimeters (cm²).

can be analyzed by detecting the mRNAs (messenger RNAs) that are transcribed from the target gene at a given moment.

The invention preferably relates to the determination of the expression level of a target gene by detection of the mRNAs derived from this target gene according to any of the protocols well known to those skilled in the art. According to a specific embodiment of the invention, the expression level of several target genes is determined simultaneously, by detection of several different mRNAs, each mRNA being derived from a target gene.

By way of amplification, it is possible to determine the expression level of the target gene as follows:

1) After having extracted the total RNA (comprising the transfer RNAs (tRNAs), the ribosomal RNAs (rRNAs) and the messenger RNAs (mRNAs)) from the whole blood, a reverse transcription step is carried out in order to obtain the complementary DNAs (or cDNAs) of said mRNAs. By way of indication, this reverse transcription reaction can be carried out using a reverse transcriptase enzyme which makes it possible to obtain, from an RNA fragment, a complementary DNA fragment. The reverse transcriptase enzyme from AMV (Avian Myeloblastosis Virus) or from MMLV (Moloney Murine Leukaemia Virus) can in particular be used. When it is more particularly desired to obtain only the cDNAs of the mRNAs, this reverse transcription step is carried out in the presence of nucleotide fragments comprising only thymine bases (poly T), which hybridize by complementarity to the polyA sequence of the mRNAs so as to form a poly T-polyA complex which then serves as a starting point for the reverse transcription reaction carried out by the reverse transcriptase enzyme. cDNAs complementary to the mRNAs derived from a target gene (target-gene-specific cDNA) and cDNAs complementary to the mRNAs derived from genes other than the target gene (cDNAs not specific for the target gene) are then obtained.

2) The amplification primer(s) specific for a target gene is (are) brought into contact with the target-gene-specific cDNAs and the cDNAs not specific for the target gene. The amplification primer(s) specific for a target gene hybridize(s) with the target-gene-specific cDNAs and a predetermined region, of known length, of the cDNAs originating from the mRNAs derived from the target gene is specifically amplified. The cDNAs not specific for the target gene are not amplified, whereas a large amount of target-gene-specific cDNAs is then obtained. For the purpose of the present invention, reference is made, without distinction, to "target-gene-specific cDNAs" or to "cDNAs originating from the mRNAs derived from the target gene". This step can be carried out in particular by means of a PCR-type
amplification reaction or by any other amplification tech Photolithography has advantages: in bulk in parallel, it nique as defined above. By PCR, it is also possible to makes it possible to create a chip of N-mers in only simultaneously amplify several different cDNAs, each one 4.times.N cycles. All these techniques can be used with the being specific for different target genes, by using several present invention. According to a preferred embodiment of 55 pairs of different amplification primers, each one being the invention, the at least one specific reagent of step b) specific for a target gene: reference is then made to multiplex defined above comprises at least one hybridization probe amplification. 3) The expression of the target gene is deter which is preferably immobilized on a substrate. This sub mined by detecting and quantifying the target-gene-specific strate is preferably a low-, high- or medium-density Sub cDNAs obtained in step 2) above. This detection can be strate as defined above. 60 carried out after electrophoretic migration of the target These hybridization steps on a Substrate comprising a gene-specific cDNAS according to their size. The gel and the multitude of probes may be preceded by an enzymatic medium for the migration can include ethidium bromide so amplification reaction step, as defined above, in order to as to allow direct detection of the target-gene-specific increase the amount of target genetic material. cDNAS when the gel is placed, after a given migration The determination of the expression level of a target gene 65 period, on a UV (ultraviolet)-ray light table, through the can be carried out by any of the protocols known to those emission of a light signal. The greater the amount of skilled in the art. In general, the expression of a target gene target-gene-specific cDNAS, the brighter this light signal. 
These electrophoresis techniques are well known to those skilled in the art. The target-gene-specific cDNAs can also be detected and quantified using a quantification range obtained by means of an amplification reaction carried out until saturation. In order to take into account the variability in enzymatic efficiency that may be observed during the various steps (reverse transcription, PCR, etc.), the expression of a target gene of various groups of patients can be normalized by simultaneously determining the expression of a "housekeeping" gene, the expression of which is similar in the various groups of patients. By taking the ratio of the expression of the target gene to the expression of the housekeeping gene, i.e. the ratio of the amount of target-gene-specific cDNAs to the amount of housekeeping-gene-specific cDNAs, any variability between the various experiments is thus corrected. Those skilled in the art may refer in particular to the following publications: Bustin SA, J Mol Endocrinol, 2002, 29: 23-39; Giulietti A, Methods, 2001, 25: 386-401.

By way of hybridization, the expression of a target gene can be determined as follows: 1) After having extracted the total RNA from the whole blood, a reverse transcription step is carried out as described above in order to obtain cDNAs complementary to the mRNAs derived from a target gene (target-gene-specific cDNA) and cDNAs complementary to the mRNAs derived from genes other than the target gene (cDNA not specific for the target gene). 2) All the cDNAs are brought into contact with a substrate, on which are immobilized capture probes specific for the target gene whose expression it is desired to analyze, in order to carry out a hybridization reaction between the target-gene-specific cDNAs and the capture probes, the cDNAs not specific for the target gene not hybridizing to the capture probes. The hybridization reaction can be carried out on a solid substrate which includes all the materials as indicated above. According to a preferred embodiment, the hybridization probe is immobilized on a substrate. Preferably, the substrate is a low-, high- or medium-density substrate as defined above. The hybridization reaction may be preceded by a step consisting of enzymatic amplification of the target-gene-specific cDNAs as described above, so as to obtain a large amount of target-gene-specific cDNAs and to increase the probability of a target-gene-specific cDNA hybridizing to a capture probe specific for the target gene. The hybridization reaction may also be preceded by a step consisting in labeling and/or cleaving the target-gene-specific cDNAs as described above, for example using a labeled deoxyribonucleotide triphosphate for the amplification reaction. The cleavage can be carried out in particular by the action of imidazole and manganese chloride. The target-gene-specific cDNA can also be labeled after the amplification step, for example by hybridizing a labeled probe according to the sandwich hybridization technique described in document WO-A-91/19812. Other preferred specific methods for labeling and/or cleaving nucleic acids are described in applications WO 99/65926, WO 01/44507, WO 01/44506, WO 02/090584, WO 02/090319. 3) A step consisting of detection of the hybridization reaction is subsequently carried out. The detection can be carried out by bringing the substrate on which the capture probes specific for the target gene are hybridized with the target-gene-specific cDNAs into contact with a "detection" probe labeled with a label, and detecting the signal emitted by the label. When the target-gene-specific cDNA has been labeled beforehand with a label, the signal emitted by the label is detected directly.

The expression of a target gene can also be determined in the following way: 1) After having extracted the total RNA from the whole blood, a reverse transcription step is carried out as described above in order to obtain the cDNAs of the mRNAs of the biological material. The polymerization of the complementary RNA of the cDNA is subsequently carried out using a T7 polymerase enzyme which functions under the control of a promoter and which makes it possible to obtain, from a DNA template, the complementary RNA. The cRNAs of the cDNAs of the mRNAs specific for the target gene (reference is then made to target-gene-specific cRNA) and the cRNAs of the cDNAs of the mRNAs not specific for the target gene are then obtained. 2) All the cRNAs are brought into contact with a substrate on which are immobilized capture probes specific for the target gene whose expression it is desired to analyze, in order to carry out a hybridization reaction between the target-gene-specific cRNAs and the capture probes, the cRNAs not specific for the target gene not hybridizing to the capture probes. When it is desired to simultaneously analyze the expression of several target genes, several different capture probes can be immobilized on the substrate, each one being specific for a target gene. The hybridization reaction may also be preceded by a step consisting in labeling and/or cleaving the target-gene-specific cRNAs as described above. 3) A step consisting of detection of the hybridization reaction is subsequently carried out. The detection can be carried out by bringing the substrate on which the capture probes specific for the target gene are hybridized with the target-gene-specific cRNA into contact with a "detection" probe labeled with a label, and detecting the signal emitted by the label. When the target-gene-specific cRNA has been labeled beforehand with a label, the signal emitted by the label is detected directly. The use of cRNA is particularly advantageous when a substrate of biochip type on which a large number of probes are hybridized is used.

When the expression product is a polypeptide, it can be detected by bringing it into contact with at least one specific ligand, such as defined below. In a preferred embodiment the expressed polypeptide is brought into contact with at least two specific ligands, such as defined below. Specific ligand means for example an antibody or an affinity protein named Nanofitin™.

Nanofitins are affinity proteins with competitive features. They present a competitive affinity, similar to antibodies.

The term "antibody or antibodies" embraces polyclonal antibodies, monoclonal antibodies, humanized antibodies and recombinant antibodies. Their production methods are well known by the person skilled in the art.

The present invention also includes a kit for determining in vitro the probability for an individual to suffer from a colorectal cancer comprising at least one binding partner specific for at least one nucleic acid sequence and no more than 7 binding partners specific for 7 expression products of 7 nucleic acid sequences, wherein the at least one binding partner is specific for at least one expression product of at least one nucleic acid sequence selected from the group consisting of sequences set forth in SEQ ID NOs: 1 to 11.

Especially, the kit comprises a combination of 7 binding partners which are specific for the expression products of 7 nucleic acid sequences having the sequences set forth in SEQ ID NO: 1, SEQ ID NO: 2 or SEQ ID NO: 3 or SEQ ID NO: 4, SEQ ID NO: 5 or SEQ ID NO: 6, SEQ ID NO: 7 or SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10 and SEQ ID NO: 11.

In the kit the specific binding partner comprises:
at least one hybridization probe,
or at least one hybridization probe and at least one primer,
or at least one hybridization probe and two primers,
or at least one specific ligand or at least two specific ligands, such as antibody and/or affinity protein.

Finally, the invention concerns the use of at least one specific binding partner for at least one expression product of at least one nucleic acid sequence and no more than 7 specific binding partners for 7 expression products of 7 nucleic acid sequences, said at least one nucleic acid sequence having a sequence selected from the group consisting of nucleic acid sequences set forth in SEQ ID NOs: 1 to 11, in the manufacture of a composition for determining in vitro the probability for an individual to suffer from a colorectal cancer.

Especially, the use of a combination of 7 specific binding partners which are specific for 7 expression products of 7 nucleic acid sequences having the sequences set forth in SEQ ID NO: 1, SEQ ID NO: 2 or SEQ ID NO: 3 or SEQ ID NO: 4, SEQ ID NO: 5 or SEQ ID NO: 6, SEQ ID NO: 7 or SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10 and SEQ ID NO: 11.

Specific binding partner comprises:
at least one hybridization probe,
or at least one hybridization probe and at least one primer,
or at least one hybridization probe and two primers,
or at least one specific ligand or at least two specific ligands, such as antibody and/or affinity protein.

EXAMPLE

I) Materials and Methods

1. Patients and Sample Collection

Peripheral blood samples from 161 colorectal cancer patients (CRC) and 148 colonoscopy-negative control patients (CNCs) were collected between 2006 and 2010. The CRC patients were recruited at the Department of Colorectal Surgery, FDUSCC, China. The tumors were staged according to the International Union Against Cancer (UICC) recommended tumor-node-metastasis (TNM) system. No patient received preoperative radiotherapy or chemotherapy. Patients suffering from hereditary colorectal cancer or inflammatory bowel disease (Crohn's disease or ulcerative colitis) were excluded from this study. The CNCs, without any symptom of polyps or colorectal cancer, which had been confirmed by colonoscopy, were enrolled from the Community Hospital in the Shanghai area and FDUSCC. For each patient, 2.5 ml of peripheral blood was collected into PAXgene™ Blood RNA tubes (PreAnalytiX GmbH, Hombrechtikon, CH) and processed according to the manufacturer's guidelines.

The study involves two separate cohorts of participants. Cohort 1 consists of 100 CRC patients and 100 CNCs. For CRC patients, blood samples were collected in FDUSCC at least one week after colonoscopy, before surgery. For CNCs, blood samples were collected in a Community Hospital in the Shanghai area one week before the colonoscopy. The gene expression profiles from these samples were analyzed as a train set to search for significant genes associated with CRC and to identify a molecular signature. Cohort 2 includes 61 CRC patients and 48 CNCs. Samples were collected in the same way as for cohort 1. Cohort 2 was used as an independent test set to verify the signature performance observed in cohort 1.

2. RNA Extraction and Microarray Experiments

Total RNA was extracted with the PAXgene™ Blood RNA System (PreAnalytiX) following the manufacturer's instructions. The quantity of total RNA was measured by spectrophotometer at optical density 260 nanometers and the quality was assessed using the RNA 6000 Nano LabChip® Kit on a BioAnalyzer Agilent 2100 (Agilent Technologies, Palo Alto, Calif., U.S.A.). Only samples with an RNA Integrity Number between 7 and 10 were analyzed. 50 nanograms of total RNA was then reverse transcribed and linearly amplified to single-strand cDNA using Ribo-SPIA™ technology with the WT-Ovation™ RNA Amplification System (NuGEN Technologies Inc., San Carlos, Calif., U.S.A.) according to the manufacturer's standard protocol, and the products were purified with the QIAquick™ PCR purification kit (QIAGEN GmbH, Hilden, Germany). 2 micrograms of amplified and purified cDNA were subsequently fragmented with RQ1 RNase-Free DNase (Promega Corp., Fitchburg, Wis., U.S.A.) and labeled with biotinylated deoxynucleoside triphosphates by Terminal Transferase (Roche Diagnostics Corp., Indianapolis, Ind., U.S.A.) and GeneChip® DNA Labeling Reagent (Affymetrix Inc., Santa Clara, Calif., U.S.A.). The labeled cDNA was hybridized onto the GeneChip HG U133 Plus 2.0 Array (Affymetrix) in a Hybridization Oven 640 (Agilent Technologies) at 60 rotations per minute, 50 °C for 18 hours. The HG U133 Plus 2.0 Array contains 54,675 probe sets representing approximately 39,000 best-characterized human genes. After hybridization, the arrays were washed and stained according to the Affymetrix protocol EukGE-WS2v4 using a GeneChip® Fluidics Station 450 (Affymetrix). The arrays were scanned with the GeneChip® Scanner 3000 (Affymetrix).

3. Statistical Analysis

Microarray data quality control was performed according to the standard Affymetrix quality control parameters. The Affymetrix expression arrays were preprocessed globally by the Robust Multi-chip Average (RMA) method with background correction, quantile normalization and median polish summarization (Irizarry R A et al., Biostatistics 2003; 4:249-64).

For cohort 1 data, the probesets with extreme signal intensity (lower than log 2 (50) or higher than 2E14) were filtered out. Then, biological-knowledge-based filtering was performed using the information of the Entrez Gene Database (Maglott D et al., Nucleic Acids Research 2007; 35:D26-31). Probesets without Entrez Gene ID annotation were removed. For multiple probesets mapping to the same Entrez Gene ID, only the probeset with the largest interquartile range was retained and the others were removed. After this two-step filtering, 9,859 probesets were kept for the downstream analysis. To reduce the likelihood of batch effect, the ComBat method was applied to the filtered expression data (Johnson W E et al., Biostatistics 2007; 8:118-27). Differentially Expressed Gene (DEG) analysis was performed by the Significance Analysis of Microarrays (SAM) method (False Discovery Rate=0.05; Type="Two class unpaired"; test statistic="t-statistic"; number of permutations=1,000) (Tusher V. G. et al., PNAS USA 2001, 98:5116-21). Significant gene selection and predictive model construction were performed using a 5-fold cross-validation process with the RFE-SVM method. Among the 200 samples in the train set, 160 were randomly selected to form a learning set; predictive models were created with different sizes ranging from 1 to 100 genes scored by RFE-SVM; and the model performance was assessed using the remaining 40 samples. This process was repeated 1,000 times. Our results suggested that a maximum accuracy of 97% was achievable with the 100-gene-based SVM predictive models. The signature size optimization took into account the prediction performance, signature complexity and economy. Finally, we identified seven core genes with an overall accuracy of 90% to meet our target performance. The seven genes were selected by t-test P value, fold change and biological function, and are not related to age or gender factors.

II) Results

1. Characteristics of the Colorectal Cancer and Control Patient Populations

Among 309 participants in the two cohorts, there were 161 CRC patients and 148 CNCs. The demographic and clinical characteristics of the patients are summarized in Table 1.

TABLE 1
Clinical characteristics of the patients

                         Train set                      Test set
Variable                 CRC           Control          CRC           Control
                         (n = 100)     (n = 100)        (n = 61)      (n = 48)
Age - yr
  Mean                   57.6          56.5             55.4          55.2
  Range                  27-78         38-74            34-82         38-70
Sex - no. (%)
  Male                   50 (50.0%)    50 (50.0%)       32 (52.5%)    3 (6.3%)
  Female                 50 (50.0%)    50 (50.0%)       29 (47.5%)    45 (93.7%)
Tumor site - no. (%)
  Colon                  41 (41.0%)                     33 (54.1%)
  Rectum                 59 (59.0%)                     28 (45.9%)
Stage I                  16 (16.0%)                     8 (13.1%)
Stage II                 36 (36.0%)                     19 (31.1%)
Stage III                24 (24.0%)                     17 (27.9%)
Stage IV                 24 (24.0%)                     17 (27.9%)

2. 7-Gene CRC Biomarker Panel: Identification and Validation

Train set: the inventors performed significant gene selection and prediction model construction based on a 5-fold cross-validation process. The process was run for 1,000 iterations. Within each iteration, they recorded the unique top-7 gene set and its corresponding prediction model performance assessed on the internal test fold. Eventually, the overall performance was estimated by taking the average performance of the 1,000 prediction models on the internal test fold. The results show that an overall accuracy of 90.0% is achievable with the prediction models. The inventors have selected the best 7-gene prediction model, which gives 90.0% accuracy, 89.0% sensitivity and 91.0% specificity for the train set.

Test set: the inventors have then verified the performance of the signature of the above prediction model identified in the train set in an independent cohort (test set) including 109 samples, 61 CRCs and 48 CNCs. The overall performance of this signature is 83.0% (CI %: 73.9, 88.9) accuracy, 84.0% (CI %: 71.5, 91.4) sensitivity, and 81.0% (CI %: 66.9, 86.6) specificity.

3. Analysis of Discriminative Capacities of Individual Genes from the Signature Observed from the Train Set

Table 2 below summarizes the individual performance of said 7 genes. For each gene, the individual characteristics are given: Probeset id (Affymetrix probeset identification), t-test P value observed between 100 CNCs and 100 CRCs, and Fold Change observed between 100 CNCs and 100 CRCs.

TABLE 2

Probeset id*    Gene Symbol**   SEQ ID NOS:   Mean signal***   t-test P value            Fold Change   Direction (in CRC)
227062_at       NEAT1           1             621              3.84 × 10^[illegible]     1.46          up
223204_at       FAM198B         2, 3, 4       97               3.56 × 10^-2              1.52          up
205785_at       ITGAM           5, 6          95               1.35 × 10^-7              1.32          up
213906_at       MYBL1           7, 8          139              3.51 × 10^-8              1.38          down
209339_at       SIAH2           9             252              8.06 × 10^-6              1.25          up
1553589_a_at    PDZK1IP1        10            407              9.32 × 10^[illegible]     1.37          down
1553991_s_at    VSIG10          11            65               1.47 × 10^-[illegible]    1.41          up

Up: means that the mean signal for the CRC group is higher than in the CNC group
Down: means that the mean signal for the CRC group is lower than for the CNC group
* means Probeset id according to Affymetrix annotation version in 2010 (https://www.affymetrix.com analysisinetaffx.xmlauery, affx?netaffx=netaffx4 annot& requested=403680)
** means the identified gene and its variants or related sequences to said gene or variants
*** means average signals observed for 100 CRCs and 100 CNCs array experiments
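As a consistency check on the performance figures reported for the train and test sets, overall accuracy can be recomputed from sensitivity, specificity and the cohort sizes. The following is a minimal sketch only: the function name `accuracy_from_rates` is hypothetical, and the calculation assumes that the reported sensitivity and specificity refer to the CRC and CNC groups of each cohort as a whole. It is not the inventors' own analysis code.

```python
def accuracy_from_rates(sensitivity, specificity, n_cases, n_controls):
    """Overall accuracy implied by sensitivity and specificity.

    sensitivity * n_cases    -> expected number of correctly classified CRC samples
    specificity * n_controls -> expected number of correctly classified CNC samples
    """
    true_positives = sensitivity * n_cases
    true_negatives = specificity * n_controls
    return (true_positives + true_negatives) / (n_cases + n_controls)


# Train set: 100 CRC and 100 CNC, 89.0% sensitivity, 91.0% specificity
train_acc = accuracy_from_rates(0.89, 0.91, 100, 100)

# Test set: 61 CRC and 48 CNC, 84.0% sensitivity, 81.0% specificity
test_acc = accuracy_from_rates(0.84, 0.81, 61, 48)

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

Under these assumptions the recomputed values agree, after rounding to a whole percent, with the 90.0% and 83.0% accuracies stated for the train and test sets.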

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 11

<210> SEQ ID NO 1
<211> LENGTH: 22743

US 9,689,041 B2 37 38 - Continued t cct taaaat ttgttctato tacttatatt taatttgatt taatggittac togcc.cagata 294 O ttgaga attg gttcaaat at tdagt.gtgtt toaatatatt atctggctta tittcaacatg 3 OOO agtaatatga gcaaaataag ttaaaacctg cqtctgatca attitt cotca tdactagaac 3 O 6 O taaaac agta aatttggaca at attaagcc ticaaataatc atcto caaac toctitctaac 312 O actittittaaa toagattgga agacatggac aaatcaggitt catgtgttgc atc.tt tatgt 318O c ctittgccaa tat coaagat catca catat gigtagatatt cacatggagt ttcaaattica 324 O gaatagatta ccattacctt cct gcc.ctta cacat cotac toctitattta aaagttctat 33 OO ttgttgactitt to attt cotgaaagtttalaa aatacaattt gagaatgttt ataatacatt 3360 citct cotgtc. ttitt cacggit tacgtctgtt attgctgaaa tacaccacat tittctttgtt 342O ctgg to aagg ttaact caat atctgttgttga aagagaacta ctaacaacgt tacaatagag 3480 gctagatttgaaaaaaaaaa totatagatc taattgatac aattgtagaa caaaatgtca 354 O aaataatgtt ttalagtataa gagaagatgg accaaggaga gagagat cat ttgaaaatct 36OO aattgtagct tttctaggct cacatt catg tact acttitt agcacccitta tdggctgtgc 366 O tcqc coccitg gacagttgag ctittggatta t ct tcct citt caattitt coc totattgacc 372 O cgagtgtc.t.c cct ctdct to tacagattta tag tact cott toggct cittitt gag to tccac 378 O ttitt acticac tdt citctggg atttittaaga t cottitt citt ct cittataaa totatic ct citt 384 O aatgaaaatt agcctaacaa aagtttggag actggaatcc tactittgagc cactgacttg 3900 aaataact ct tttggcaagt togcctgacat cotgtc.ttac caaggtggca tatttgcatt 396 O tttact.gctt aaaacatttt tttittittitta c catctittat c caaattitat cat attgatg 4 O2O gtaggactaa Caggctttitt agaagctggc tittaactittg agt ct caagc tacaatgctg 4 O8O ttgggcagcc tdgtott.ccc acgtgagggit tta actttgt ttatttgcct c cagttatt c 414 O caaaatgctt attaaatgaa agt cocagga acatgttitat tittagt cacc tittgctttitt 42OO aacaattittgttttgtaatc aatgagtaat t catgatgaa ttatttittga ctaatggata 426 O gcc.gaaggcc aggcttittaa ttctaatagg taatgttctt ctitttgtc.tt attgaaacaa 432O tgagaatact ctdtgcattt caaatgcact c cqattatgc tigtggittitta t t cacataag 438 O cacaatatgt gttittattta taactitcata 
acaaactitat aatataataa tttacct tag 4 44 O caga catgca aaa.gct tatt Cttgttgttgac titactitt citt taagctaata atataaaaat 4500 aaatatgtat cittaaaaatc tataataaaa cattagaaat taaagatatg togctttittat 456 O tittgcagatg agttcatttg Cttttgtaga tigtgtttitca gagct aggta Cagaggaatg 462O tittgctacct ttagcggtga aaaaagaaag agagt caaga attttgttgg attgttgtttg 468O tgttgtgcata tatttgatat cat cattata tttgtaatct ttggacttgt aat catago c 474. O tgtttatt ct actgtgcc at taaatatact ttaccttata cataacgaat aaaataccta 48OO gaagtagatt tatttacaaa aaaaaaaaaa aaa 4833

<210> SEQ ID NO 3
<211> LENGTH: 4854
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 3
gtttittaaaa gotttgtatic ticittaaaacc atgcagoagt cagtttccaa gttittgctitt 60
gcaatcagta gttittcaagg gagcttittaa agctgaactgaaatgtttga aatgtggaac 120
actic ttgacc atgaaatatgttctacttac atgcct cago ctittaaaagt totttgcatt 180

US 9,689,041 B2 41 42 - Continued gagcatcaga gct gaccata catttaaa.ca gaaatgctgg tittatttgca aaatcaccag 2580 tatattitt ct attgttgttcta taaaaaatca gtcatttaag tacaagaatc at attitt coa 264 O titcc tttitta gaaatttatt ttgttgtc.cc tatggaaatc attca catct gacaattitat 27 OO atgttaaaga gttt tact ct citctattittg gtccaatttg tat citagtgg ctgagaaatt 276 O aaataattct aaagtatgaa gttacct atc tdaaaatgta cittacagagt at cattittaa 282O aatggatgtc. tctittaaaaa ttttgttact tttaccalaca atgtaatata atttatgtat 288O attitt attaa taatagtgaa titcCttaaaa tttgttctat gtact tatat ttaatttgat 294 O ttaatggitta citgcc.cagat attgagaatt ggttcaaata ttgagtgttgt ttcaatatat 3 OOO tatctggctt atttcaa.cat gagtaatatg agcaaaataa gttaaaacct gcgtctgat c 3 O 6 O aattitt cotc atgact agaa ctaaaacagt aaatttggac aat attaagc citcaaataat 312 O catcto caaa citccttctaa cactttittaa atcagattgg aagacatgga caaat caggit 318O t catgtgttg catctt tatgtcc tittgcca at atccaaga t catcacata togg tagatat 324 O t cacatggag tittcaaattic agaatagatt accattacct tcc togcc citt acacatccta 33 OO cticcittattt aaaagttcta tttgttgactt tt cattt cott gaaagtttaa aaatacaatt 3360 tgagaatgtt tataatacat t ct ct cotgt cittitt cacgg ttacgtctgt tattgctgaa 342O atacaccaca ttittctttgt totggtcaag gttaact caa tatctgtgttgaaagagaact 3480 actaacaacg ttacaataga gqctagattt gaaaaaaaaa atctatagat ctaattgata 354 O

Caattgtaga acaaaatgtc. aaaataatgt tittaagtata agagaagatg gaccalaggag 36OO agagagat catttgaaaatc taattgtagc titt to taggc t cacatt cat gtact actitt 3660 tagcaccctt atgggctgtg citcqc cocct gga cagttga gctittggatt atct tcc tot 372 O t caattitt co citctattgac ccgagtgtct c cct c togctt ctacagattt at agtacticc 378 O ttggct ctitt tagt citcca ctitt tactica citgtc.tctgg gatttittaag atccttittct 384 O t ct cittataa atcatcct ct taatgaaaat tagcc talaca aaagtttgga gactggaatc 3900 c tactittgag ccactgactt gaaataactic titttggcaag ttgcc tigaca toctdtctta 396 O c caaggtggc at atttgcat titt tactgct taaaacattt tttitttittitt accatctitta 4 O2O tccaaattta t catattgat gig taggacta acaggcttitt tagaa.gctgg ctitta actitt 4 O8O gagt ct caag ctacaatgct gttgggcago Ctggit ct tcc cacgtgaggg tttalactittg 414 O tittatttgcc ticcagttatt coaaaatgct tattaaatga aagtic cc agg aacatgttta 42OO ttittagt cac ctittgcttitt taacaattitt gttttgtaat caatgagtaa tt catgatga 426 O attatttittg actaatggat agc.cgaaggc caggcttitta attctaatag gtaatgttct 432O t ctitttgtct tattgaaaca atgagaatac totgtgcatt toaaatgcac toccattatg 438 O ctgtggittitt attcacataa goacalatatgtgttt tattt ataactt cat aacaaactta 4 44 O taatataata atttacctta gcaga catgc aaaagctitat t cittgttgttga citt actittct 4500 ttaa.gctaat aatataaaaa taaatatgta t cittaaaaat citataataaa acattagaaa 456 O ttaaagatat gtgctttitta ttittgcagat gagttcattt gcttttgtag atgtgttitt c 462O agagct aggt acagaggaat gtttgctacc tittagcggtgaaaaaagaala gagagt caag 468O aattttgttg gattgttgttt gtgtgtgcat at atttgata t catcattat atttgtaatc 474. O tittggactitg taat catago citgtttatt c tactgtgcca ttaaatatac tttacctitat 48OO acataacgaa taaaatacct agaagtag at ttatttacaa aaaaaaaaaa aaaa 4854

US 9,689,041 B2 45 46 - Continued act acttgag aaatatttgt ttatatttitt citctaac atc atgctatgtg toagtctgaa 222 O catctgacaa cagaaattitc agittattatt ctagotaagt tittgaaaa.ca tttgt catgc 228O tgtttaatag aaaactgcaa accagaga.ca citgactic cat taataaacca tattttgtgc 234 O cgttittgact gttctgacca aatactaatg ggaacaattic ttgacgttitt totgttgctg 24 OO attgttaa.ca tagagcagtic tictacactac cct gaggcaa citctacattg gaacact gag 246 O gcttacagcc tdcaagagca totagagctga ccatacattt aaacagaaat gctggittitat 252O ttgcaaaatc accagtatat tittctattgt gtctataaaa aat cagt cat ttaagtacaa 2580 gaat catatt titc cattcct ttittagaaat ttattttgtt gtc.cc tatgg aaatcattca 264 O catctgacaa tittatatgtt aaagagttitt act ct ct cta ttittggit coa atttgtatict 27 OO agtggctgag aaattaaata attctaaagt atgaagttac citatctgaaa atgtact tac 276 O agagtat cat tittaaaatgg atgtc.t.ctitt aaaaattittgttactitttac caacaatgta 282O atataattta totatattitt attaataata gtgaatticct taaaatttgt totatgtact 288O tatatttaat ttgatttaat gigt tactgcc cagatattga gaattggttcaaatattgag 294 O tgttgtttcaa tat attatct ggct tatttic aacatgagta atatgagcaa aataagttaa 3 OOO aacctg.cgt.c tdatcaattt to ct catgac tagaactaaa acagtaaatt toggacaatat 3 O 6 O taagcct caa ataatcatct c caaactic ct tctaacactt tittaaat cag attggaagac 312 O atggacaaat caggttcatg togttgcatct titatgtc.ctt togccaatatic caagat catc 318O acatatggta gatatt caca toggagttt ca aattcagaat agattaccat tacct tcctg 324 O c cct tacaca t cc tactic ct tatttaaaag ttctatttgt gacttitt cat ttic ctgaaag 3300 tittaaaaata caatttgaga atgtttataa tacattct ct c ct gttcttitt cacggittacg 3360 tctgttattg ctgaaataca cca catttitc tttgttctgg toaaggittaa citcaatat ct 342O gtgtgaaaga gaactact aa caacgttaca atagaggcta gatttgaaaa aaaaaatcta 3480 tagatctaat tdatacaatt gtagaacaaa atgtcaaaat aatgttittaa gtataagaga 354 O agatggacca aggaga.gaga gat catttga aaatctaatt gtagcttitt C taggcticaca 36OO tt catgtact acttittagca ccct tatggg ctgtgct cqc ccc.ctggaca gttgagctitt 366 O ggattatctt cotcttcaat titt coct cita ttgaccc.gag tdt ct 
coctic togcttctaca 372 O gatttatagt act cottggc ticttittgagt citccacttitt act cactgtc. tctgggattit 378 O ttaagatcct titt cittct ct tataaatcat cotcttaatgaaaattagcc taacaaaagt 384 O ttggag actg gaatcc tact ttgagccact gactitgaaat aactic ttittg gcaagttgcc 3900 tgacatcc td tottaccalag gtggcatatt togcatttitta citgcttaaaa catttitttitt 396 O tttittaccat ctittatccaa attitat cata ttgatgg tag gactaac agg ctittittagaa 4 O2O gctggCttta actittgagtic tica agctaca atgctgttgg gcagcctggit Ctt Cocacgt 4 O8O gagggitttaa citttgttt at ttgcct coag titatt coaaa atgct tatta aatgaaagtic 414 O c caggaacat gtttattitta gttcacctittg citttittaa.ca attttgttitt gtaat caatg 42OO agtaattcat gatgaattat ttittgactaa tdgatagc.cg aaggc.caggc titttaattct 426 O aatagg taat gttcttcttt tdt cittattgaaacaatgag aatactctgt gcatttcaaa 432O tgcact coga t tatgctgtg gttittatt ca cataagcaca atatgtgttt tatttataac 438 O tt cataacaa acttataata taataattta cct tagcaga catgcaaaag cittatt cittg 4 44 O tgttgacttac titt ctittaag ctaataatat aaaaataaat atgitat citta aaaatctata 4500 ataaaacatt agaaattaaa gatatgtgct ttittattittg cagatgagtt catttgctitt 456 O

US 9,689,041 B2 57 58 - Continued aaaagaaaat aaaggaactt gagatgct tc titatgtcagc tigagaatgala gttagaagaa 26 O agcgaatt CC at Cacagcct ggaagtttitt Ctagctggtc tigtagttt C ct catggatg 32O ataa catgtc. taatacticta aatagccttg acgagcacac tagtgagttt tacag tatgg 38O atgaaaatca gcctgtgtct gct cagoaga attcacccac aaagttcct g gcc.gtggagg 44 O caaacgctgt gttatcct ct ttgcagacca toccagaatt togcagagact citagaactta SOO ttgaatctga t cotgtagca toggagtgacg ttaccagttt tdatatttct gatgctgctg 560 cittct c citat caaatccacc ccagttaa at taatgagaat t cago acaat gaaggagc.ca 62O tggaatgc.ca atttalacgt.c agt cttgtac ttgaagggala aaaaaac act totaatggtg 68O gcaa.ca.gtga agctgttcct ttaa.catc cc caaatatago caagtttagc act coaccag 74 O c catcct cag aaagaa.gaga aaaatgcgag togt catt C C cc aggcagc gaact taggg 8OO atggct catt gaacgatggt gigtaatatgg cqctaaaa.ca tacac cactgaaaacactac 86 O cattttct cott cacagttt ttcaacacat gtc.ctgg taa togaacaactt aatatagaaa 92 O atcc tt catt tacatcaa.cc cct atttgtg ggcagaaagc tict cattaca act cotcttic 98 O ataaggaaac aacticc caaa gatcaaaagg aaaatgtagg gtttagaaca cct act atta 2O4. 
O gaagat citat actggg tacc acaccaagaa citcct acticc ttittaagaat gcgcttgctg 21OO cticaggagaa aaaatatgga cct cittaaaa ttgtgtc.cca gcc acttgct ttcttggaag 216 O aagatatt.cg ggaagttitta aaagaagaala Ctggalacaga cct attic ct c aaagaggaag 222 O atgaac Ctgc titacaaaagc tigcaaacaag agaat accgc titctgggaag aaagt cagaa 228O aatcactagt Cittagata at tiggaaaaag aagaat Cagg CactCaactg ttgactgaag 234 O acattt caga catgcagt ca gaaaatagat titact acatc citt attaatg ataccattat 24 OO tggaaataca tdacaatagg togcaacttga titcctgaaaa acaagatata aattcaacca 246 O acaaaacata tacact tact aaaaagaaac caaac cctaa cact tccaaa gttgtcaaat 252O tggaaaagaa ticttcagt ca aattgttgaat gggaalacagt ggttt atggg alagacagaag 2580 acca actitat tatgactgaa caa.gcaagaa gat atctgag tacttacaca gctaccagta 264 O gtacttcaag agct ct cata citgitaattgt tattaaaatt gatgaaatgc cccactic cct 27 OO tactgcagtic tictact aaat taggttgcag tdaaatttitt citcaattagt tdtttittaaa 276 O gttgtaagat agc ccttitta atacagoat c titttittctat tctatatagt aggcagaaag 282O c tagtaagtic acttaagggg tagatagttt catagitttat tttittaa.gag atgagattitt 288O taaaaattgt ttittaaagaa caagatggga aaataataga atgttcatgg atttctaaaa 294 O gtaaattct catatatttitc titcacaagat atatgttgct act ct cittga tigctgcagtt 3 OOO ttgttataga taggtg tatgagtatatatg atttctgaaa ttagt citatg tatggaaagc 3 O 6 O acacatgatt titatgaagta Cttittgcc.ca tdtgctgatt tact taggct accatttaca 312 O aagaalacaca ttgaaaagga atttalaagga aggatagaaa gttgcactac taattittittg 318O tttitttittitt cagaagicagt aaaattaact acagtgttaa atgtattitat ttgagcatag 324 O tactgaaaac aaaaag catt caaaaaagag titttittctitt attagtaaat agtattittct 33 OO taatct caga ggagctgaga gttttgttga atgtattgta Cagtatgtag gag caggaga 3360 actttgtaaa ttggaaagaa gttctgtttitt ataatttatt tittatttitta aagcttaaat 342O gtagatattt atacgtatac agggtgcc ta gaa.gc.caatgttgtttcct g ttattacagc 3480 taacacagta aagaataatt ttgactittaa gtatgaaaca gtag taagtt atagotgcaa. 354 O agaatacaat atctatactg. tatgtcacat citacctaaat gttgcactat gcc ctittaaa 36OO

US 9,689,041 B2 61 62 - Continued aacaaaaagg actgaagaaa citctggalaca gagtaaaatg gacaagggac gaggatgata 54 O aattaaagaa gttggttgaa caa.catggaa citgatgattig gactictaatt gctagt catc 6OO ttcaaaatcg citctgattitt cagtgc.ca.gc atcgatggca gaaagttitta aatcc tigaat 660 tgataaaggg to Cttggact aaagaagaag atcagagggit tattgaatta gttcagaaat 72 O atgggc.calaa aagatggit ct ttaattgcaa alacatttaala aggaagaata ggcaa.gcagt 78O gtagagaaag atggcataat Catctgaatc Ctgaggtaaa gaaatct tcc tigacagaag 84 O aggaggacag gat Catct at galagcacata agcggttggg aaatcgttgg gCagaaattg 9 OO c caaac tact tccaggaagg actgataatt citat caaaaa to attggaat t c tactatoc 96.O gaagaaaagt ggalacaggag ggctatttac aagatggaat aaaat Cagaa Catctt cat O2O ctaaact tca acacaaacct tdtgcagota tdgat catat gcaaacccag aat cagttitt O8O acatacctgt toagat coct gggitat cagt atgtgtcacc tdaaggcaat td tatagaac 14 O atgttcagcc tacttctgcc tittatt cago aaccott cat tdatgaagat cotgataagg 2OO aaaagaaaat aaaggaactt gagatgct tc titatgtcagc tigagaatgala gttagaagaa 26 O agcgaatt CC at Cacagcct ggaagtttitt Ctagctggtc tigtagttt C ct catggatg 32O ataa catgtc. taatacticta aatagccttg acgagcacac tagtgagttt tacag tatgg 38O atgaaaatca gcctgtgtct gct cagoaga attcacccac aaagttcct g gcc.gtggagg 44 O caaacgctgt gttatcct ct ttgcagacca toccagaatt togcagagact citagaactta SOO ttgaatctga t cotgtagca toggagtgacg ttaccagttt tdatatttct gatgctgctg 560 Cttctic ct at caaatccacc C cagttaa at taatgagaat t cagcacaat galaggagc.ca 62O tggaatgc.ca atttalacgt.c agt cttgtac ttgaagggala aaaaaac act totaatggtg 68O gcaa.ca.gtga agctgttcct ttaa.catc cc caaatatago caagtttagc act coaccag 74 O c catcct cag aaagaa.gaga aaaatgcgag togt catt C C cc aggcagc gaact taggg 8OO atggct catt gaacgatggt gigtaatatgg cqctaaaa.ca tacac cactgaaaacactac 86 O cattttct cott cacagttt ttcaacacat gtc.ctgg taa togaacaactt aatatagaaa 92 O atcc tt catt tacatcaa.cc cct atttgtg ggcagaaagc tict cattaca act cotcttic 98 O ataaggaaac aacticc caaa gatcaaaagg aaaatgtagg gtttagaaca cct act atta 2O4. 
O gaagat citat actggg tacc acaccaagaa citcct acticc ttittaagaat gcgcttgctg 21OO cticaggagaa aaaatatgga cct cittaaaa ttgtgtc.cca gcc acttgct ttcttggaag 216 O aagatatt.cg ggaagttitta aaagaagaala Ctggalacaga cct attic ct c aaagaggaag 222 O atgaac Ctgc titacaaaagc tigcaaacaag agaat accgc titctgggaag aaagt cagaa 228O aatcactagt Cittagata at tgaaaaag aagaat Cagg cacticaactg ttgactgaag 234 O acattt Caga catgcagt ca aattgttgaat gggaalacagt ggttt atggg alagacagaag 24 OO acca actitat tatgactgaa caa.gcaagaa gat atctgag tacttacaca gctaccagta 246 O gtacttcaag agct ct cata citgitaattgt tattaaaatt gatgaaatgc cccactic cct 252O tactgcagtic tictact aaat taggttgcag tdaaatttitt citcaattagt tdtttittaaa 2580 gttgtaagat agc ccttitta atacagoat c titttittctat tctatatagt aggcagaaag 264 O c tagtaagtic acttaagggg tagatagttt catagitttat tttittaa.gag atgagattitt 27 OO taaaaattgt ttittaaagaa caagatggga aaataataga atgttcatgg atttctaaaa 276 O gtaaattct catatatttitc titcacaagat atatgttgct act ct cittga tigctgcagtt 282O ttgttataga taggtg tatgagtatatatg atttctgaaa ttagt citatg tatggaaagc 288O US 9,689,041 B2 63 64 - Continued acacatgatt titatgaagta Cttittgcc.ca tdtgctgatt tact taggct accatttaca 294 O aagaalacaca ttgaaaagga atttalaagga aggatagaaa gttgcactac taattittittg 3 OOO tttitttittitt cagaagicagt aaaattaact acagtgttaa atgtattitat ttgagcatag 3 O 6 O tactgaaaac aaaaag catt caaaaaagag titttittctitt attagtaaat agtattittct 312 O taatct caga ggagctgaga gttttgttga atgtattgta Cagtatgtag gag caggaga 318O actttgtaaa ttggaaagaa gttctgtttitt ataatttatt tittatttitta aagcttaaat 324 O gtagatattt atacgtatac agggtgcc ta gaa.gc.caatgttgtttcct g ttattacagc 33 OO taacacagta aagaataatt ttgactittaa gtatgaaaca gtag taagtt atagotgcaa. 3360 agaatacaat atctatactg. tatgtcacat citacctaaat gttgcactat gcc ctittaaa 342O t catgctggit tataaagtag ttctaaaaat gtact aaata ataatttaat attitt cittitt 3480 taaattatat cqggggtggit catatacatt aatctggtga tttgtatatg tdtttgaaat 354 O ttittgcattt tdtttaaaaa ataatatggit accttggtcc ctaaaaa.ca.g. 
tctgcactta 36OO gaagtttata t t tact cagt gtttcagaag toggagaacat tat cittittat ttataaaaat 366 O attttgtc.ct tttittaaatgttttgttgttt citctacaggit tacaa.cagtt gottcagttg 372 O cctgttittag gtgtttgcac ttattittatt tottcttgaaagaatttitta tittgcttittg 378 O tggtagagat tatatgtaat ttitttitt cag toatataatg gtgtgctgtc. aacttaalaca 384 O ctgacaggta aatagaattig tacactgtag tittgaattat ttataattga cacact citct 3900 c cct ct coac toctdaagta togctgctata gaaaatagoa gaatcggctt gctgctacga 396 O gaga aggaaa gagcgaccac cacttgcact gtgttgaaaag ataaaaaa.ca aatgatggca 4 O2O agttct caag ttaactaaat ggaat caacc attaccaggc aaattcttgc aaataccaaa 4 O8O atac tact at gcc titataaa acaaaatgaa agcaggittaa gattittctgc tictdtttgta 414 O tgttaataga aatggaaata ctaagtattt taatgct tag ct cittgaac a gtag acctaa 42OO aagggittitta agctatttaa atc tacttgc tagtttittgc at attittata tatatatata 426 O tittatatata tatatagtga gaagtgaaga aaatgitatgg tactaagatt atgcct tatt 432O gataaataga taalaccaatt tdaatcct ct tag catgttt aag tatgttg attgctttct 438 O aattaatgaa cittct cacag aaattt cact tagtgaaacc aatgattgta gcaaact cat 4 44 O actggat cat titcagttacc titgaactaat agcacataat gigtttitttgt tdttgttgtt 4500 tittaatgtag cccttacctg gatatacata gtctgcaatc accaaagtat aatat cittgt 456 O aaggctatat tttittaaag.c at atttitt to ttgagcatta aattatccta aatgg taata 462O tattgtggat aagtctgggc titattggaca taata catat ttgggttggit actggttgaa 468O t cct tcagtt aactgctttgttgctttittg caagatttitt tat cittaaac atgtcaggca 474. O t cittaagt ca cctittatact gttttgttcc tictdagtttc titt cagtatgttatacaaat 48OO gccaga cata acatgtagca gccatacttg catggaaact gactacacat acataatact 486 O gcattt tatt gtaaggttitt cacattaata cagcaattac cct gactaaa ttgagttittg 492 O tgatatatgg aaaact tcat td taa.gagaa tottgcatac aatgttgaca tattalacatc 498O caaaataaag catctgttgta caa.gctga 5 OO8

<210> SEQ ID NO 9
<211> LENGTH: 2632
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

The invention claimed is:

1. A method comprising: measuring an amount of at least one expression product expressed from at least one target gene selected from the group consisting of the FAM198B, MYBL1, PDZK1IP1, and VSIG10 genes, wherein the amount of the expression product is measured using a peripheral blood sample obtained for in vitro colorectal cancer testing from an individual.

2. The method of claim 1, wherein the amount of the expression product is measured by an assay that comprises contacting the expression product with at least one binding partner specific for the expression product.

3. The method of claim 1, comprising measuring the amounts of a plurality of expression products expressed from a plurality of target genes selected from the group consisting of the FAM198B, MYBL1, PDZK1IP1, and VSIG10 genes, wherein the amounts of the expression products are measured by an assay that comprises contacting the expression products with binding partners specific for the expression products.

4. The method of claim 1, wherein the amounts of fewer than 50 expression products are measured.

5. The method of claim 1, comprising measuring the amounts of expression products expressed from the NEAT1, FAM198B, ITGAM, MYBL1, SIAH2, PDZK1IP1, and VSIG10 genes.

6. The method of claim 1, wherein the expression product is an RNA transcript or a polypeptide.

7. The method of claim 1, wherein the amount of the expression product is measured using at least one of hybridization, amplification, or sequencing.

8. The method of claim 1, wherein the expression product is mRNA that is hybridized with at least one probe, at least one primer, or a combination thereof.

9. The method of claim 1, wherein cDNA of the expression product is hybridized with at least one probe, at least one primer, or a combination thereof.

10. The method of claim 1, wherein cRNA of the expression product is hybridized with at least one probe, at least one primer, or a combination thereof.

11. The method of claim 1, wherein the expression product is a polypeptide that is bound by at least one ligand specific for the polypeptide.

12. The method of claim 11, wherein the ligand is an antibody.

13. The method of claim 11, wherein the ligand is an affinity protein.

14. The method of claim 11, wherein the polypeptide is bound by at least two ligands specific for the polypeptide.

15. A kit for in vitro colorectal cancer testing, comprising: one or more binding partners specific for from 1 to no more than 50 expression products that include at least one binding partner specific for an expression product expressed from a target gene selected from the group consisting of the FAM198B, MYBL1, PDZK1IP1, and VSIG10 genes, wherein the binding partner is attached to a detectable label or solid substrate.

16. The kit of claim 15, further comprising instructions for performing in vitro colorectal cancer testing using a peripheral blood sample obtained from an individual.

17. The kit of claim 15, comprising a plurality of binding partners specific for a plurality of expression products expressed from a plurality of target genes selected from the group consisting of the FAM198B, MYBL1, PDZK1IP1, and VSIG10 genes.

18. The kit of claim 15, comprising a combination of binding partners specific for the expression products expressed from the NEAT1, FAM198B, ITGAM, MYBL1, SIAH2, PDZK1IP1, and VSIG10 genes.

19. The kit of claim 15, wherein the one or more binding partners include at least one hybridization probe.

20. The kit of claim 15, wherein the one or more binding partners include at least one hybridization probe and at least one primer.

21. The kit of claim 15, wherein the one or more binding partners include at least one hybridization probe and a pair of primers.

22. The kit of claim 15, wherein the one or more binding partners include at least one ligand.

23. The kit of claim 15, wherein the one or more binding partners include at least one antibody or affinity protein.

24. The kit of claim 15, wherein the one or more binding partners include at least two ligands specific for the same expression product.

25. The method of claim 1, further comprising measuring an amount of at least one expression product expressed from at least one of the NEAT1, SIAH2, and ITGAM target genes.

26. A method comprising: measuring amounts of a plurality of expression products expressed from (i) at least one of the NEAT1 and ITGAM target genes, and (ii) at least one target gene selected from the group consisting of the FAM198B, MYBL1, SIAH2, PDZK1IP1, and VSIG10 genes, wherein the amounts of the expression products are measured using a peripheral blood sample obtained for in vitro colorectal cancer testing from an individual.

27. A method comprising: measuring amounts of a plurality of expression products expressed from a plurality of target genes selected from the group consisting of the FAM198B, MYBL1, SIAH2, PDZK1IP1, and VSIG10 genes, wherein the amounts of the expression products are measured using a peripheral blood sample obtained for in vitro colorectal cancer testing from an individual.