US009347088B2

(12) United States Patent (10) Patent No.: US 9,347,088 B2 Buendia et al. (45) Date of Patent: May 24, 2016

(54) MOLECULAR SIGNATURE OF LIVER TUMOR GRADE AND USE TO EVALUATE PROGNOSIS AND THERAPEUTIC REGIMEN

(75) Inventors: Marie-Annick Buendia, Le Perreux sur Marne (FR); Carolina Armengol Niell, Blanes (ES); Stefano Cairo, Longpont-sur-Orge (FR); Aurélien de Reynies, Paris (FR)

(73) Assignees: Institut Pasteur, Paris (FR); Institut National de la Santé et de la Recherche Médicale, Paris (FR); Centre National de la Recherche Scientifique, Paris (FR)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 374 days.

(21) Appl. No.: 12/999,907

(22) PCT Filed: Jun. 26, 2009

(86) PCT No.: PCT/IB2009/006450
§ 371 (c)(1), (2), (4) Date: Apr. 7, 2011

(87) PCT Pub. No.: WO2009/156858
PCT Pub. Date: Dec. 30, 2009

(65) Prior Publication Data
US 2011/0183862 A1, Jul. 28, 2011
US 2012/0040848 A2, Feb. 16, 2012

(30) Foreign Application Priority Data
Jun. 27, 2008 (EP) 08290628
Jan. 30, 2009 (EP) 09151808

(51) Int. Cl.
C12Q 1/68 (2006.01)

(52) U.S. Cl.
CPC C12Q 1/6809 (2013.01)

(58) Field of Classification Search
None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
2002/0142981 A1* 10/2002 Horne et al. 514/44

FOREIGN PATENT DOCUMENTS
EP 1661991 A1 5/2006
WO WO 03/010336 A2 2/2003
WO WO 2005/005601 A2 1/2005

OTHER PUBLICATIONS
International Search Report issued for application No. PCT/IB2009/006450 on Dec. 1, 2009.
Affymetrix, "U95 Set," announcement, Apr. 1, 2000.
Otu et al., "Restoration of Liver Mass after Injury Requires Proliferative and Not Embryonic Transcriptional Patterns." The Journal of Biological Chemistry, vol. 282, No. 15, pp. 11197-11204, Apr. 13, 2007.
Li et al., "Discovery and analysis of hepatocellular carcinoma using cDNA microarrays." J. Cancer Res. Clin. Oncol., vol. 128, pp. 369-379, 2002.
Xu et al., "Expression Profiling Suggested a Regulatory Role of Liver-enriched Transcription Factors in Human Hepatocellular Carcinoma." Cancer Research, vol. 61, pp. 3176-3181, Apr. 1, 2001.
Kurokawa et al., "Molecular-based prediction of early recurrence in hepatocellular carcinoma." Journal of Hepatology, vol. 41, pp. 284-291, 2004.
Cairo et al., "Hepatic Stem-like Phenotype and Interplay of Wnt/β-Catenin and Myc Signaling in Aggressive Childhood Liver Cancer." Cancer Cell, vol. 14, No. 6, pp. 471-484, Dec. 9, 2008.
Affymetrix, "Affymetrix GeneChip Human Genome U133 Array Set HG-U133A." GEO, Feb. 17, 2002.

* cited by examiner

Primary Examiner: Anne Gussow
(74) Attorney, Agent, or Firm: Law Office of Salvatore Arrigo and Scott Lee, LLP

(57) ABSTRACT
The present invention concerns a method to determine the expression profile on a sample previously obtained from a patient diagnosed for a liver tumor, comprising assaying the expression of a set of genes in this sample and determining the gene expression profile (signature). In a particular embodiment, said method makes it possible to determine the grade of the liver tumor, such as a hepatoblastoma (HB) or a hepatocellular carcinoma (HCC). The invention is also directed to kits comprising a plurality of pairs of primers or a plurality of probes specific for a set of genes, as well as to a solid support or composition comprising a set of probes specific for a set of genes. These methods are useful to determine the grade of a liver tumor in a sample obtained from a patient, to determine the risk of developing metastasis and/or to define the therapeutic regimen to apply to a patient.

12 Claims, 35 Drawing Sheets

U.S. Patent May 24, 2016 Sheet 1 of 35 US 9,347,088 B2

[Fig. 1A: Affymetrix HG-U133A, 25 HBs and 4 NLs; identification of robust subclusters (NT, C1, C2).]

[Sheet 2: robust subclusters rC1 and rC2 (scale -3 to 3).]

[Fig. 2, panel B: Hepatocyte, Hepatoblast, Liver progenitor; groups NL, C1, C2.]

[Fig. 4: panels for AFP, AQP9, DLG7, ALDH2, BUB1, IGSF1, APCS, C1S, E2F5, NLE1, APOC4, CYP2E1 and GHR, each comparing groups NL, C1 and C2.]

[Fig. 5A.]

[Sheets 7 to 13: further plots, with panels labelled APOC4 (Sheet 7) and RPL10A (Sheet 13).]

[Fig. 6: Clinical associations of HB subclasses C1 and C2 (C1 vs C2): predominant fetal histotype 92% vs 14%; advanced tumor stage 40% vs 75%, including vascular invasion 35% vs 87% and distant metastasis at diagnosis 13% vs 48%; PRETEXT IV 16% vs 25%. Survival curves for HB subclasses C1 and C2 (x-axis: time of survival, months; p = 0.0021), with a second survival panel for the same subclasses. Multivariate analysis, HR (95% CI), p-value: 16-gene signature 8.0 (2.2-29.9), 0.02; tumor stage 4.6 (1.1-21.1), 0.050; predominant histotype 2.5 (0.3-19.1), 0.385.]

Characteristic | Microarray set (n=24) | qPCR set (n=37) | Complete set (n=61)
--- | --- | --- | ---
Male gender (%) | 17 (71%) | 20 (54%) | 37 (61%)
Median age, years (range) | 2 (0-17) | 2 (0-10) | 2 (0-17)
AFP at diagnosis (ng/mL), median | 131,000 | 75,873 | 84,000
AFP at diagnosis (ng/mL), range | 180-1,990,800 | 44-2,265,000 | 44-2,265,000
Preoperative chemotherapy | 19 (79%) | 29 (78%) | 48 (79%)
Treatment protocol (SR/HR/NA) | 9/9/1 | 16/9/4 | 25/18/5
Tumor stage (SIOPEL): distant metastasis at diagnosis | 6 (25%) | 10 (27%) | 16 (26%)
Tumor stage (SIOPEL): vascular invasion | 14 (58%) | 15 (40%) | 29 (47%)
PRETEXT stage (I/II/III/IV) | 3/10/7/4 | 5/11/13/8 | 8/21/20/12
Multifocality | 9 (37%) | 9 (24%) | 18 (29%)
Histology: epithelial/mixed | 19/5 | 23/14 | 43/19
Predominant epithelial histotype (fetal/others/NA) | 16/8/0 | 21/13/3 | 37/21/3
CTNNB1 mutation | 19 (79%) | 28 (76%) | 47 (77%)
AXIN1 mutation | 1 (4%) | 0 (0%) | 1 (2%)
Median follow-up, months (range) | 40 (4-120) | 22 (1-135) | 32 (1-135)
Alive/DOD/complication-related death | 16/7/1 | 28/8/1 | 44/15/2

Fig. 7

[Fig. 9A: Clinical associations of HB subclasses C1 (n=53) and C2 (n=33), C1 vs C2 (p-value): predominant fetal histotype 92% vs 21% (<0.001); advanced tumor stage 38% vs 70% (0.005), including vascular invasion 29% vs 61% (0.005) and distant metastasis at diagnosis 19% vs 48% (0.004); PRETEXT IV 18% vs 23% (0.393). Survival curves (x-axis: time of survival, months): HB subclass C1, n=47 (3 DOD); HB subclass C2, n=26 (9 DOD); p = 0.0002.]

[Fig. 9B: Survival curves (x-axis: time of survival, months): HB subclass C1, n=12 (0 DOD); HB subclass C2, n=17 (6 DOD); p = 0.0164. Multivariate analysis, HR (95% CI), p-value: 16-gene signature 0.11 (0.03-0.41), <0.001; tumor stage 4.52 (0.99-20.4), 0.021; predominant histotype 0.29 (0.04-1.96), 0.176.]

[Fig. 10: Two survival panels (x-axis: time of follow-up, months). First panel: C1 subclass, n = 36 (9 deaths); C2 subclass, n = 28 (13 deaths); log-rank p = 0.020. Second panel: C1 subclass, n = 37 (9 deaths); C2 subclass, n = 27 (13 deaths); log-rank p = 0.024.]

[Fig. 11: Three survival panels, each stratified by score (strata: score 2); log-rank p = 1.33e-(illegible), 0.0043 and 4.9e-05.]
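The survival panels of Figs. 6, 9A, 9B and 10 plot survival against time and report log-rank p-values. The figures do not name the estimator, but such curves are conventionally Kaplan-Meier estimates. The following Python sketch assumes that convention. The function names `kaplan_meier`, `survival_at` and `prognosis_from_two_year_survival` are hypothetical. The sketch estimates the survival probability at 24 months and compares it with the two-year thresholds in the specification's definitions of prognosis: more than 80% (HB) or more than 50% (HCC) for a good prognosis, and less than 50% for a bad prognosis. It applies only the survival criterion. The specification's definitions also involve stage, metastasis and response to treatment, and this sketch is not the statistical analysis used to produce the figures.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], died: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival steps as (time, survival) pairs.

    times -- follow-up time of each patient, in months
    died  -- True if the patient died at that time, False if censored
    """
    records = sorted(zip(times, died))
    at_risk = len(records)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group every patient whose follow-up ends at the same time point.
        while i < len(records) and records[i][0] == t:
            deaths += int(records[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed  # both deaths and censored patients leave the risk set
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Estimated survival probability at time t (right-continuous step function)."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


# Two-year survival thresholds taken from the specification's prognosis definitions.
GOOD_PROGNOSIS_THRESHOLD = {"HB": 0.80, "HCC": 0.50}
BAD_PROGNOSIS_THRESHOLD = 0.50


def prognosis_from_two_year_survival(tumor_type: str, survival_24m: float) -> str:
    """Survival criterion only; stage and metastasis are not considered here."""
    if survival_24m > GOOD_PROGNOSIS_THRESHOLD[tumor_type]:
        return "good"
    if survival_24m < BAD_PROGNOSIS_THRESHOLD:
        return "bad"
    return "indeterminate"  # e.g. an HB group between 50% and 80%
```

For a small hypothetical cohort, `kaplan_meier([5, 10, 10, 30, 40], [True, True, False, False, True])` gives a 24-month survival of approximately 0.6. That value would be "good" under the HCC threshold but "indeterminate" under the HB thresholds.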


MOLECULAR SIGNATURE OF LIVER TUMOR GRADE AND USE TO EVALUATE PROGNOSIS AND THERAPEUTIC REGIMEN

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Dec. 4, 2012, is named 09837601.txt and is 302,498 bytes in size.

The present invention relates to an in vitro method for determining the grade of a liver tumor in a sample previously obtained from a patient, using a molecular signature based on the expression of a set of genes comprising at least 2 genes, especially comprising or consisting of 2 to 16 genes, preferably a set of 16 genes. In a particular embodiment, the method focuses on hepatoblastoma (HB) or hepatocellular carcinoma (HCC), in adults or in children. The invention is also directed to sets of primers, sets of probes, compositions, kits or arrays comprising primers or probes specific for a set of genes comprising at least 2 genes, especially comprising or consisting of 2 to 16 genes, preferably exactly 16 genes. Said sets, kits and arrays are tools suitable for determining the grade of a liver tumor in a patient.

The liver is a common site of metastases from a variety of organs such as lung, breast, colon and rectum. However, the liver is also the site of different kinds of cancerous tumors that start in the liver (primary liver cancers). The most frequent is hepatocellular carcinoma (HCC), which accounts for about 3 out of 4 primary liver cancers and is mainly diagnosed in adults. In the United States, approximately 10,000 new patients are diagnosed with hepatocellular carcinoma each year. Less frequent liver tumors are cholangiocarcinoma (CC) in adults and hepatoblastoma (HB) in children.

The prognosis and treatment options associated with these different kinds of cancers are difficult to predict and depend in particular on the stage of the cancer (such as the size of the tumor, whether it affects part or all of the liver, whether it has spread to other places in the body, or its aggressiveness). It is therefore important for clinicians and physicians to establish a classification of primary liver cancers (HCC or HB) in order to propose the most appropriate treatment and adopt the most appropriate surgical strategy. Some factors are currently used (degree of local invasion, histological type of cancer with specific grading, tumor markers and general status of the patient), but they have been found to be insufficiently accurate to ensure a correct classification.

As far as HB is concerned, the PRETEXT (pre-treatment extent of disease) system designed by the International Childhood Liver Tumor Strategy Group (SIOPEL) is a non-invasive technique commonly used by clinicians to assess the extent of liver cancer, to determine the time of surgery and to adapt the treatment protocol. This system is based on the division of the liver into four parts and the determination of the number of liver sections that are free of tumor (Aronson et al. 2005; Journal of Clinical Oncology; 23(6): 1245-1252). A revised staging system taking into account other criteria, such as caudate lobe involvement, extrahepatic abdominal disease, tumor focality, tumor rupture or intraperitoneal haemorrhage, distant metastases, lymph node metastases, portal vein involvement and involvement of the IVC (inferior vena cava) and/or hepatic veins, has recently been proposed (Roebuck; 2007; Pediatr Radiol; 37: 123-132). However, the PRETEXT system, even if reproducible and of good prognostic value, is based on imaging and clinical symptoms, which makes it dependent upon the technicians and clinicians.

There is thus a need for a system, complementary to the PRETEXT system, based on genetic and molecular features of liver tumors.

The present invention concerns a method or process of profiling gene expression for a set of genes in a sample previously obtained from a patient diagnosed with a liver tumor. In a particular embodiment, said method is designed to determine the grade of a liver tumor in a patient.

By "liver tumor" or "hepatic tumor", it is meant a tumor originating from the liver of a patient, which is a malignant tumor (comprising cancerous cells), as opposed to a benign (non-cancerous) tumor, which is explicitly excluded. Malignant liver tumors encompass two main kinds of tumors: hepatoblastoma (HB) and hepatocellular carcinoma (HCC). Both tumor types can be assayed for the presently reported molecular signature. However, the present method may also be used to assay malignant liver tumors which are classified as unspecified (non-HB, non-HCC).

The present method may be used to determine the grade of a liver tumor or of several liver tumors of the same patient, depending on the extent of the liver cancer. For convenience, the expression "a liver tumor" is used throughout the specification to possibly apply to "one or several liver tumor(s)". The term "neoplasm" may also be used as a synonym of "tumor".

In a particular embodiment, the tumor whose grade has to be determined is located in the liver. The presence of the tumor(s) in the liver may be diagnosed by ultrasound scan, X-rays, blood test, CT scans (computerised tomography) and/or MRI scans (magnetic resonance imaging).

In a particular embodiment, the tumor, although originating from the liver, has extended to other tissues or has given rise to metastasis.

In a particular embodiment, the patient is a child, i.e., a human host who is under 20 years of age according to the present application. Therefore, in a particular embodiment, the liver tumor is a paediatric HB or a paediatric HCC. In another embodiment, the liver tumor is an adult HCC.

A grade is defined as a subclass of the liver tumor, corresponding to prognostic factors such as tumor status, liver function and general health status. The method of the invention allows, or at least contributes to, differentiating liver tumors having a good prognosis from tumors having a bad prognosis, in terms of evolution of the patient's disease. A good-prognosis tumor is defined as a tumor with a good survival probability for the patient (more than 80% survival at two years for HB and more than 50% survival at two years for HCC), a low probability of metastases and a good response to treatment. In contrast, a bad-prognosis tumor is defined as a tumor with an advanced stage, such as one having vascular invasion and/or extrahepatic metastasis, and associated with a low survival probability for the patient (less than 50% survival at two years).

The method of the invention is carried out on a sample isolated from a patient who has previously been diagnosed with the tumor(s) and who, optionally, may have been treated by surgery. In a preferred embodiment, the sample is the liver tumor (tumoral tissue), or one of the liver tumors identified by diagnostic imaging, obtained by surgery or by a biopsy of this tumor. The tumor located in the liver is called the primary tumor.

In another embodiment, the sample is not the liver tumor but is representative of this tumor. By "representative", it is meant that the sample is regarded as having the same features as the primary tumor when considering the gene expression profile assayed in the present invention. Therefore, the sample may also consist of metastatic cells (secondary tumors spread into different part(s) of the body) or of a biological fluid containing cancerous cells (such as blood).

The sample may be fixed, for example in formalin (formalin-fixed). In addition or alternatively, the sample may be embedded in paraffin (paraffin-embedded) or equivalent products. In particular, the tested sample is a formalin-fixed, paraffin-embedded (FFPE) sample.

One advantage of the method of the present invention is that, despite the possible heterogeneity of some liver tumors (comprising epithelial tumor cells at different stages of liver differentiation within the same tumor), the assay has proved to be reproducible and efficient on liver tumor biopsies obtained from any part of the whole tumor. Therefore, to carry out the gene expression profile assay, there is no requirement to isolate cells presenting particular features, other than that they are obtained from a liver tumor or are representative thereof.

In a particular embodiment, the tumor originates from a patient of Caucasian origin, in particular European, North American, Australian, New Zealander or Afrikaner.

In a first step, the method or process of the invention comprises assaying the expression level of a set of genes in a sample, in order to obtain an expression profile thereof.

By "expression of a set of genes" (or "gene expression"), it is meant assaying, in particular detecting, the product or several products resulting from the expression of a gene, this product being in the form of a nucleic acid, especially RNA, mRNA or cDNA, a polypeptide, a protein or any other format. In a particular embodiment, the assay of the gene expression profile comprises detecting a set of nucleotide targets, each nucleotide target corresponding to the expression product of a gene encompassed in the set.

The expression "nucleotide target" means a nucleic acid molecule whose expression must be measured, preferably quantitatively. By "expression measured", it is meant that the expression product(s), in particular the transcription product(s), of a gene are measured. By "quantitative", it is meant that the method is used to determine the quantity or the number of copies of the expression products, in particular the transcription products or nucleotide targets, originally present in the sample. This is in contrast to a qualitative measurement, whose aim is only to determine the presence or absence of said expression product(s).

A nucleotide target is in particular RNA, and most particularly total RNA. In a preferred embodiment, the nucleotide target is mRNA or transcripts. Depending on the method used to measure the gene expression level, the mRNA initially present in the sample may be used to obtain cDNA or cRNA, which is then detected and possibly measured.

In an embodiment, the expression of the gene is assayed directly on the sample, in particular in the tumor. In an alternative embodiment, the expression products or the nucleotide targets are prepared from the sample, in particular isolated or even purified. When the nucleotide targets are mRNA, a further step comprising or consisting of the reverse transcription of said mRNA into cDNA (complementary DNA) may also be performed prior to the step of detecting expression. Optionally, the cDNA may also be transcribed in vitro to provide cRNA.

During the step of preparation, and before assaying the expression, the expression product(s) or the nucleotide target(s) may be labelled with isotopic (such as radioactive) or non-isotopic (such as fluorescent, coloured, luminescent, affinity, enzymatic, magnetic, thermal or electrical) markers or labels.

It is noteworthy that the steps carried out for assaying the gene expression must not alter the qualitative or quantitative expression (number of copies) of the expression product(s) or of the nucleotide target(s), and must not interfere with the subsequent step of assaying the qualitative or quantitative expression of said expression product(s) or nucleotide target(s).

The step of profiling gene expression comprises determining the expression of a set of genes. Such a set is defined as a group of genes that must be assayed for one test, especially at the same time and on the same patient's sample. A set comprises at least 2 genes, and especially from 2 to 16 genes, said 2 to 16 genes being chosen from the following 16 genes: alpha-fetoprotein (AFP), aldehyde dehydrogenase 2 (ALDH2), amyloid P component serum (APCS), apolipoprotein C-IV (APOC4), aquaporin 9 (AQP9), budding uninhibited by benzimidazoles 1 (BUB1), complement component 1 (C1S), cytochrome P450 2E1 (CYP2E1), discs large homolog 7 (DLG7), dual specificity phosphatase 9 (DUSP9), E2F5 transcription factor (E2F5), growth hormone receptor (GHR), 4-hydroxyphenylpyruvate dioxygenase (HPD), immunoglobulin superfamily member 1 (IGSF1), Notchless homolog 1 (NLE1) and ribosomal protein L10a (RPL10A) genes.

A complete description of these 16 genes is given in Table 1. This table lists, from left to right, the symbol of the gene, the complete name of the gene, the SEQ ID number provided in the sequence listing, the Accession Number in the NCBI database as of June 2008, the human chromosomal location and the reported function (when known).

A set of genes comprises at least 2 out of the 16 genes of Table 1, and particularly at least or exactly 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 out of the 16 genes of Table 1. In a particular embodiment, the set comprises or consists of the 16 genes of Table 1, i.e., the set of genes comprises or consists of the AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A genes. Accordingly, unless otherwise stated, when reference is made in the present application to a set of 2 to 16 genes of Table 1, it should be understood as similarly applying to any number of genes within said 2 to 16 range.

In other particular embodiments, the set of genes comprises or consists of one of the following sets:
(a) the E2F5 and HPD genes;
(b) the APCS, BUB1, E2F5, GHR and HPD genes;
(c) the ALDH2, APCS, APOC4, BUB1, C1S, CYP2E1, E2F5, GHR and HPD genes;
(d) the ALDH2, APCS, APOC4, AQP9, BUB1, C1S, DUSP9, E2F5 and RPL10A genes; or
(e) the ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, E2F5, GHR, IGSF1 and RPL10A genes.

As indicated by the expression "comprises from 2 to 16 genes of Table 1", the set may, besides the specific genes of Table 1, contain additional genes not listed in Table 1. This means that the set must comprise from 2 to 16 genes of Table 1 (in particular 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 genes), and optionally comprises one or more additional genes. Said set may also be restricted to said 2 to 16 genes of Table 1.

Additional genes may be selected for the difference of expression observed between the various grades of liver cancer, in particular between a tumor of good prognosis and a tumor of poor prognosis.

TABLE 1

Symbol | Gene name | mRNA SEQ ID NO | Accession No. | Location | Function | Protein SEQ ID NO
--- | --- | --- | --- | --- | --- | ---
AFP | alpha-fetoprotein | 1 | NM_001134 | 4q11-q13 | plasma protein synthesized by the fetal liver | 2
ALDH2 | aldehyde dehydrogenase 2 family (mitochondrial) | 3 | NM_000690 | 12q24.2 | liver enzyme involved in alcohol metabolism | 4
APCS | amyloid P component, serum | 5 | NM_001639 | 1q21-q23 | secreted glycoprotein | 6
APOC4 | apolipoprotein C-IV | 7 | NM_001646 | 19q13.2 | secreted liver protein | 8
AQP9 | aquaporin 9 | 9 | NM_020980 | 15q22.1-22.2 | water-selective membrane channel | 10
BUB1 | BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast) | 11 | AF043294 | 2q14 | kinase involved in spindle checkpoint | 12
C1S | complement component 1, s subcomponent | 13 | M18767 | 12p13 | component of the cleavage and polyadenylation specificity factor complex | 14
CYP2E1 | cytochrome P450, family 2, subfamily E, polypeptide 1 | 15 | AF182276 | 10q24.3-qter | cytochrome P450 family member involved in drug metabolism | 16
DLG7 (DLGAP5) | discs, large homolog 7 (Drosophila) | 17 | NM_014750 | 14q22.3 | cell cycle regulator involved in kinetochore formation | 18
DUSP9 | dual specificity phosphatase 9 | 19 | NM_001395 | Xq28 | phosphatase involved in regulation of MAP kinases | 20
E2F5 | E2F transcription factor 5, p130-binding | 21 | U15642 | 8q21.2 | transcription factor involved in cell cycle regulation | 22
GHR | growth hormone receptor | 23 | NM_000163 | 5p13-p12 | transmembrane receptor for growth hormone | 24
HPD | 4-hydroxyphenylpyruvate dioxygenase | 25 | NM_002150 | 12q24-qter | enzyme involved in amino-acid degradation | 26
IGSF1 | immunoglobulin superfamily, member 1 | 27 | NM_001555 | Xq25 | cell recognition and regulation of cell behavior | 28
NLE1 | notchless homolog 1 (Drosophila) | 29 | NM_018096 | 17q12 | unknown | 30
RPL10A | ribosomal protein L10a | 31 | NM_007104 | 6p21.3-p21.2 | ribosomal protein of 60S subunit | 32
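The membership rules for a set can be written as a small check. A set must contain at least 2 (especially 2 to 16) of the Table 1 genes, it may contain additional genes, and it may be one of the particular sets (a) to (e). The Python sketch below is illustrative only. The names `TABLE1_GENES`, `PARTICULAR_SETS` and `is_valid_set` are hypothetical. The optional `max_genes` argument reflects the embodiment in which the total number of genes in the set does not exceed a given value (for example 100).

```python
from typing import Iterable, Optional

# The 16 genes of Table 1.
TABLE1_GENES = frozenset({
    "AFP", "ALDH2", "APCS", "APOC4", "AQP9", "BUB1", "C1S", "CYP2E1",
    "DLG7", "DUSP9", "E2F5", "GHR", "HPD", "IGSF1", "NLE1", "RPL10A",
})

# Particular embodiments (a) to (e) of the specification.
PARTICULAR_SETS = {
    "a": frozenset({"E2F5", "HPD"}),
    "b": frozenset({"APCS", "BUB1", "E2F5", "GHR", "HPD"}),
    "c": frozenset({"ALDH2", "APCS", "APOC4", "BUB1", "C1S", "CYP2E1",
                    "E2F5", "GHR", "HPD"}),
    "d": frozenset({"ALDH2", "APCS", "APOC4", "AQP9", "BUB1", "C1S",
                    "DUSP9", "E2F5", "RPL10A"}),
    "e": frozenset({"ALDH2", "APCS", "APOC4", "AQP9", "C1S", "CYP2E1",
                    "E2F5", "GHR", "IGSF1", "RPL10A"}),
}


def is_valid_set(genes: Iterable[str], max_genes: Optional[int] = None) -> bool:
    """True if the set holds 2 to 16 Table 1 genes; extra genes are allowed."""
    chosen = set(genes)
    n_table1 = len(chosen & TABLE1_GENES)
    if not 2 <= n_table1 <= len(TABLE1_GENES):
        return False
    # Optional cap on the total size of the set, additional genes included.
    return max_genes is None or len(chosen) <= max_genes
```

For example, `is_valid_set(PARTICULAR_SETS["b"] | {"TPX2"})` is true because additional genes such as TPX2 from Table 2 are permitted. `is_valid_set({"AFP"})` is false because only one Table 1 gene is present.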

The invention also relates to a set of genes comprising or consisting of the 16 genes of Table 1 (i.e., the AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A genes), in which 1, 2, 3, 4 or 5 of the 16 genes are substituted by a gene presenting the same features in terms of difference of expression between a tumor of good prognosis and a tumor of poor prognosis.

In a particular embodiment, the number of genes of the set does not exceed 100, particularly 50, 30 or 20, more particularly 16, and even more particularly is at most 5, 6, 7, 8, 9 or 10.

When considering adding a gene to, or substituting one or several genes of, the disclosed set, the person skilled in the art will consider one or several of the following features:
(a) the added gene(s) and/or the substituted gene(s) of Table 1 must present the same features in terms of difference of expression between a tumor of good prognosis and a tumor of poor prognosis as the genes of Table 1 taken as a whole. Thus, the added gene or the substituted gene is either overexpressed or underexpressed in a tumor of good prognosis by a factor of at least 2, preferably of at least 5, and more preferably of at least 10, as compared to its expression in a tumor of poor prognosis;
(b) besides presenting the feature in (a), the added gene and/or the substituted gene may also provide, in combination with the other genes of the set, discriminant results with respect to the grade of the liver tumors; this discrimination is reflected by the homogeneity of the expression profile of this gene in the tumors of good prognosis on the one hand, and in the tumors of poor prognosis on the other hand; and
(c) finally, besides the features of (a) and/or (b), the added gene and/or the substituted gene is optionally chosen among genes that are involved in liver differentiation, in particular having a specific expression in fetal liver, or genes that are involved in proliferation, for example in mitosis or associated with ribosomes.

Examples of genes which can be added or may replace genes of the set may be identified in the following Table 2.

TABLE 2: List of genes according to p-value.

Gene symbol | rC1 | rC2 | Ratio rC2/rC1 | Parametric p-value | FDR | Description
--- | --- | --- | --- | --- | --- | ---
IPO4 | 123.7 | 248.3 | 2.0 | 2.00E-07 | 0.00036 | importin 4
CPSF1 | 467.8 | 1010.7 | 2.2 | 2.00E-07 | 0.00036 | cleavage and polyadenylation specific factor 1, 160 kDa
MCM4 | 25.8 | 90.7 | 3.5 | 1.10E-06 | 0.00115 | MCM4 minichromosome maintenance deficient 4 (S. cerevisiae)
EIF3S3 | 1319 | 2601.2 | 2.0 | 1.20E-06 | 0.00119 | eukaryotic translation initiation factor 3, subunit 3 gamma, 40 kDa
NCL | 1319 | 2655.6 | 2.0 | 1.30E-06 | 0.00122 | nucleolin
CDC25C | 35.7 | 99.3 | 2.8 | 1.40E-06 | 0.00124 | cell division cycle 25C
CENPA | 28.2 | 78.4 | 2.8 | 1.50E-06 | 0.00124 | centromere protein A, 17 kDa
KIF14 | 24.7 | 54.2 | 2.2 | 1.50E-06 | 0.00124 | kinesin family member 14
IPW | 145.7 | 397.6 | 2.7 | 1.90E-06 | 0.0015 | imprinted in Prader-Willi syndrome
KNTC2 | 26.8 | 65.1 | 2.4 | 2.20E-06 | 0.00157 | kinetochore associated 2
TMEM48 | 26.4 | 71.7 | 2.7 | 2.30E-06 | 0.00157 | transmembrane protein 48
BOP1 | 87.2 | 270.9 | 3.1 | 2.30E-06 | 0.00157 | block of proliferation 1
EIF3S9 | 170 | 372.4 | 2.2 | 2.30E-06 | 0.00157 | eukaryotic translation initiation factor 3, subunit 9 eta, 116 kDa
PH-4 | 340.9 | 168.2 | 0.5 | 2.40E-06 | 0.00158 | hypoxia-inducible factor prolyl 4-hydroxylase
SMC4L1 | 151.5 | 359.3 | 2.4 | 2.50E-06 | 0.0016 | SMC4 structural maintenance of chromosomes 4-like 1 (yeast)
TTK | 23.7 | 74.2 | 3.1 | 2.60E-06 | 0.00161 | TTK protein kinase
LAMA3 | 696 | 136.3 | 0.2 | 2.80E-06 | 0.00168 | laminin, alpha 3
C10orf72 | 192.6 | 67.7 | 0.4 | 2.90E-06 | 0.00169 | chromosome 10 open reading frame 72
TPX2 | 73.4 | 401.5 | 5.5 | 3.10E-06 | 0.00171 | TPX2, microtubule-associated, homolog (Xenopus laevis)
MSH2 | 75.5 | 212.1 | 2.8 | 3.20E-06 | 0.00171 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli)
DKC1 | 358.1 | 833.5 | 2.3 | 3.20E-06 | 0.00171 | dyskeratosis congenita 1, dyskerin
STK6 | 86.4 | 395.3 | 4.6 | 3.30E-06 | 0.00172 | serine/threonine kinase 6
CCT6A | 200.5 | 526.6 | 2.6 | 3.50E-06 | 0.00173 | chaperonin containing TCP1, subunit 6A (zeta 1)
SULT1C1 | 67.5 | 314.8 | 4.7 | 3.50E-06 | 0.00173 | sulfotransferase family, cytosolic, 1C, member 1
ILF3 | 142.3 | 294.5 | 2.1 | 3.70E-06 | 0.00174 | interleukin enhancer binding factor 3, 90 kDa
IMPDH2 | 916.9 | 2385.6 | 2.6 | 3.70E-06 | 0.00174 | IMP (inosine monophosphate) dehydrogenase 2
HIC2 | 63.4 | 208.8 | 3.3 | 3.90E-06 | 0.00179 | hypermethylated in cancer 2
AFM | 1310.3 | 237.4 | 0.2 | 4.10E-06 | 0.00184 | afamin
MCM7 | 187.3 | 465.3 | 2.5 | 4.30E-06 | 0.00189 | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae)
CNAP1 | 70.2 | 177.5 | 2.5 | 4.40E-06 | 0.00189 | chromosome condensation-related SMC-associated protein 1
CBARA1 | 958 | 475 | 0.5 | 4.60E-06 | 0.00194 | calcium binding atopy-related autoantigen 1
PLA2G4C | 123.3 | 51.2 | 0.4 | 4.90E-06 | 0.00194 | phospholipase A2, group IVC (cytosolic, calcium-independent)
CPSF1 | 301.9 | 616 | 2.0 | 5.00E-06 | 0.00194 | cleavage and polyadenylation specific factor 1, 160 kDa
SNRPN | 30.9 | 100.6 | 3.3 | 5.00E-06 | 0.00194 | small nuclear ribonucleoprotein polypeptide N
RPL5 | 2754.8 | 4961 | 1.8 | 5.20E-06 | 0.00194 | ribosomal protein L5
C1R | 1446.5 | 366.4 | 0.3 | 5.30E-06 | 0.00194 | complement component 1, r subcomponent
C16orf34 | 630.4 | 1109.6 | 1.8 | 5.30E-06 | 0.00194 | chromosome 16 open reading frame 34
PHB | 309.3 | 915.1 | 3.0 | 5.30E-06 | 0.00194 | prohibitin
BZW2 | 387.4 | 946.4 | 2.4 | 5.40E-06 | 0.00194 | basic leucine zipper and W2 domains 2
ALAS1 | 1075.8 | 466.5 | 0.4 | 5.50E-06 | 0.00194 | aminolevulinate, delta-, synthase 1
FLJ20364 | 48.6 | 112.4 | 2.3 | 5.70E-06 | 0.00198 | hypothetical protein FLJ20364
RANBP1 | 593.7 | 1168.1 | 2.0 | 5.90E-06 | 0.00201 | RAN binding protein 1
SKB1 | 354.7 | 687.4 | 1.9 | 6.20E-06 | 0.00208 | SKB1 homolog (S. pombe)
ABHD6 | 402.2 | 196.9 | 0.5 | 6.50E-06 | 0.00213 | abhydrolase domain containing 6
CCNB1 | 60.4 | 330 | 5.5 | 6.60E-06 | 0.00213 | cyclin B1
NOL5A | 246.9 | 716.2 | 2.9 | 7.00E-06 | 0.00213 | nucleolar protein 5A (56 kDa with KKE/D repeat)
RPL8 | 3805.7 | 7390.5 | 1.9 | 7.00E-06 | 0.00213 | ribosomal protein L8
BLNK | 211.1 | 39.8 | 0.2 | 7.10E-06 | 0.00213 | B-cell linker
BYSL | 67.3 | 269.7 | 1.6 | 7.10E-06 | 0.00213 | bystin-like
UBE1L | 247.6 | 142.3 | 0.6 | 7.20E-06 | 0.00213 | ubiquitin-activating enzyme E1-like
CHD7 | 18.6 | 312 | 2.6 | 7.40E-06 | 0.00215 | chromodomain helicase DNA binding protein 7
DKFZp762E1312 | 70.2 | 219.4 | 3.1 | 7.60E-06 | 0.00218 | hypothetical protein DKFZp762E1312 (HJURP)
NUP210 | 78.4 | 284.9 | 1.6 | 7.70E-06 | 0.00218 | nucleoporin 210 kDa
PLK1 | 72.8 | 185.2 | 2.5 | 7.90E-06 | 0.0022 | polo-like kinase 1 (Drosophila)
ENPEP | 16.2 | 29.4 | 0.3 | 8.00E-06 | 0.0022 | glutamyl aminopeptidase (aminopeptidase A)
HCAP-G | 17.7 | 57.8 | 3.3 | 8.40E-06 | 0.00228 | chromosome condensation protein G
UGT2B4 | 1117.8 | 246.7 | 0.2 | 9.20E-06 | 0.00245 | UDP glucuronosyltransferase 2 family, polypeptide B4
C20orf27 | 29.7 | 245.3 | 1.9 | 9.30E-06 | 0.00245 | chromosome 20 open reading frame 27
C6orf149 | 178.7 | 491.1 | 2.7 | 9.40E-06 | 0.00245 | chromosome 6 open reading frame 149 (LYRM4)

The Accession Numbers of the genes of Table 2, as found in the NCBI database in June 2008, are the following: IPO4 (BC136759), CPSF1 (NM_013291), MCM4 (NM_005914.2; NM_182746.1; two accession numbers for the same gene correspond to 2 different isoforms of the gene), EIF3S3 (NM_003756.2), NCL (NM_005381.2), CDC25C (NM_001790.3), CENPA (NM_001809.3; NM_001042426), KIF14 (BC113742), IPW (U12897), KNTC2 (AK313184), TMEM48 (NM_018087), BOP1 (NM_015201), EIF3S9 (NM_003751; NM_001037283), PH-4 (NM_177939), SMC4L1 (NM_005496; NM_001002800), TTK (AK315696), LAMA3 (NM_198129), C10orf72 (NM_001031746; NM_144984), TPX2 (NM_012112), MSH2 (NM_000251), DKC1 (NM_001363), STK6 (AY892410), CCT6A (NM_001762; NM_001009186), SULT1C1 (AK313193), ILF3 (NM_012218; NM_004516), IMPDH2 (NM_000884), HIC2 (NM_015094), AFM (NM_001133), MCM7 (NM_005916; NM_182776), CNAP1 (AK128354), CBARA1 (AK225695), PLA2G4C (NM_003706), CPSF1 (NM_013291), SNRPN (BC000611), RPL5 (AK314720), C1R (NM_001733), C16orf34 (CH471112), PHB (AK312649), BZW2 (BC017794), ALAS1 (AK312566), FLJ20364 (NM_017785), RANBP1 (NM_002882), SKB1 (AF015913), ABHD6 (NM_020676), CCNB1 (NM_031966), NOL5A (NM_006392), RPL8 (NM_000973; NM_033301), BLNK (NM_0133; NM_001114094), BYSL (NM_004053), UBE1L (AY889910), CHD7 (NM_017780), DKFZp762E1312 (NM_018410), NUP210 (NM_024923), PLK1 (NM_005030), ENPEP (NM_001977), HCAP-G (NM_022346), UGT2B4 (NM_021139), C20orf27 (NM_001039140) and C6orf149 (NM_020408).
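In Table 2, the ratio column equals the rC2 value divided by the rC1 value. Criterion (a) for added or substituted genes requires over- or underexpression by a factor of at least 2 (preferably 5, more preferably 10) between good-prognosis and poor-prognosis tumors. The Python sketch below applies that criterion to a few rows copied from Table 2. It assumes that rC1 and rC2 stand for the good-prognosis and poor-prognosis groups, respectively. Table 2 does not state this, although the survival data reported for subclasses C1 and C2 point that way. The function names are hypothetical. The fold change is taken symmetrically, so a ratio of 0.2 counts as a 5-fold underexpression.

```python
# A few rows copied from Table 2: gene -> (rC1, rC2).
TABLE2_ROWS = {
    "IPO4": (123.7, 248.3),
    "MCM4": (25.8, 90.7),
    "AFM": (1310.3, 237.4),
    "CCNB1": (60.4, 330.0),
    "RPL5": (2754.8, 4961.0),
}


def ratio_c2_over_c1(rc1: float, rc2: float) -> float:
    """The 'ratio rC2/rC1' column of Table 2."""
    return rc2 / rc1


def fold_change(rc1: float, rc2: float) -> float:
    """Symmetric fold change: ratios of 0.2 and 5.0 both give 5.0."""
    r = ratio_c2_over_c1(rc1, rc2)
    return max(r, 1.0 / r)


def passes_criterion_a(rc1: float, rc2: float, factor: float = 2.0) -> bool:
    """Over- or underexpression by at least `factor` between the two groups."""
    return fold_change(rc1, rc2) >= factor


if __name__ == "__main__":
    for gene, (rc1, rc2) in TABLE2_ROWS.items():
        factors_met = [f for f in (2, 5, 10) if passes_criterion_a(rc1, rc2, f)]
        print(gene, round(ratio_c2_over_c1(rc1, rc2), 1), factors_met)
```

With these rows, AFM (ratio 0.2) meets the 2-fold and 5-fold factors but not the 10-fold factor. RPL5 (ratio 1.8) is listed in Table 2 because of its low p-value, yet it does not reach the 2-fold factor of criterion (a).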

In a particular embodiment of the invention, the set of genes of the invention is designed to determine the grade of a hepatoblastoma, in particular paediatric hepatoblastoma. In another embodiment, the set of genes is designed to determine the grade of hepatocellular carcinoma, in particular paediatric HCC or adult HCC.

The expression of the genes of the set may be assayed by any conventional method, in particular any conventional method known to measure the quantitative expression of RNA, preferably mRNA.

The expression may be measured after carrying out an amplification process, such as PCR, quantitative PCR (qPCR) or real-time PCR. Kits designed for measuring expression after an amplification step are disclosed below.

The expression may be measured using a hybridization method, especially with a step of hybridizing on a solid support, especially an array, a macroarray or a microarray, or in other conditions, especially in solution. Arrays and kits of the invention designed for measuring expression by a hybridization method are disclosed below.

The expression of a gene may be assayed in two manners:
to determine the absolute gene expression, which corresponds to the number of copies of the product of expression of a gene, in particular the number of copies of a nucleotide target, in the sample; and
to determine the relative expression, which corresponds to the number of copies of the product of expression of a gene, in particular the number of copies of a nucleotide target, in the sample over the number of copies of the expression product, or the number of copies of a nucleotide target, of a different gene (a calculation also known as normalisation).

This different gene is not one of the genes contained in the set to be assayed. It is assayed on the same sample and at the same time as the genes of the set to be assayed, and is called an invariant gene or a normalizer. The invariant gene is generally selected because its expression is steady whatever the sample to be tested. The expression "steady whatever the sample" means that the expression of an invariant gene does not vary significantly between a normal liver cell and the corresponding tumor cell in a same patient and/or between different liver tumor samples in a same patient. In the present specification, a gene is defined as invariant when its absolute expression does not vary as a function of the grade of the liver tumors, in particular does not vary as a function of the grade of the HB or HCC tumor, and/or does not vary between liver tumor and normal liver cells.

In the present invention, the expression which is assayed is preferably the relative expression of each gene, calculated with reference to at least one (preferably 1, 2, 3 or 4) invariant gene(s).

Invariant genes suitable to perform the invention are genes whose expression is constant whatever the grade of the liver tumors, such as for example the ACTG1, EEF1A1, PNN and RHOT2 genes, whose features are summarized in Table 3. In a preferred embodiment, the relative expression is calculated with respect to at least the RHOT2 gene, or with respect to the RHOT2 gene alone.

In another advantageous embodiment, the relative expression is calculated with respect to at least the PNN gene, or with respect to the PNN gene alone. It may also be calculated with respect to both the RHOT2 and PNN genes.

The calculation of the absolute expression or of the relative expression of each gene of the set and of each invariant gene, assayed with the same method from the same sample and preferably at the same time, makes it possible to determine a gene expression profile for each sample.

TABLE 3
Features of invariant genes. ACTG1, EEF1A1, PNN and RHOT2 proteins are defined in SEQ ID NOs: 34, 36, 38 and 40 respectively.
Symbol | Gene name | SEQ ID NO* | Accession No. | Location | Function
ACTG1 | actin, gamma 1 | 33 | NM_001614 | 17q25 | cytoplasmic actin, cytoskeleton in nonmuscle cells
EEF1A1 | eukaryotic translation elongation factor 1 alpha 1 | 35 | NM_001402 | 6q14.1 | enzymatic delivery of aminoacyl tRNAs to the ribosome
PNN | pinin, desmosome associated protein | 37 | NM_002687 | 14q21.1 | transcriptional corepressor, RNA splicing regulator
RHOT2 | ras homolog gene family, member T2 | 39 | NM_138769 | 16p13.3 | signaling by Rho GTPases, mitochondrial protein
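The relative (normalized) expression described above is commonly derived from qPCR threshold cycles (Ct) as -dCt, the scale on which the per-gene cut-off values of this specification are stated. The following Python sketch assumes Ct values measured on the same sample in the same run. Averaging the Ct values of several invariant genes (for example RHOT2 and PNN) and the default amplification efficiency of 2 (one doubling per cycle) are assumptions of this sketch, not requirements of the source; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def minus_delta_ct(ct_target: float, ct_invariant: Sequence[float]) -> float:
    """-dCt of a target gene against one or more invariant genes.

    dCt = Ct(target) - Ct(reference), so -dCt = Ct(reference) - Ct(target):
    a larger -dCt means a more abundant transcript. With several invariant
    genes, their Ct values are averaged (an assumption of this sketch).
    """
    if not ct_invariant:
        raise ValueError("at least one invariant-gene Ct value is required")
    return mean(ct_invariant) - ct_target


def relative_expression(ct_target: float, ct_invariant: Sequence[float],
                        efficiency: float = 2.0) -> float:
    """Expression of the target relative to the invariant gene(s).

    Assumes the product is multiplied by `efficiency` each cycle
    (2.0 = perfect doubling).
    """
    return efficiency ** minus_delta_ct(ct_target, ct_invariant)


# Hypothetical Ct values from one qPCR run on one sample.
print(minus_delta_ct(22.0, [26.0]))        # 4.0 (RHOT2 only)
print(relative_expression(22.0, [26.0]))   # 16.0
print(minus_delta_ct(22.0, [26.0, 25.0]))  # 3.5 (RHOT2 and PNN)
```

Working on the -dCt scale keeps the values additive, which is why the discretization cut-offs can be compared directly with these numbers.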

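The grading by discretization that this specification sets out (per-gene cut-offs on -dCt values, a discretized value of 1 or 2 per gene, a score equal to the mean discretized value of the proliferation-related genes divided by that of the differentiation-related genes, then one or two sample cut-offs) reduces to a few lines of arithmetic. The Python sketch below uses the hepatoblastoma per-gene cut-offs and sample cut-off (0.91), and the HCC sample cut-offs (0.66 and 0.925) and modified-score cut-off (1.3), as given in the specification; HCC per-gene cut-offs must be supplied by the caller. The function names, the handling of values exactly equal to a cut-off, and the example profiles are hypothetical, and the optional three-interval variant ("1", "1.5", "2") is not shown.

```python
from statistics import mean
from typing import Mapping, Tuple

# Gene groups as defined in the specification (Table 4 and the discretization method).
PROLIFERATION = ("AFP", "BUB1", "DLG7", "DUSP9", "E2F5", "IGSF1", "NLE1", "RPL10A")
DIFFERENTIATION = ("ALDH2", "APCS", "APOC4", "AQP9", "C1S", "CYP2E1", "GHR", "HPD")

# Hepatoblastoma per-gene cut-offs on -dCt values normalized to RHOT2.
HB_GENE_CUTOFFS = {
    "AFP": 3.96139596, "ALDH2": 4.3590482, "APCS": 4.4691582,
    "APOC4": 2.03068712, "AQP9": 3.38391456, "BUB1": -1.41294708,
    "C1S": 4.24839464, "CYP2E1": 6.70659644, "DLG7": -3.3912188,
    "DUSP9": 2.07022648, "E2F5": -0.72728656, "GHR": -0.15055692,
    "HPD": 2.27655628, "IGSF1": 0.10750152, "NLE1": -0.02343571999,
    "RPL10A": 6.19723876,
}


def discretize(value: float, cutoff: float) -> int:
    """Two intervals: 1 below the gene's cut-off, 2 above it.

    Ties are assigned 2 here; the source does not say how ties are handled.
    """
    return 2 if value >= cutoff else 1


def group_means(minus_dct: Mapping[str, float],
                cutoffs: Mapping[str, float]) -> Tuple[float, float]:
    """Mean discretized value of the profiled proliferation and differentiation genes."""
    prolif = [discretize(minus_dct[g], cutoffs[g]) for g in PROLIFERATION if g in minus_dct]
    diff = [discretize(minus_dct[g], cutoffs[g]) for g in DIFFERENTIATION if g in minus_dct]
    # The method profiles the same number of genes in each group.
    if not prolif or len(prolif) != len(diff):
        raise ValueError("profile the same non-zero number of genes in each group")
    return mean(prolif), mean(diff)


def classify_hb(minus_dct: Mapping[str, float],
                cutoffs: Mapping[str, float] = HB_GENE_CUTOFFS,
                sample_cutoff: float = 0.91) -> str:
    """HB: score = proliferation mean / differentiation mean; above 0.91 -> C2."""
    p, d = group_means(minus_dct, cutoffs)
    return "C2" if p / d > sample_cutoff else "C1"


def classify_hcc(minus_dct: Mapping[str, float],
                 cutoffs: Mapping[str, float],
                 low: float = 0.66, high: float = 0.925,
                 prolif_cutoff: float = 1.3) -> str:
    """HCC: above 0.925 -> C2, below 0.66 -> C1; intermediate scores are
    re-classified with the proliferation-only modified score (cut-off 1.3)."""
    p, d = group_means(minus_dct, cutoffs)
    score = p / d
    if score > high:
        return "C2"
    if score < low:
        return "C1"
    return "C2" if p > prolif_cutoff else "C1"


# Hypothetical profiles built around the HB cut-offs (not patient data).
poor = {**{g: HB_GENE_CUTOFFS[g] + 1 for g in PROLIFERATION},
        **{g: HB_GENE_CUTOFFS[g] - 1 for g in DIFFERENTIATION}}
good = {**{g: HB_GENE_CUTOFFS[g] - 1 for g in PROLIFERATION},
        **{g: HB_GENE_CUTOFFS[g] + 1 for g in DIFFERENTIATION}}
print(classify_hb(poor), classify_hb(good))  # C2 C1

# 4/8 proliferation and 6/8 differentiation genes above their cut-offs:
# score 1.5 / 1.75 (about 0.857) is intermediate, modified score 1.5 > 1.3.
# HB cut-offs are reused here for illustration only.
mixed = {**{g: HB_GENE_CUTOFFS[g] + (1 if i < 4 else -1) for i, g in enumerate(PROLIFERATION)},
         **{g: HB_GENE_CUTOFFS[g] + (1 if i < 6 else -1) for i, g in enumerate(DIFFERENTIATION)}}
print(classify_hcc(mixed, HB_GENE_CUTOFFS))  # C2
```

With 1/2 coding in both groups, the score ranges from 0.5 (all proliferation genes low, all differentiation genes high) to 2.0 (the reverse), so the sample cut-offs of 0.91 and 0.66/0.925 sit inside that range.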
An additional step of the method or process comprises the determination of the grade of said liver tumor, referring to the gene expression profile that has been assayed. In a particular embodiment of the invention, the method is designed to determine the grade of hepatoblastoma, in particular paediatric hepatoblastoma. In another embodiment, the method is designed to determine the grade of hepatocellular carcinoma, in particular paediatric HCC or adult HCC.

According to a particular embodiment of the invention, in the step of the method which is performed to determine the grade of the liver tumor, a gene expression profile or signature (preferably obtained after normalization), which is thus specific for each sample, is compared to the gene expression profile of a reference sample, or to the gene expression profiles of each sample of a collection of reference samples (individually tested) whose grade is known, so as to determine the grade of said liver tumor. This comparison step is carried out with at least one prediction algorithm. In a particular embodiment, the comparison step is carried out with 1, 2, 3, 4, 5 or 6 prediction algorithms chosen among the following: Compound Covariate Predictor (CCP), Linear Discriminant Analysis (LDA), One Nearest Neighbor (1NN), Three Nearest Neighbor (3NN), Nearest Centroid (NC) and Support Vector Machine (SVM). These six algorithms are part of the "Biometric Research Branch (BRB) Tools" developed by the National Cancer Institute (NCI) and are available at http://linus.nci.nih.gov/BRB-ArrayTools.html. Equivalent algorithms may be used instead of, or in addition to, the above ones. Each algorithm classifies tumors into one of two groups, defined as tumors with good prognosis (such as C1) or tumors with bad prognosis (such as C2); each group comprises the respective reference samples used for comparison, and one of these two groups also comprises the tumor to be classified.

Therefore, when 6 algorithms are used, the grade of a tumor sample may be assigned with certainty to the class of good prognosis or to the class of bad prognosis when 5 or 6 of the above algorithms classify the tumor sample in the same group. In contrast, when fewer than 5 of the above algorithms classify a tumor sample in the same group, the result provides an indication of the grade rather than a definite classification.

Reference samples which can be used for comparison with the gene expression profile of a tumor to be tested are one or several sample(s) representative of tumors with poor prognosis (such as C2), one or several sample(s) representative of tumors with good prognosis (such as C1), one or several sample(s) of a normal adult liver and/or one or several sample(s) of a fetal liver.

Table 4 lists the level of expression of each gene of Table 1 depending upon the status of the reference sample, i.e., robust tumor with poor prognosis and robust tumor with good prognosis. Examples of methods to identify such robust tumors are provided in the examples. The present invention provides a new classification method in this respect, which is based on the discretization of continuous values.

TABLE 4
Level of expression of the genes of Table 1, with respect to the status of the robust tumors
Nucleotide target | Expression status in robust tumor with poor prognosis | Expression status in robust tumor with good prognosis
AFP | overexpressed | underexpressed
ALDH2 | underexpressed | overexpressed
APCS | underexpressed | overexpressed
APOC4 | underexpressed | overexpressed
AQP9 | underexpressed | overexpressed
BUB1 | overexpressed | underexpressed
C1S | underexpressed | overexpressed
CYP2E1 | underexpressed | overexpressed
DLG7 | overexpressed | underexpressed
DUSP9 | overexpressed | underexpressed
E2F5 | overexpressed | underexpressed
GHR | underexpressed | overexpressed
HPD | underexpressed | overexpressed
IGSF1 | overexpressed | underexpressed
NLE1 | overexpressed | underexpressed
RPL10A | overexpressed | underexpressed

Reference samples usually correspond to so-called "robust tumors", for which all the marker genes providing the signature are expressed (either underexpressed or overexpressed) as expected, i.e., in accordance with the results disclosed in Table 5, when tested in similar conditions, as disclosed in the examples hereafter.

A robust tumor having an overexpression of one or several gene(s) selected among ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR and HPD (these genes belong to the so-called group of differentiation-related genes), and/or an underexpression of one or several gene(s) selected among AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE1 and RPL10A (these genes belong to the so-called group of proliferation-related genes), is an indicator of a robust liver tumor, in particular of a hepatoblastoma, with a good prognosis. A robust tumor having an overexpression of one or several gene(s) selected among AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE1 and RPL10A, and/or an underexpression of one or several gene(s) among ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR and HPD, is an indicator of a robust liver tumor, in particular of a hepatoblastoma, with a poor prognosis. In the present application, a gene is said to be "underexpressed" when its expression is lower than the expression of the same gene in the other tumor grade, and "overexpressed" when its expression is higher than the expression of the same gene in the other tumor grade.

In a particular embodiment, Table 5 provides the gene expression profiles of the 16 genes of Table 1 in 13 samples of hepatoblastoma (HB), including 8 samples that have been previously identified as rC1 subtype and 5 samples that have been previously identified as rC2 subtype. This table can therefore be used for comparison, to determine the gene expression profile of an HB tumor to be classified, with the robust tumors disclosed (constituting reference samples), for a set of genes as defined in the present application. Said comparison involves using the classification algorithms disclosed herein, for both the selected reference samples and the assayed sample.

TABLE 5
Normalized qPCR data of 16 genes in 13 HB samples, including 8 samples of the rC1 subtype and 5 samples of the rC2 subtype (in grey). The qPCR values have been obtained by measuring the expression of the 16 genes in the 8 samples of the rC1 subtype and the 5 samples of the rC2 subtype by the SYBR Green method, using the primers disclosed in Table 6 below and the conditions reported in the examples, and normalized to the RHOT2 gene (primers in Table 7).

The method of the present invention is also suitable to classify new tumor samples and to use them as new reference samples. Therefore, the gene expression values of these new reference samples may be used in combination with, or in place of, some of the values reported in Table 5.

In another embodiment of the invention, the step of determining the tumor grade comprises performing a method of discretization of the continuous values of gene expression obtained on the set of genes in the tested patients' samples. Discretization is generally defined as the process of transforming a continuous-valued variable into a discrete one by creating a set of contiguous intervals (or equivalently a set of cutpoints) that spans the range of the variable's values. Discretization has been disclosed for use in classification performance in Lustgarten J. L. et al., 2008.

The inventors have observed that discretization can be effective in determining liver tumor grade, especially for the tumors described in the present application, including hepatoblastoma (HB) or hepatocellular carcinoma (HCC). The discretization method is disclosed in particular in the examples, where it is illustrated using data obtained on tumor samples by profiling the 16 genes providing the large set of genes for expression profiling according to the invention. The discretization method may however be carried out on a reduced number of profiled genes within this group of 16 genes, starting from a set consisting of 2 genes (or more genes) including one (or more) overexpressed proliferation-related gene(s) chosen among AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE1 and RPL10A and one downregulated differentiation-related gene chosen among ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR and HPD, said genes being thus classified as a result of the gene profiles observed on robust tumors with poor prognosis (according to the classification in Table 4 above). In particular embodiments of the discretization method, the number of genes assayed for expression profiling is 2, 4, 6, 8, 10, 12, 14 or 16, and the same number of genes in each category (either the group of overexpressed proliferation-related genes or the group of downregulated differentiation-related genes) is used to perform the method.

The invention thus relates to a method enabling the determination of the tumor grade on a patient's sample, which comprises a classification of the tumor through discretization according to the following steps:
measuring the expression, and especially the relative (normalized) expression, of each gene in a set of genes defined as the signature of the tumor, for example by quantitative PCR, thereby obtaining data as Ct or preferably Delta Ct, wherein said set of genes is divided in two groups, a first group consisting of the proliferation-related genes and a second group consisting of the differentiation-related genes (as disclosed above);
comparing the value measured for each gene to a cut-off value determined for each gene of the set of genes, and assigning a discretized value to each of said measured values with respect to said cut-off value, said discretized value being advantageously a "1" or a "2" value assigned with respect to the cut-off value of the gene and, optionally, if two cut-off values are used for one gene, a further discretized value such as "1.5" (or another value between "1" and "2") may be assigned to the measured values which are intermediate between the cut-off values;
determining the average of the discretized values for the genes in each group of the set of genes;
determining the ratio of the average of the discretized values for the proliferation-related genes to the average of the discretized values for the differentiation-related genes, thereby obtaining a score for the sample;
comparing the obtained score for the sample with one or more sample cut-off(s), wherein each cut-off has been assessed for a selected percentile;
determining the tumor grade as C1 or C2, as a result of the classification of the sample with respect to said sample cut-off.

The above defined ratio of average values may alternatively be calculated as the ratio of the average of the discretized values for the differentiation-related genes to the average of the discretized values for the proliferation-related genes, to obtain a score. If this calculation is adopted, the cut-off values are inverted, i.e., are calculated as 1/XXX.

In order to carry out the discretization method of the invention, the data obtained on the assayed genes for profiling a patient's sample are preferably normalized with respect to one or more invariant gene(s) of the present invention, in order to prevent any detrimental impact on the results that may arise from possible inaccuracy in the quantification of the initial nucleic acid, especially RNA, in the sample.

Normalization with respect to one invariant gene only, especially when said invariant gene is the RHOT2 gene, has proved to be relevant in the results obtained by the inventors. Similarly, normalization with respect to the PNN gene would be an advantageous possibility, because this gene also does not vary in expression.

In order to design a discretization method for the determination of the tumor grade of an individual sample of a patient according to the invention, cut-off values have to be determined. The cut-off values can be determined experimentally by carrying out the following steps on expression profiling results obtained on a determined number of tumor samples:
defining a cut-off (threshold value) for each gene in the set of genes designed for the signature, said cut-off corresponding to the value of the absolute or preferably relative (i.e. normalized) expression of said gene at a selected percentile, said percentile being selected for each of the two groups of genes defined in the set of genes. In order to do so, the set of profiled genes comprises the same number of genes within each of the 2 groups of genes, namely the group of overexpressed proliferation-related genes encompassing AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE1 and RPL10A, and the group of downregulated differentiation-related genes encompassing ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR and HPD (said groups being defined based on gene profiles on robust tumors with poor prognosis);
in each tumor sample, assigning to each expression value (especially normalized expression value) obtained for each profiled gene in the sample a discretized value which is codified with respect to the cut-off value determined for the same gene and in line with the defined contiguous intervals of continuous values, e.g. a discretized value of "1" or "2" if two intervals (categories) are defined, or a discretized value of "1", "1.5" (or another value between 1 and 2) or "2" if three intervals are defined, said assignment being advantageously such that "1" is assigned to expression values falling below the cut-off found for the differentiation-related genes and to expression values falling below the cut-off found for the proliferation-related genes, "2" is assigned to expression values falling above the cut-off found for the differentiation-related genes and to expression values falling above the cut-off found for the proliferation-related genes, and optionally, if "1.5" is used, it is assigned to values found between the cut-offs;
on each tumor sample, determining in each group (proliferation-related genes group or differentiation-related genes group) the average value of said assigned discretized values of the profiled genes of the set of profiled genes;
determining a score for each sample, as the ratio between the average discretized values of said genes in said two groups of genes in the set of profiled genes;
determining, on the basis of the obtained scores for all the tumor samples, one or more cut-off value(s) for the sample, corresponding to the respective value(s) at one or more (especially 2 or 3) percentile(s), wherein said percentile(s) is (are) either identical to or different from the percentile(s) selected for the genes.

When the cut-off values for each gene of the set of genes for profiling have been obtained for a sufficient number of relevant samples, and the cut-off value for the sample is determined on the basis of the same samples, these cut-offs can be adopted as reference cut-offs by the user who will be carrying out the analysis of any further patient's tumor sample, especially for the purpose of determining the tumor grade in a patient's sample, if the analysis is performed in identical or similar conditions to the conditions which led to the establishment of the cut-off values.

Therefore the invention provides cut-off values as reference cut-offs, in order to carry out the determination of tumor grade in particular testing conditions such as those disclosed below and in the examples.

In a particular embodiment of the method of discretization, the cut-off for each gene is the value corresponding to a determined percentile, which can be different for each of the two groups of genes considered (proliferation-related genes on the one hand and differentiation-related genes on the other hand). The selected percentile (or quantile) is determined with respect to the fraction of tumors (such as 1/3 or more) harbouring some chosen features, such as overexpression of proliferation-related genes and/or downregulation of differentiation-related genes, in the two groups of genes of the set of genes. Especially, when one intends to assign more weight to tumors displaying strong overexpression of proliferation-related genes and/or strong downregulation of differentiation-related genes, the cut-off corresponds to a high quantile for said proliferation-related genes (above the 50th, preferably the 60th, or even above the 65th, such as the 67th, and for example within the range of the 55th to 70th) and to a low quantile for the differentiation-related genes (below the 50th, preferably equal to or below the 40th, for example the 33rd, and for example within the range of the 20th to 40th). The cut-off for each group of genes and the cut-off for the sample may be determined with respect to the same percentile(s) or with respect to different percentiles.

According to a particular embodiment of the invention, for HB tumors, the percentile chosen for the overexpressed proliferation-related genes is the 67th and the percentile chosen for the downregulated differentiation-related genes is the 33rd. According to a particular embodiment of the invention, for HCC tumors, the percentile chosen for the overexpressed proliferation-related genes is the 60th and the percentile chosen for the downregulated differentiation-related genes is the 40th.

Each percentile (or the cut-off value corresponding to the percentile) defines a cutpoint, and the discretized value for each gene is either "1" or "2", below or above said percentile. The values "1" and "2" are distributed with respect to the percentiles so as to create the highest difference in the values of the calculated ratio for the most different tumor grades. This is illustrated in the examples for the selected percentiles.

It has been observed that, in a preferred embodiment of the invention, the relative values of the profiled genes are determined by real-time PCR (qPCR).

Conditions to carry out the real-time PCR are disclosed herein, especially in the examples, as conditions applicable to the analyzed samples.

PCR primers and probes suitable for the performance of RT-PCR are those disclosed herein for the various genes.

In a particular embodiment of the invention, the analysed tumor is a hepatoblastoma and its grade is determined by discretization as disclosed above and illustrated in the examples, taking into account that:
the set of assayed genes for profiling is constituted of the 16 genes disclosed;
the invariant gene (of reference) is RHOT2;
the cut-off values for each gene (based on -dCt (minus delta Ct) measures) are: AFP: 3.96139596; ALDH2: 4.3590482; APCS: 4.4691582; APOC4: 2.03068712; AQP9: 3.38391456; BUB1: -1.41294708; C1S: 4.24839464; CYP2E1: 6.70659644; DLG7: -3.3912188; DUSP9: 2.07022648; E2F5: -0.72728656; GHR: -0.1505569200; HPD: 2.27655628; IGSF1: 0.1075015200; NLE1: -0.02343571999; RPL10A: 6.19723876;
the cut-off value for the sample is 0.91 (for the 67th percentile) and optionally a further cut-off value for the sample is 0.615 (for the 33rd percentile). In such a case, a sample with a score above 0.91 is classified into the C2 class and a sample with a score below 0.91 is classified into the C1 class. The reference to the cut-off at 0.615 may be used to refine the results for values between both cut-offs.

In another embodiment of the invention, the tumor is a hepatocellular carcinoma and its grade is determined by discretization as disclosed above and illustrated in the examples, taking into account that:
the set of assayed genes for profiling is constituted of the 16 genes disclosed;
the invariant gene (of reference) is RHOT2;
the cut-off values for each gene (based on -dCt (minus delta Ct) measures) are:
Gene name | Cut-off for TaqMan | Cut-off for SybrGreen
AFP | -12634010 | -2.3753035
ALDH2 | 4014143 | 5.314302
APCS | 5.6142907 | 6.399079
APOC4 | -0.7963158 | 4.656336
AQP9 | 4.2836011 | 5.446966
BUB1 | -1.2736579 | -3634476
C1S | 6.3514679 | 6.240002
CYP2E1 | 6.9562419 | 5.829384
DLG7 | -2.335694 | -4614352
DUSP9 | -7.979559 | -18626715
E2F5 | -0.4400218 | -1367846
GHR | 1.0832632 | 1.169362
HPD | 6.7480328 | 6.736329
IGSF1 | -4.8417785 | 7.6653982
NLE1 | -1.6167268 | -182226
RPL10A | 6.2483056 | 5.731897
the cut-off values for the score of a sample, based on the ratio between the average of the discretized values of the "proliferation-related genes" and that of the "differentiation-related genes", are 0.66 (determined as the 30th percentile of the score) and 0.925 (determined as the 67th percentile of the score). In such a case, a sample with a score above 0.925 is classified into the C2 class and a sample with a score below 0.66 is classified into the C1 class. A sample with a score (initial score) between 0.66 and 0.925 can be assigned to an intermediate class. It can alternatively be classified as C1 or C2 using a modified score corresponding to the average of the discretized values of the "proliferation-related genes". A new cut-off value is determined for said genes, which is the cut-off value for the modified score (in the present case it is 1.3). This cut-off can be determined via a percentile (here the 60th) of the distribution of the modified scores, using the samples of the intermediate class. A sample (initially classified in the intermediate class) with a modified score below 1.3 can be re-classified into the C1 class, and a sample with a modified score above 1.3 can be re-classified into the C2 class.

It is observed that the refinement of the results which fall between the cut-offs of the samples is advantageous for hepatocellular carcinoma, in order to increase the relevancy of the information on the tumor grade.

Generally, said refinement of the classification of the intermediate results in HCC is obtained by performing the following steps: a modified score is determined, which corresponds to the average of the discretized values of the "proliferation-related genes" only for the sample. A new cut-off value is determined for said genes, which is the cut-off value for the modified score (in the present case it is 1.3). This cut-off can be determined via a percentile (here the 60th) of the distribution of the modified scores, using the samples of the intermediate class. A sample (initially classified in the intermediate class) with a modified score below the "proliferation cut-off" (for example 1.3) can be re-classified into the C1 class, and a sample with a modified score above the "proliferation cut-off" (for example 1.3) can be re-classified into the C2 class.

From the 16 genes expressed in liver cells listed in Table 1, a set comprising from 2 to 16 genes (or more generally a set as defined herein) may be used to assay the grade of tumor cells in a tumor originating from the liver. The results obtained, after determining the expression of each of the genes of the set, are then treated for classification according to the steps disclosed herein. The invention relates to each and any combination of genes disclosed in Table 1, to provide a set comprising from 2 to 16 of these genes, in particular a set comprising or consisting of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 of these genes. In the designed set, one or more genes of Table 1 may be modified by substitution or by addition of one or several genes as explained above, which also enable the grade of the liver tumor to be determined when assayed in combination with the other genes.

In a preferred embodiment, the liver tumor is a paediatric HB, and the method or process of the invention makes it possible to distinguish a first class, called C1, qualifying as a good prognosis tumor, and a second class, called C2, qualifying as a poor prognosis tumor. The C1 grade is predominantly composed of fetal histotype cells (i.e., well differentiated and non-proliferative cells). In contrast, the C2 grade presents cells other than the fetal histotype, such as embryonic, atypic (crowded fetal), small cell undifferentiated (SCUD) and/or macrotrabecular cells.

The present invention also relates to a kit suitable to determine the grade of a liver tumor from the sample obtained from a patient. This kit is appropriate to carry out the method or process described in the present application.

In a particular embodiment, the kit comprises a plurality of pairs of primers specific for a set of genes to be assayed, said set comprising from 2 to 16 genes, said 2 to 16 genes being chosen in the group consisting of the AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A genes.

By "plurality", it is meant that the kit comprises at least as many pairs of primers as genes, to enable assaying each selected gene, and in particular the nucleotide target of this gene. Accordingly, each gene, and in particular its nucleotide target, is specifically targeted by at least one of these pairs of primers. In a particular embodiment, the kit comprises the same number of pairs of primers as the number of genes to assay, and each primer pair specifically targets one of the genes, and in particular the nucleotide targets of one of these genes, and does not hybridize with the other genes of the set.

The kits of the invention are defined to amplify the nucleotide targets of the sets of genes as described in the present invention. Therefore, the kit of the invention comprises from 2 to 16 pairs of primers which, when taken as a whole, are specific for said 2 to 16 genes out of the 16 genes of Table 1. In particular, the kit comprises or consists of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 pairs of primers specific for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 out of the 16 genes of Table 1. In a particular embodiment, the kit comprises or consists of 16 pairs of primers specific for the 16 genes of Table 1, i.e., a primer pair specific for each of the following genes: AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A.

When the set of genes has been modified by the addition or substitution of at least one gene as described above, the kit is adapted to contain a pair of primers specific for each added or substituted gene. As indicated by the term "comprises", the kit may, besides the pairs of primers specific for the genes of Table 1, contain additional pair(s) of primers.

In a particular embodiment, the kit comprises at least one pair of primers (preferably one) for at least one invariant gene (preferably one or two) to be assayed for the determination of the expression profile of the genes, by comparison with the expression profile of the invariant gene.

The number of pairs of primers of the kit usually does not exceed 100, particularly 50, 30 or 20, more particularly 16, and even more particularly is at most 5, 6, 7, 8, 9 or 10.

In the kits of the invention, it is understood that, for each gene, at least one pair of primers, and preferably exactly one pair, enabling the amplification of the nucleotide targets of this gene, is present. When the kits provide several pairs of primers for the same gene, the gene expression level is measured by amplification with only one pair of primers. Amplification using several pairs of primers simultaneously for the same gene is excluded.

As defined herein, a pair of primers consists of a forward polynucleotide and a backward polynucleotide, having the capacity to match its nucleotide target and to amplify, when appropriate conditions and reagents are brought, a nucleotide sequence framed by their complementary sequences in the sequence of their nucleotide target.

The pairs of primers present in the kits of the invention are specific for a gene, i.e., each pair of primers amplifies the nucleotide targets of one and only one gene among the set. Therefore, it is excluded that a pair of primers specific for a gene amplifies, in an exponential or even in a linear way, the nucleotide targets of another gene and/or other nucleic acids contained in the sample. In this way, the sequence of a primer (whose pair is specific for a gene) is selected so that it is not found in a sequence of another gene, is not complementary to a sequence found in this other gene and/or is not able to hybridize, in amplification conditions as defined in the present application, with the sequence of the nucleotide targets of this other gene.

In a particular embodiment, the forward and/or backward primer(s) may be labelled, either by isotopic (such as radioactive) or non-isotopic (such as fluorescent, biotin, fluorochrome) methods. The label of the primer(s) leads to the labelling of the amplicon (product of amplification), since the primers are incorporated in the final product.

The design of a pair of primers is well known in the art and in particular may be carried out by reference to Sambrook et al. (Molecular Cloning. A Laboratory Manual. Third Edition; chapter 8 and in particular pages 8.13 to 8.16). Various software packages are available to design pairs of primers, such as Oligo™ or Primer3.

Therefore, each primer of the pair (forward and backward) has, independently of the other, the following features:
their size is from 10 to 50 bp, preferably 15 to 30 bp; and
they have the capacity to hybridize with the sequence of the nucleotide targets of a gene.

In a particular embodiment, when the pairs of primers are used in a simultaneous amplification reaction carried out on the sample, the various primers have the capacity to hybridize with their respective nucleotide targets at the same temperature and in the same conditions.

Conventional conditions for PCR amplification are well known in the art, in particular from Sambrook et al. An example of common conditions for amplification by PCR is dNTP (200 mM), MgCl2 (0.5-3 mM) and primers (100-200 nM).

In a particular embodiment, the sequence of the primer is 100% identical to one of the strands of the sequence of the nucleotide target to which it must hybridize, i.e. is 100% complementary to the sequence of the nucleotide target to which it must hybridize. In another embodiment, the identity or complementarity is not 100%, but the similarity is at least 80%, at least 85%, at least 90% or at least 95% with its complementary sequence in the nucleotide target. In a particular embodiment, the primer differs from its counterpart in the sequence of the nucleotide target by 1, 2, 3, 4 or 5 mutation(s) (deletion, insertion and/or substitution), preferably by 1, 2, 3, 4 or 5 nucleotide substitutions. In a particular embodiment, the mutations are not located in the last 5 nucleotides of the 3' end of the primer.

In a particular embodiment, the primer which is not 100% identical or complementary keeps the capacity to hybridize with the sequence of the nucleotide target, similarly to the primer that is 100% identical or 100% complementary to the sequence of the nucleotide target (in the hybridization conditions defined herein). In order to be specific, at least one of the primers (having at least 80% similarity as defined above) of the pair specific for a gene cannot hybridize with the sequence found in the nucleotide targets of another gene of the set and of another gene of the sample.

In a particular embodiment, the pairs of primers used for amplifying a particular set of genes are designed, besides some or all of the features explained herein, so that the amplification products (or amplicons) of each gene have approximately the same size. By "approximately" is meant that the difference in size between the longest amplicon and the shortest amplicon of the set is less than 30% (of the size of the longest amplicon), preferably less than 20%, more preferably less than 10%. In particular embodiments, the size of the amplicon is between 100 and 300 bp, such as about 100, 150, 200, 250 or 300 bp.

The nucleotide sequences of the 16 genes of Table 1 are provided in the Figures, and may be used to design specific pairs of primers for amplification, in view of the explanations above.

Examples of primers that may be used to measure the expression of the genes of Table 1, in particular to amplify the nucleotide targets of the genes of Table 1, are the primers having the sequences provided in Table 6, or variant primers having at least 80% similarity (or more, as defined above) with the sequences defined in Table 6.

TABLE 6
Sequence of forward and backward primers of the 16 genes defined in Table 1. These primers may be used in any real-time PCR, in particular the SYBR Green technique, except for the TaqMan® protocol.

Target Product size (bp) Forward primer (5'-3') Reverse primer (5'-3')

AFP 5 AACTATTGGCCTGTGGCGAG TCATCCACCACCAAGCTGC

ALDH2 5 GTTTGGAGCCCAGTCACCCT GGGAGGAAGCTTGCATGATTC

APCS 5 GGCCAGGAATATGAACAAGCC CTTCTCCAGCGGTGTGATCA

APOC4 5 GGAGCTGCTGGAGACAGTGG TTTGGATTCGAGGAACCAGG

AQP9 5 GCTTCCTCCCTGGGACTGA CAACCAAAGGGCCCACTACA

BUB1 52 ACCCCTGAAAAAGTGATGCCT TCATCCTGTTCCAAAAATCCG

C1S 4. TTGTTTGGTTCTGTCATCCGC TGGAACACATTTCGGCAGC

CYP2E1 5 CAACCAAGAATTTCCTGATCCAG AAGAAACAACTCCATGCGAGC

DLG7 5 GCAGGAAGAATGTGCTGAAACA TCCAAGTCTTTGAGAAGGGCC

DUSP9 5 CGGAGGCCATTGAGTTCATT ACCAGGTCATAGGCATCGTTG

E2F5 5 CCATTCAGGCACCTTCTGGT ACGGGCTTAGATGAACTCGACT

GHR 5 CTTGGCACTGGCAGGATCA AGGTGAACGGCACTTGGTG

HPD 5 ATCTTCACCAAACCGGTGCA CCATGTTGGTGAGGTTACCCC

IGSF1 52 CACTCACACTGAAAAACGCCC GGGTGGAGCAATTGAAAGTCA

NLE1 5 ATGTGAAGGCCCAGAAGCTG GAGAACTTCGGGCCGTCTC

RPL10A 5 TATCCCCCACATGGACATCG TGCCTTATTTAAACCTGGGCC
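The primer features described above can be screened mechanically. These include a primer length of 10 to 50 nt (preferably 15 to 30), at most 5 substitutions with at least 80% similarity, no mismatch in the 5 bases at the 3' end, and amplicon sizes within 30% of the longest. The Python sketch below applies these checks to the Table 6 sequences. It is an illustration, not the authors' procedure, and the helper names are hypothetical. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here only as a rough melting-temperature estimate for short oligonucleotides. It is an assumption, not a method stated in the text, and tools such as Primer3 use more accurate thermodynamic models.

```python
"""Screen primer pairs against the design criteria described in the text (sketch).

Assumptions: Wallace-rule Tm as a rough estimate; variant primers compared
base by base (substitutions only); all helper names are hypothetical.
"""

# Forward and reverse primers (5'->3') as listed in Table 6.
TABLE6 = {
    "AFP": ("AACTATTGGCCTGTGGCGAG", "TCATCCACCACCAAGCTGC"),
    "ALDH2": ("GTTTGGAGCCCAGTCACCCT", "GGGAGGAAGCTTGCATGATTC"),
    "APCS": ("GGCCAGGAATATGAACAAGCC", "CTTCTCCAGCGGTGTGATCA"),
    "APOC4": ("GGAGCTGCTGGAGACAGTGG", "TTTGGATTCGAGGAACCAGG"),
    "AQP9": ("GCTTCCTCCCTGGGACTGA", "CAACCAAAGGGCCCACTACA"),
    "BUB1": ("ACCCCTGAAAAAGTGATGCCT", "TCATCCTGTTCCAAAAATCCG"),
    "C1S": ("TTGTTTGGTTCTGTCATCCGC", "TGGAACACATTTCGGCAGC"),
    "CYP2E1": ("CAACCAAGAATTTCCTGATCCAG", "AAGAAACAACTCCATGCGAGC"),
    "DLG7": ("GCAGGAAGAATGTGCTGAAACA", "TCCAAGTCTTTGAGAAGGGCC"),
    "DUSP9": ("CGGAGGCCATTGAGTTCATT", "ACCAGGTCATAGGCATCGTTG"),
    "E2F5": ("CCATTCAGGCACCTTCTGGT", "ACGGGCTTAGATGAACTCGACT"),
    "GHR": ("CTTGGCACTGGCAGGATCA", "AGGTGAACGGCACTTGGTG"),
    "HPD": ("ATCTTCACCAAACCGGTGCA", "CCATGTTGGTGAGGTTACCCC"),
    "IGSF1": ("CACTCACACTGAAAAACGCCC", "GGGTGGAGCAATTGAAAGTCA"),
    "NLE1": ("ATGTGAAGGCCCAGAAGCTG", "GAGAACTTCGGGCCGTCTC"),
    "RPL10A": ("TATCCCCCACATGGACATCG", "TGCCTTATTTAAACCTGGGCC"),
}


def length_ok(seq: str, lo: int = 15, hi: int = 30) -> bool:
    """Preferred primer size (15 to 30 nt) and a plain A/C/G/T alphabet."""
    return lo <= len(seq) <= hi and set(seq) <= set("ACGT")


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (short oligos only)."""
    return sum(2 if base in "AT" else 4 for base in seq if base in "ACGT")


def acceptable_variant(variant: str, reference: str,
                       max_subs: int = 5, protected_3prime: int = 5) -> bool:
    """True if `variant` differs from `reference` by 1..max_subs substitutions,
    keeps at least 80% identity, and leaves the 3'-terminal bases unchanged."""
    if len(variant) != len(reference):
        return False  # this sketch handles substitutions only
    diffs = [i for i, (a, b) in enumerate(zip(variant, reference)) if a != b]
    if not 1 <= len(diffs) <= max_subs:
        return False
    identity = 1 - len(diffs) / len(reference)
    first_protected = len(reference) - protected_3prime
    return identity >= 0.8 and all(i < first_protected for i in diffs)


def amplicons_uniform(sizes, tolerance: float = 0.30) -> bool:
    """True if (longest - shortest) is below `tolerance` x the longest amplicon."""
    longest, shortest = max(sizes), min(sizes)
    return (longest - shortest) < tolerance * longest


if __name__ == "__main__":
    for gene, (fwd, rev) in TABLE6.items():
        print(f"{gene:7s} F {len(fwd)} nt ~{wallace_tm(fwd)} C | "
              f"R {len(rev)} nt ~{wallace_tm(rev)} C")
```

Printing the rough Tm of every primer makes it easy to spot pairs that may not anneal at a common temperature in a simultaneous reaction.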

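The example PCR conditions given earlier are final concentrations. This sketch reads the dNTP figure as 200 µM of each dNTP, which is the usual working concentration for PCR. To prepare a reaction, each component is diluted from a stock using C1·V1 = C2·V2. The 25 µL reaction volume and the stock concentrations below are hypothetical choices for illustration. Only the final concentrations come from the text: 1.5 mM MgCl2 lies within the stated 0.5-3 mM range, and 200 nM primer lies within the stated 100-200 nM range.

```python
"""Stock volumes needed to reach the example PCR conditions (sketch).

Reaction volume and stock concentrations are hypothetical; all
concentrations are expressed in micromolar (uM) so the units cancel.
"""


def stock_volume_ul(final_conc: float, stock_conc: float, reaction_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to add."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_conc * reaction_ul / stock_conc


REACTION_UL = 25.0
# (component, final concentration in uM, hypothetical stock concentration in uM)
COMPONENTS = [
    ("each dNTP", 200.0, 10_000.0),   # 200 uM final from a 10 mM stock
    ("MgCl2", 1_500.0, 25_000.0),     # 1.5 mM final from a 25 mM stock
    ("forward primer", 0.2, 10.0),    # 200 nM final from a 10 uM stock
    ("reverse primer", 0.2, 10.0),
]

if __name__ == "__main__":
    for name, final_um, stock_um in COMPONENTS:
        volume = stock_volume_ul(final_um, stock_um, REACTION_UL)
        print(f"{name}: {volume:.2f} uL per {REACTION_UL:.0f} uL reaction")
```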
The kit of the invention may further comprise one or more pairs of primers specific for one or more invariant genes, in particular for the ACTG1, EFF1A1, PNN and/or RHOT2 genes. The pairs of primers specific for invariant gene(s) may be designed and selected as explained above for the pairs of primers specific for the genes of the set of the invention. In a particular embodiment, the pairs of primers of the invariant genes are designed so that their amplification product (or amplicon) has approximately the same size as the amplicons of the genes of the set to be assayed. The term "approximately" is defined as above, with respect to the longest amplicon of the set of genes. Examples of primers that may be used to amplify the particular invariant genes are primers having the sequences provided in Table 7, or primers having at least 80% similarity (or more, as defined above) with the sequences defined in Table 7.

TABLE 7 Sequence of forward and backward primers specific for the invariant genes defined in Table 3. These primers may be used in real-time PCR, in particular the SYBR Green technique, except for the TaqMan® protocol.

Target Product size (bp) Forward primer (5'-3') Reverse primer (5'-3')

ACTG1 151 GATGGCCAGGTCATCACCAT ACAGGTCTTTGCGGATGTCC

EFF1A1 151 TCACCCGTAAGGATGGCAAT CGGCCAACAGGAACAGTACC

PNN 151 CCTTTCTGGTCCTGGTGGAG TGATTCTCTTCTGGTCCGACG

RHOT2 151 CTGCGGACTATCTCTCCCCTC AAAAGGCTTTGCAGCTCCAC

The kits of the invention may also comprise reagents necessary for the amplification of the nucleotide targets of the sets of the invention and, if any, of the nucleotide targets of the invariant genes. These reagents may be provided together with, or independently of, the pairs of primers specific for the invariant gene(s). The kits of the invention may also comprise probes as disclosed herein in the context of sets of probes, compositions and arrays. In particular, the kits also comprise the four dNTPs (nucleotides), amplification buffer, a polymerase (in particular a DNA polymerase, and more particularly a thermostable DNA polymerase) and/or salts necessary for the activity of the polymerase (such as Mg2+).

Finally, the kits may also comprise one or several control samples, i.e., one or more of the following:
- at least one sample representative of a tumor with bad (i.e., poor) prognosis (in particular an HBC2 grade);
- at least one sample representative of a tumor with good prognosis (in particular an HBC1 grade);
- at least one sample of a normal adult liver;
- at least one sample of a fetal liver.

The kits may also comprise instructions to carry out the amplification step or the various steps of the method of the invention.

The invention is also directed to a set of probes suitable to determine the grade of a liver tumor from the sample obtained from a patient. This set of probes is appropriate to carry out the method or process described in the present invention. It may also be part of the kit.

This set of probes comprises a plurality of probes, in particular from 2 to 16 probes, these 2 to 16 probes being specific for genes chosen in the group consisting of the AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A genes. By "plurality", it is meant that the set of probes comprises at least as many probes as there are genes to assay. In a particular embodiment, the array comprises the same number of probes as the number of genes to assay.

The probes of the sets of the invention are selected for their capacity to hybridize to the nucleotide targets of the sets of genes as described in the present invention. Therefore, the set of probes of the invention comprises from 2 to 16 probes specific for 2 to 16 genes out of the 16 genes of Table 1. In particular, the sets of probes comprise or consist of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 probes specific for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 out of the 16 genes of Table 1. In a particular embodiment, the sets of probes comprise or consist of 16 probes specific for the 16 genes of Table 1, i.e., a probe specific for each of the following genes: AFP, ALDH2, APCS, APOC4, AQP9, BUB1, C1S, CYP2E1, DLG7, DUSP9, E2F5, GHR, HPD, IGSF1, NLE1 and RPL10A.

The specificity of the probes is defined according to the same parameters as those used to define specific primers. When the set of genes has been modified by the addition or substitution of at least one gene as described above, the set of probes is adapted to contain a probe specific for the added or substituted gene(s). As indicated by the term "comprises", the set of probes may contain additional probe(s) besides the probes specific for the genes of Table 1.

The number of probes in the set usually does not exceed 100, particularly 50, 30 or 20, more particularly 16, and even more particularly is at most 5, 6, 7, 8, 9 or 10.

In the set of probes of the invention, it is understood that each gene has at least one corresponding probe to which the nucleotide target of this gene hybridizes. The set of probes may comprise several probes for the same gene, either probes having the same sequence or probes having different sequences.

As defined herein, a probe is a polynucleotide, especially DNA, having the capacity to hybridize to the nucleotide target of a gene. Hybridization is usually carried out at a temperature ranging from 40 to 60° C. in hybridization buffer (see examples of buffers below). These probes may be oligonucleotides, PCR products, cDNA vectors or purified inserts. The size of each probe is, independently of the others, from 15 to 1000 bp, preferably 100 to 500 bp or 15 to 500 bp, more preferably 50 to 200 bp or 15 to 100 bp. The design of probes is well known in the art and in particular may be carried out by reference to Sambrook et al. (Molecular Cloning. A Laboratory Manual. Third Edition; chapters 9 and 10 and in particular pages 10.1 to 10.10).

The probes may optionally be labelled, either by isotopic (radioactive) or non-isotopic (biotin, fluorochrome) methods. Methods to label probes are disclosed in Sambrook et al. (Molecular Cloning. A Laboratory Manual. Third Edition; chapter 8 and in particular page 9.3). In a particular embodiment, the probes are modified to confer different physicochemical properties on them (such as by methylation or ethylation). In another particular embodiment, the probes may be modified to add a functional group (such as a thiol group), and optionally immobilized on beads (preferably glass beads).

In a particular embodiment, the sequence of the probe is 100% identical to a part of one strand of the sequence of the nucleotide target to which it must hybridize, i.e., it is 100% complementary to a part of the sequence of the nucleotide target to which it must hybridize. In another embodiment, the identity or complementarity is not 100%, and the similarity with a part of the sequence of the nucleotide target is at least 80%, at least 85%, at least 90% or at least 95%. In a particular embodiment, the probe differs from a part of one strand of the sequence of the nucleotide target by 1 to 10 mutation(s) (deletion, insertion and/or substitution), preferably by 1 to 10 nucleotide substitutions. By "a part of", it is meant consecutive nucleotides of the nucleotide target, which correspond to the sequence of the probe.

In a particular embodiment, a probe that is not 100% identical or complementary keeps the capacity to hybridize, in particular to hybridize specifically, to the sequence of the nucleotide target (under the hybridization conditions defined herein), in the same way as the probe that is 100% identical or 100% complementary to that sequence.

In a particular embodiment, the size of the probes used to assay a set of genes is approximately the same for all the probes. By "approximately" is meant that the difference in size between the longest probe and the shortest probe of the set is less than 30% of the size of the longest probe, preferably less than 20%, more preferably less than 10%.

The set of probes of the invention may further comprise at least one probe (preferably one) specific for at least one invariant gene (preferably one or two), in particular for the ACTG1, EFF1A1, PNN and/or RHOT2 genes. The probes specific for invariant gene(s) may be designed and selected as explained above for the probes specific for genes of the sets of the invention. In a particular embodiment, the probes specific for the invariant genes have approximately the same size as the probes specific for the genes of the set to be assayed. The term "approximately" is defined as above, with respect to the longest probe of the set of genes.

The invention is also directed to an array suitable to determine the grade of a liver tumor from the sample obtained from a patient. This array is appropriate to carry out the method or process described in the present application. An array is defined as a solid support on which probes as defined above are spotted or immobilized. The solid support may be porous or non-porous, and is usually a glass slide, or a silica, nitrocellulose, acrylamide or nylon membrane or filter. The arrays of the invention comprise a plurality of probes specific for a set of genes to be assayed. In particular, the array comprises, spotted on it, a set of probes as defined above.

The invention also relates to a composition comprising a set of probes as defined above in solution. In a first embodiment, the probes (as defined above in the set of probes) may be modified to confer different physicochemical properties on them (such as by methylation or ethylation). The nucleotide targets (as defined herein and prepared from the sample) are linked to particles, preferably magnetic particles, for example particles covered with ITO (indium tin oxide) or polyimide. The solution of probes is then put in contact with the target nucleotides linked to the particles. The probe/target complexes are then detected, for example by mass spectrometry. Alternatively, probes may be modified to add a functional group (such as a thiol group) and immobilized on beads (preferably glass beads). These probes immobilized on beads are put in contact with a sample comprising the nucleotide targets, and the probe/target complexes are detected, for example by capillary reaction.

The invention is also directed to kits comprising the sets of probes, the compositions or the arrays of the invention, and preferably the primer pairs disclosed herein. These kits may further comprise reagents necessary for the hybridization of the nucleotide targets of the sets of genes and/or of the invariant genes to the probes (as such, in the compositions or on the arrays), and for washing the array to remove unbound nucleotide targets.

In a particular embodiment, the kits also comprise reagents necessary for the hybridization, such as:
- prehybridization buffer (for example containing 5×SSC, 0.1% SDS and 1% bovine serum albumin);
- hybridization buffer (for example containing 50% formamide, 10×SSC and 0.2% SDS);
- low-stringency wash buffer (for example containing 1×SSC and 0.2% SDS); and/or
- high-stringency wash buffer (for example containing 0.1×SSC and 0.2% SDS).

The kits may also comprise one or several control samples, i.e., one or more of the following:
- at least one sample representative of a tumor with poor prognosis;
- at least one sample representative of a tumor with good prognosis;
- at least one sample of a normal adult liver;
- at least one sample of a fetal liver.

Alternatively, the kits may comprise the representation of a gene expression profile of such tumors. Finally, the invention provides a kit as described above further comprising instructions to carry out the method or process of the invention.

The arrays and/or kits according to the invention may comprise pairs of primers, probes, arrays or compositions of the invention, or all of these components. They may be used in various aspects, in particular to determine the grade of a liver tumor from a patient, especially by the method disclosed in the present application. The arrays and/or kits according to the invention are also useful to determine, depending upon the grade of the liver tumor, the risk that a patient will develop metastasis. Indeed, classification of a liver tumor in the class with poor prognosis is highly associated with the risk of developing metastasis. In another embodiment, the arrays and/or kits according to the invention are also useful to define, depending upon the grade of the liver tumor, the therapeutic regimen to apply to the patient.

The invention also relates to a support comprising the data identifying the gene expression profile obtained when carrying out the method of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The colour version of the drawings as filed is available upon request to the European Patent Office.

FIG. 1. Identification of two HB subclasses by expression profiling. (A) Schematic overview of the approach used to identify robust clusters of samples, including two tumor clusters (rC1 and rC2) and one non-tumor cluster (NL). (B) Expression profiles of 982 probe sets (824 genes) that discriminate rC1 and rC2 samples (p<0.001, two-sample t test). Data are plotted as a heatmap where red and green correspond to high and low expression in log-transformed scale. (C) Molecular classification of 25 HB samples and status of the CTNNB1 gene and β-catenin protein. C1 and C2 classification was based on the rC1 and rC2 gene signature, using six different statistical predictive methods (CCP, LDA, 1NN, 3NN, NC and SVM) and leave-one-out cross-validation. Black and gray squares indicate mutations of the CTNNB1 and AXIN1 genes. Immunohistochemical analysis of β-catenin in representative C1 and C2 cases is shown. (D) Expression of representative Wnt-related and β-catenin target genes (p<0.005, two-sample t test) in HB subclasses and non-tumor livers (NL). (E) Classification of hepatoblastoma by expression profile of a 16-gene signature. (F) Classification of normal human livers of children with HB (from 3 months to 6 years of age) (NT) or fetal livers at 17 to 35 weeks of gestation (FL) by expression profile of a 16-gene signature.

FIG. 2: Molecular HB subclasses are related to liver development stages. (A) Distinctive histologic and immunostaining patterns of HB subclasses C1 and C2. From top to bottom: numbers indicating the ratio of mixed epithelial-mesenchymal tumors and of tumors with predominant fetal histotype in the C1 and C2 subtypes; hematoxylin and eosin (H&E) staining; and immunostaining of Ki-67, AFP and GLUL in representative samples. Magnification, ×400. (B) Expression of selected markers of mature hepatocytes and hepatoblast/liver progenitors in HB subclasses and non-tumor livers.

FIG. 3: Validation of the 16-gene signature by qPCR in an independent set of 41 HBs. Expression profiles of the 16 genes forming the HB classifier are shown as a heatmap that indicates high (red) and low (green) expression on a log-transformed scale. HB tumors, HB biopsies (b) and human fetal livers (FL) at different weeks (w) of gestation were assigned to class 1 or 2 using the 16-gene expression profile, six different statistical predictive methods (CCP, LDA, 1NN, 3NN, NC and SVM) and leave-one-out cross-validation. Black boxes in the rows indicate, from top to bottom: human fetal liver, mixed epithelial-mesenchymal histology, predominant fetal histotype, and β-catenin mutation.

FIG. 4: Box plots of the expression of the 16 genes of the prognostic liver cancer signature, assessed by qPCR. The boxes represent the 25-75 percentile range, the lines the 10-90 percentile range, and the horizontal bars the median values.

FIG. 5: Expression level of the 16 liver prognostic signature genes shown case by case in 46 hepatoblastomas and 8 normal livers. C1 tumors (green), C2 tumors (red) and normal liver (white).

FIG. 6. Correlation between molecular HB subtypes and clinical outcome in 61 patients. (A) Association of clinical and pathological data with HB classification in the complete set of 61 patients. Only significant correlations (Chi-square test) are shown. PRETEXT IV stage indicates tumorous involvement of all liver sections. (B) Kaplan-Meier plots of overall survival for 48 patients who received preoperative chemotherapy. Profiling via the 16-gene expression signature was used to define C1 and C2 subclasses in tumors resected after chemotherapy, and differences between survival curves were assessed with the log-rank test. (C) Overall survival of 17 HB patients for whom pretreatment biopsies or primary surgery specimens were available. The signature was applied exclusively to tumor samples without prior therapy. (D) Multivariate analysis including 3 variables associated with patient survival. The predominant histotype is defined as either fetal or other (including embryonal, crowded-fetal, macrotrabecular or SCUD types). Tumor stage is defined by PRETEXT stage (Perilongo et al., 2000) and/or distant metastasis at diagnosis and/or vascular invasion. HR, Hazard Ratio; CI, Confidence Interval.

FIG. 7: Clinical, pathological and genetic characteristics of 61 HB cases. SR: standard risk; HR: high risk according to SIOPEL criteria; NA: not available; PRETEXT: pre-treatment extent of disease according to SIOPEL; DOD: dead of disease; *: Vascular invasion was defined by radiological

Patients were treated either by partial hepatectomy (PH) or by orthotopic liver transplantation (OLT). Unless specified, the follow-up was closed at 146 months.

A: HCC cases were classified into 3 classes by the discretization method, using the 33rd and the 67th percentiles as cut-offs.

B: 47 HCC cases previously classified into the intermediate

class (33-35 weeks) and earlier (17 to 26 weeks) developmental stages were classified as C1 and C2 respectively, further supporting that HB subclasses reflect maturation arrest at different developmental phases.

oncofetal AFP gene associated with high protein levels in tumor cells by IHC (FIG. 2A) and in patients' sera (r=0.79, p<0.0001). C2 tumors also abundantly expressed hepatic pro

TABLE 8 Gene expression of the prognostic signature for liver cancer by quantitative RT-PCR.

Gene C1 (median min max) C2 (median min max) NL (median min max) Fold-change (C1/NL C2/NL C2/C1 C1/C2)

AFP 0.4 0.0 33.3 30.7 0.0 456.1 0.2 0.0 8.8 2.3 38.1 16.5 0.1
ALDH2 87.1 13.2 356.7 15.0 2.2 744 240.4 151.6 387.6 0.3 0.1 0.2 5.2
APCS 61.6 1.1 338.9 1.9 0.0 276.2 158.6 92.7 509.5 0.2 0.0 0.1 19.8
APOC4 21.3 4.3 122.8 1.6 0.1 24.2 47.0 22.3 112.4 0.5 0.0 0.1 16.1
AQP9 60.6 8.0 540.6 2.5 0.1 90.1 46.6 38.0 72.7 3 0.1 0.1 18.9
BUB1 0.0 0.0 0.4 0.9 0.1 3.9 0.0 0.0 0.1 .2 16.1 13.4 0.1
C1S 51.1 14.9 277.2 7.5 1.3 96.0 223.4 129.3 565.3 0.2 0.0 0.2 5.7
CYP2E1 583.2 97.7 3463.0 19.7 0.4 1504.0 1128.6 527.6 1697.0 0.7 0.0 0.0 51.6
DLG7 0.0 0.0 0.0 0.1 0.0 0.5 0.0 0.0 0.0 7 12.4 7.3 0.1
DUSP9 1.5 0.4 45.7 19.1 0.0 1790 0.6 0.2 1.3 4.0 18.3 4.6 0.2
E2F5 0.2 0.0 2.0 1.1 0.1 11.7 0.1 0.0 0.5 8 6.5 3.5 0.3
GHR 5.2 0.0 54.0 0.5 0.0 2.4 35.2 20.8 54.5 0.1 0.0 0.1 8.6
HPD 22.9 0.9 1820 1.2 0.1 23.8 111.5 62.6 165.7 0.2 0.0 0.1 14.0
IGSF1 0.1 0.0 1.7 1.7 0.0 19.8 0.1 0.0 0.1 2.2 22.4 10.2 0.1
NLE1 0.4 0.1 4.8 0.8 0.3 5.1 0.4 0.2 0.8 .2 2.2 1.8 0.5
RPL10A 73.3 12.0 230.4 98.2 11.9 432.8 86.9 54.1 159.9 0.8 1.1 1.5 0.7

NL, non-tumor liver; C1, good prognosis hepatoblastomas; C2, bad prognosis hepatoblastomas. Shown are the median values of 46 hepatoblastomas from 41 patients, the minimal and maximal values in each class, and the fold changes between classes. Data are presented in arbitrary units after normalization of the raw quantitative PCR values with genes (ACTG1, EFF1A1, PNN and RHOT2) that present highly similar values in all samples. Gene expression of the 16 genes is presented in FIGS. 4 and 5.

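The legends of FIGS. 1 and 3 state that samples were assigned to class 1 or 2 by six predictive methods, among them a nearest-centroid (NC) rule, with leave-one-out cross-validation. The excerpt gives no implementation details. The following is therefore a generic sketch of an NC classifier evaluated by leave-one-out cross-validation on log-scale expression profiles. The choice of Euclidean distance, the function names and the toy expression values are all assumptions made for illustration. None of the values are data from this study.

```python
"""Nearest-centroid classification with leave-one-out cross-validation (sketch)."""
import math
from typing import Dict, List, Sequence


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-gene mean of a group of expression profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(profile: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the class label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


def loocv_predictions(X: Sequence[Sequence[float]], y: Sequence[str]) -> List[str]:
    """Hold out each sample in turn, rebuild centroids from the rest, predict it."""
    preds = []
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        labels = sorted({lab for _, lab in train})
        cents = {lab: centroid([x for x, l in train if l == lab]) for lab in labels}
        preds.append(predict(X[i], cents))
    return preds


if __name__ == "__main__":
    # Toy 3-gene profiles (hypothetical values, log scale).
    X = [[5, 1, 0], [6, 0, 1], [5, 0, 0], [1, 5, 4], [0, 6, 5], [1, 6, 4]]
    y = ["C1", "C1", "C1", "C2", "C2", "C2"]
    preds = loocv_predictions(X, y)
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    print(preds, f"LOOCV accuracy = {accuracy:.2f}")
```

Leave-one-out cross-validation keeps each held-out sample out of its own centroid, so the reported accuracy is not inflated by the sample having helped define its class.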
The 16-Gene Signature as a Strong Independent Prognostic Factor
In a First Set of 61 Patients

The clinical impact of HB molecular classification was addressed in a first set of 61 patients (FIGS. 7 and 8), comprising 37 (61%) C1 and 24 (39%) C2 cases. Besides their strong association with predominant immature histotypes, HBs of the C2 subclass were tightly associated with features of advanced tumor stage, such as vascular invasion and extrahepatic metastasis (FIG. 6A). Accordingly, overall survival of these patients was markedly impaired. Kaplan-Meier estimates of overall survival probability at 2 years were 50% for patients with C2 tumors and 90% for patients with C1 tumors (p=0.0001, log-rank test), and similar trends were seen for disease-free survival probabilities (data not shown). Next, we examined whether pre-operative chemotherapy treatment given to 48 patients could affect tumor classification. These

histotypes, HBs of the C2 subclass were tightly associated with features of advanced tumor stage, such as vascular invasion and extrahepatic metastasis (FIG. 9A). Accordingly, overall survival of these patients was markedly impaired. Kaplan-Meier estimates of overall survival probability at 2 years were 60% for patients with C2 tumors and 94% for patients with C1 tumors (p=0.00001, log-rank test), and similar trends were seen for disease-free survival probabilities (Table 9).

TABLE 9 Survival analysis (Kaplan-Meier, log-rank test); DFS: disease-free survival; Others: dead or alive with recurrent disease.
N. of patients 61 C1 - 25 C2 = 86 P value
Survival (all patients) Alive Dead
C1 50.3

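The 2-year survival figures quoted above are Kaplan-Meier (product-limit) estimates. The sketch below implements that estimator for illustration. At each distinct event time t, S(t) is multiplied by (1 − d/n), where d is the number of deaths at t and n is the number still at risk. Patients censored at t are counted as at risk at t. The input data are invented, the helper names are hypothetical, and the sketch does not reproduce the study's analysis or the log-rank test.

```python
"""Kaplan-Meier product-limit survival estimator (sketch)."""
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  -- follow-up time for each patient
    events -- True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # both leave the risk set after t
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    # Hypothetical follow-up in years; False marks a censored patient.
    curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
    print(curve, "S(2) =", survival_at(curve, 2))
```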
Tumor follow-up tumor grade tumor differentiation tumor vascular invasion (CeCe O

id length (years) (Edmonson) according to OMS size 800 micro metastasis

HC1 0.07 3 moderately differentiated 120 NA absent O (ClC6
HC10 0.95 4 moderately poorly differentiated 75 absent absent O (ClC6
HC11 11.10 NA NA 15 absent absent O (ClC6
HC12 0.05 NA Well differentiated 60 NA NA O (ClC6
HC14 1.00 NA moderately poorly differentiated 80 NA NA O (ClC6
HC15 122 3 moderately differentiated 60 present present O (ClC6

TABLE 11-continued Features of the HCC samples obtained from 67 patients (pages 60 to 62)

Tumor follow-up tumor grade tumor di ere ia ion tumor vascular invasion ClelC O

length (years) (Edmonson) ing to OMS size 800 micro metastasis

7 O.96 2 erentia OO 8Se. 8LSel O ClelC 8 O.39 3 differentiated 40 present present NA O S.40 NA erentia 40 NA NA O ClelC 1 O.70 NA NA OO NA NA NA 2 1...SO NA erentia 45 Sel Sel O ClelC 3 1.93 2 erentia 50 Sel Sel O ClelC 5 5.87 2 erentia 40 Sel Sel NA 7 O.10 NA erentia 15 Sel Sel O ClelC 8 O.10 NA 2O NA present O ClelC 3.33 2 60 8Se. 8LSel (CeCe. O 1.78 3 16 NA NA O ClelC 2 O.66 2 60 8Se. NA O ClelC 4 4.72 2 40 8Se. 8LSel (CeCe. 7 O.2O NA 35 present present Oil 8 1.12 NA 50 8Se. NA (CeCe. 148 OO 8Se. 8LSel O ClelC 1 7:44 30 NA 8LSel (CeCe. 2 O.S8 Olea erentiate 30 possible; present O ClelC non certain O.2O Olea erentiate 15 NA NA O ClelC O.25 Olea erentiate 10 absent 8LSel O ClelC 8.30 Olea erentiate OO absent 8LSel O ClelC 25 We erentia ed 90 absent (Sel (CeCe. 5.25 Olea erentiate 40 absent 8LSel (CeCe. 8.93 Well to mo ely erentiated 75 absent 8LSel O ClelC SO We erentia ed OO Sel (Sel (CeCe. O8. i erentiate 30 absent 8S O CUC moderately poorly di OO Sel (Sel O ClelC Well to mo (8. f 35 Sel (Sel O ClelC Poor y di 200 Sel (Sel O ClelC Well to mo (8. 55 absent (Sel (CeCe. Well to mo (8. 60 Possible: (Sel O ClelC non certain 05 modera 40 Sel (Sel (CeCe. O6 modera 8O Sel (Sel O ClelC O7 We erentia ed 60 8Se. absent O ClelC O8 modera ely ifferentiated 26 8Se. (Sel O ClelC 09 Well to very We Il differentiated 30 8Se. absent O ClelC 10 modera e ifferentiated 30 Sel (Sel O ClelC 11 modera ifferentiated 40 Sel (Sel O ClelC 12 Well to mo ely erentiated 18 8Se. absent O ClelC 13 Well to mo ely erentiated 50 Sel (Sel O ClelC 14 We erentia ed 36 Sel absent O ClelC 19 We erentia ed 90 Sel absent O ClelC 2O Olea ely erentiate 140 Sel absent O ClelC 21 Well to mo (8. erentiated 28 Sel absent O ClelC 22 Very well ifferen iated 40 Sel absent O ClelC 23 Olea ely erentiate 26 Sel (Sel O ClelC 24 Well to mo (8. erentiated 2O 8Se. (Sel O ClelC 25 Olea ely erentiate 150 Possible: (Sel O ClelC non certain 26 0.75 We 2O present (Sel (CeCe. 
27 O4O Olea 43 probable probable O ClelC 28 O.S2 Olea e 62 absen absent O ClelC 29 O.30 Olea 25 absen (Sel O ClelC 31 O42 We : 130 present (Sel (CeCe. 32 O.25 Well to mo erentiated 115 present (Sel (CeCe. 33 0.44 Well to mo erentiated 110 absen (Sel O ClelC 34 O.10 Olea 30 absen (Sel O ClelC 35 O.14 Olea , 38 absen Possible: O ClelC non certain 36 O.26 2-3 Well to mo (8. ely erentiated 120 absen (Sel O ClelC

NA: non available; macro: macrovascular invasion; micro: microvascular invasion

TABLE 12 Features of the HCC samples obtained from 67 patients, and features of patients (pages 63 and 64)

Tumor ID | METAVIR score: Activity | METAVIR score: Fibrosis | Chronic viral hepatitis | Viral etiology: HBV | Viral etiology: HCV | Alcohol | Other etiologies

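HC1 | NA | 4 | no | no | no | yes |
HC10 | NA | 4 | yes | yes | no | no |
HC11 | NA | NA | yes | yes | yes | no |
HC12 | NA | NA | yes | yes | no | no |
HC14 | NA | NA | yes | no | yes | yes |
HC15 | 3 | 3 | no | no | no | yes |
HC17 | NA | 3 | yes | yes | no | no |
HC18 | 2 | 4 | no | no | no | yes |
HC20 | NA | NA | no | no | no | yes |
HC21 | NA | NA | no | no | no | yes |
HC22 | NA | NA | no | no | no | yes |
HC23 | NA | 0 | no | no | no | no |
HC25 | 0 | 0 | no | no | no | no |
HC27 | NA | NA | yes | no | yes | no |
HC28 | 0 | 0 | no | no | no | no |
HC3 | NA | 4 | yes | no | yes | no |
HC30 | NA | 4 | no | no | no | yes |
HC32 | NA | 4 | yes | no | yes | no |
HC34 | NA | 0 | no | no | no | no |
HC37 | NA | NA | no | no | no | yes |
HC38 | NA | 4 | yes | no | yes | no |
HC4 | NA | 1 | no | no | no | no |
HC41 | NA | 4 | yes | no | yes | no |
HC42 | 2 | 1 | yes | yes | no | no |
HC43 | NA | NA | yes | no | yes | no |
HC52 | NA | 4 | yes | yes | no | no |
HC58 | 2 | 3 | yes | no | yes | no |
HC6 | NA | 1 | no | no | no | yes | Hemochro
HC64 | 2 | 2 | yes | no | yes | no |
HC66 | NA | 4 | yes | yes | no | yes |
HC7 | 2 | 3 | no | no | no | yes |
HC8 | NA | 4 | yes | no | yes | no |
HC9 | 1 | 3 | no | no | no | yes |
HC101 | 2 | 4 | yes | yes | yes | yes |
HC102 | 1 | 1 | yes | yes | yes | no |
HC103 | 3 | 4 | yes | yes | no | no |
HC104 | 0 | 1 | no | no | no | no |
HC105 | 2 | 4 | yes | no | yes | no |
HC106 | 1 | 4 | yes | yes | no | no |
HC107 | 0 | 0-1 | no | no | no | yes |
HC108 | 1 | 1 | yes | no | yes | no |
HC109 | 2 | 4 | no | no | no | yes | NASH
HC110 | 1 | 4 | yes | no | yes | yes |
HC111 | 1 | 4 | no | no | no | yes |
HC112 | 2 | 2 | no | no | no | no | NASH
HC113 | 1 | 4 | yes | no | yes | no |
HC114 | 2 | 3 | no | no | no | yes |
HC119 | 2 | 1 | no | no | no | no | NASH
HC120 | 2 | 3 | yes | yes | no | no |
HC121 | 2 | 4 | yes | no | yes | no |
HC122 | 0 | 1 | no | no | no | no |
HC123 | 2 | 4 | yes | no | yes | yes |
HC124 | 1 | 4 | yes | yes | no | no |
HC125 | 2 | 4 | no | no | no | yes | NASH
HC126 | 1 | 4 | yes | yes | no | no |
HC127 | 2 | 4 | yes | no | yes | no |
HC128 | 1 | 1 | no | no | no | no | NASH
HC129 | 2 | 4 | no | no | no | yes |
HC131 | 0 | 1 | no | no | no | no |
HC132 | 1 | 1 | yes | yes | no | no |
HC133 | 2 | 2 | no | no | no | yes |
HC134 | 2 | 3 | yes | no | yes | no |
HC135 | 1 | 2 | yes | yes | no | no |
HC136 | 0 | 1 | no | no | no | no |

NA: non available; HBV: hepatitis B virus; HCV: hepatitis C virus; Hemochro: hemochromatosis; NASH: non-alcoholic steatohepatitis.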

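The figure legends above state that HCC cases were classified into 3 classes by a discretization method with the 33rd and 67th percentiles as cut-offs. The excerpt specifies neither the score being discretized nor the percentile definition. This sketch therefore makes three assumptions. First, it applies linear interpolation between order statistics, the same convention as NumPy's default `percentile`. Second, it uses a generic per-sample score. Third, it assigns values lying exactly on a cut-off to the middle class. The function names and the class labels 1 to 3 are hypothetical.

```python
"""Discretize a per-sample score into 3 classes at the 33rd/67th percentiles (sketch)."""
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def discretize(values: Sequence[float],
               cuts: Tuple[float, float] = (33.0, 67.0)) -> List[int]:
    """Class 1 below the first cut-off, class 3 above the second, class 2 otherwise."""
    low, high = (percentile(values, c) for c in cuts)
    return [1 if v < low else 3 if v > high else 2 for v in values]


if __name__ == "__main__":
    scores = [0.4, 2.1, -1.3, 5.6, 3.3, 0.9, 4.8, -0.2, 2.7]  # hypothetical scores
    print(discretize(scores))
```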
TABLE 13 Quantitative PCR data of the 16-gene signature normalized to the expression of the RHOT2 gene (pages 65 to 68)

HC1 HC3 HC7 HC8 HC9 HC10 HC11

FP 2.212911 -3.865709 -7.675811S -7.946,9815 5.311541 2.0890815 - 7.048.3095 2.386963S O.648833S LDH2 6.2372335 6.23OO74 2.186358 S4231O3S 4.0446765 3.9297.005 3.0017225 O.95212 S.958.108 POC4 O.614689 O.95786 -1608247 O.96142SS -3.550537 -0.6776965 -9.6721075 NA 1.076151 PCS 7.0721355 7.52919 5.845683 7.3704745 5.196791.5 6.567126 -O.O17488 -1.0272875 7.7638.255 QP9 6.047695 6.7334475 3.759528 7.006052 6.7471.03 3.10821SS 3.7536735 1.34.00495 6.122144 UB1 3.841SOS -0.147459 -4.221132 -O.S2S2O45 -O.299039 -1214781 2.980O29 -1.864677 -2.3624.54 S 8.1634.92 8.79634OS 5.8997645 8.162856 4.062593 7.2991.535 4.83O331 2.6399.02 8.319293 1 O.309323S 10.428O74 7.11475.15 10.133426S 11.024027 7.791OO75 O.S82S245 3.604805 9.575619 5.30317 -2.057513 -4.4.2264-65 -16282005 -1169221 -2.8O866 1.3733475 NA -2.84.322O5 R -1 1.616567 -8.8462855 -9.4268.185 -10.22O51 -6.6521625 -9.6946695 -9.52626SS NA NA O.OS328 -1909804 -1.74321.9S O.O24339 -O.283.3465 -0.01931.65 O.71 1082 -1.344368 -0.736822 2.655512 2.069524 -2.OO12965 18878OS -1.7428205 2.342442 -232421.95 -0.4900285 4.7S7848 9.449416 8.5498O3 9.4152S3 8.5958965 6.183977 5.329776 -0.011478 2.932809 9.029214 6.46034 -7.249974 NA -7.158O385 -3.192514 -2806768 -4O26769 NA -7.639001S 1.1594.17 -15801355 -3.145993S O.6940375 -0.391956S -1579419 -O.80375 NA - 193287SS 6.622S235 6.0562915 4.4121.905 6.863 7555 7.138112S 6.2S74845 6.3016635 9.1966.395 7.379063

Gene  HC12  HC15  HC17  HC18  HC20  HC21  HC22  HC23
AFP  6.538312  6.14089  7.1950405  -6.856588  -0.65281  -4.3070475  -4.418018  -5.538438  -3.90298
ALDH2  4.6271565  4.5178635  2.6522585  1.840894  6.287083  2.175112  5.331214  5.853486  6.162477
APOC4  1.221393  -5.156026  -2.395651  -3.84764  3.2094885  -6.2591235  0.5455545  0.5708905  1.834891
APCS  6.942673  3.380102  4.5167035  4.916924  8.2117635  5.9159775  6.6835035  6.9009145  8.798759
AQP9  4.1878425  2.373344  2.8711295  3.6093495  7.354605  1.1452535  5.7992305  6.651868  8.758959
BUB1  3.293346  0.8830545  1.0884485  -0.063545  -1.4635025  0.0802935  -2.173361  -2.5475915  -2.5679685
C1S  6.850023  7.1343975  6.035123  4.263272  8.471663  5.7190985  7.2514145  8.2212235  8.5606875
CYP2E1  7.284587  4.9390935  6.037085  5.811062  10.2536915  1.2878015  8.0876755  9.047509  10.814935
DLG7  -4.7199665  -0.1414205  0.666284  -1.512286  -2.1165725  -0.322455  -3.3904095  -3.848364  -3.34202
DUSP9  NA  -4.4342765  -3.163581  -8.7756845  -9.6208445  -7.8162765  -10.827291  NA  -7.1111525

HPD  7.245347  7.714358  6.685692  6.835254  9.220498  8.5127155  7.480725  8.7301975  4.7774665
IGSF1  -1.86965  -3.4428695  -2.045068  -5.1813245  -5.39017  -9.404196  -5.980435  -8.6480295  -5.1400615
NLE1  -1.012752  -1.119237  -2.156348  -1.3170345  -0.400823  -1.1096815  -1.758163  -2.2430545  -1.5951645
RPL10A  5.568205  6.1905075  5.8884625  5.795905  7.954231  6.4517175  6.4042545  5.199782  4.7323885

Gene  HC124  HC125  HC126  HC127  HC128  HC129  HC131  HC132  HC133
AFP  3.9525335  -4.806564  -5.899437  -0.0390765  5.8636305  -3.430757  -1.491189  5.4265205  -5.1621395
ALDH2  4.027289  4.5451465  5.02839  2.41699  5.085525  4.6298475  5.425994  3.105643  4.2462915
APOC4  -0.0499065  2.6326775  0.407895  0.8680995  -0.626498  -1.863955  2.4702  -6.9974515  0.63156
APCS  5.391271  6.5321595  5.2838365  4.846116  5.087517  4.8448705  8.6617295  -3.2748865  7.145861
AQP9  4.463488  8.370224  3.6163545  1.8613935  4.3184915  2.870839  7.4772145  3.9244375  6.05182
BUB1  -1.592563  1.1627945  -2.6943025  -2.048769  -1.3297375  -2.3688215  -0.727709  0.2895395  -4.9277675
C1S  5.151686  8.4244055  7.1365955  6.3641695  6.828468  7.302922  7.525072  4.390082  7.3188145
CYP2E1  9.520436  9.426232  5.226091  6.1813065  7.4344035  2.692798  8.98645  7.0455735  8.1908895
DLG7  -2.03781  0.3286545  -3.944339  -2.96212  -2.6299155  -3.6405185  -1.461713  -1.5572645  -5.5447335
DUSP9  -8.81055  -9.3740615  -8.7174575  -8.672372  -8.499355  -7.0627455  -8.415907  -3.3843145  -8.022457
E2F5  0.574165  -0.028878  -3.271927  -2.162602  -4.393094  -0.470421  0.154573  1.9018925  -2.6341825
GHR  2.2369305  0.697866  1.824385  0.129431  1.9716885  2.332961  4.009655  1.7710325  2.2298335
HPD  7.832169  5.7813  1.865621  3.4481965  5.7052855  5.502918  8.960383  2.3653865  6.1281315
IGSF1  -1.4450915  -10.2234745  -7.659377  -3.1503205  -2.72995  -5.692623  -7.5832005  -1.947055  NA
NLE  -0.1499775  -0.405397  -2.033278  -2.205965  -1.949352  -1.683808  -1.5313675  0.2035885  -1.4173895
RPL10A  6.691521  7.1196575  5.389272  4.3385115  6.6181545  4.8697295  6.775249  6.7796075  5.762015

Gene  HC134  HC135  HC136
AFP  2.8738695  -0.909107  -0.4105125
ALDH2  4.061101  2.7442165  6.0408575
APOC4  -0.1134065  -0.7630605  0.7390785
APCS  7.5103485  0.959726  7.150737
AQP9  5.550642  4.0595615  5.996196
BUB1  1.7425995  -1.2018365  -4.288554
C1S  8.4609335  4.667223  8.243333
CYP2E1  7.859701  4.30592  9.042865
DLG7  0.8148735  -2.250305  -5.5267715
DUSP9  -4.96739  -5.794605  -10.9307725
E2F5  3.1030595  0.986165  -2.4040865
GHR  1.3138565  -0.6955465  4.013948
HPD  7.231144  6.7262275  8.223611
IGSF1  -0.3848995  -4.394354  -7.4962365
NLE1  0.794433  -0.9780515  -2.426321
RPL10A  7.7140665  6.689595  5.5069335

NA: not available.

Data were then analyzed by unsupervised clustering (dChip software) using 2 methods: average and centroid. Tumors were clustered into 2 groups, C1 and C2. Most of the samples were assigned the same classification by the 2 methods, except for 6 samples (9%) that were attrib…

…to activate uracil-N-glycosylase (UNG)-mediated erasure of nonspecific reaction products
10 min at 95° C to activate the polymerase and inactivate the UNG
40 cycles:
15 sec at 95° C (denaturation step)
1 min at 60° C (annealing and extension)
Final dissociation step to verify amplicon specificity.

The normalized qPCR (deltaCt) values of the 85 HB samples are given in Table A.

Analysis of qPCR Data.
Assignment of a discretized value for the 8 proliferation-related genes (AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE, RPL10A) was based on the 67th quantile (i.e., percentile), given that around 1/3 of HB cases overexpress proliferation genes, which is correlated with tumor aggressiveness and poor outcome. Assignment of a discretized value for the 8 differentiation-related genes (ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR, HPD) was based on the 33rd quantile, given that around 1/3 of HB cases underexpress differentiation genes, which is also correlated with tumor aggressiveness and poor outcome.

The cut-offs (or thresholds) selected for the -deltaCt value of each gene, determined from the chosen percentile for each group of genes, are as follows:
AFP: 3.96139596; ALDH2: 4.3590482; APCS: 4.4691582; APOC4: 2.03068712; AQP9: 3.38391456; BUB1: -1.41294708; C1S: 4.24839464; CYP2E1: 6.70659644; DLG7: -3.3912188; DUSP9: 2.07022648; E2F5: -0.72728656; GHR: -0.15055692; HPD: 2.27655628; IGSF1: 0.10750152; NLE: -0.02343571999; RPL10A: 6.19723876.

For each sample, the relative expression value is determined for each gene of the set of profiled genes. Each value is compared with the cut-off for the corresponding gene and is then discretized according to its position with respect to that cut-off.

The next step consisted in assigning a discretized score to each sample as follows:
1. The average of the discretized values of the 8 proliferation-related genes was determined. The 8 proliferation-related genes are the following: AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE, and RPL10A.
2. The average of the discretized values of the 8 differentiation-related genes was determined. The 8 differentiation-related genes are the following: ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR, and HPD.

TABLE 14
Characteristics  C1 (67th percentile)  C2 (67th percentile)  p-value (chi-square test)
Previous C1/C2 classification  52/5  2/26  1.0739e-14
Gender (male/female)  28/29  7/21  0.03368
PRETEXT stage (I-II/III-IV)  30/25  11/15  0.30367
Distant metastasis (no/yes)  45/12  15/13  0.015808
Vascular invasion (no/yes)  38/17  11/17  0.0090345
Multifocality (no/yes)  38/18  15/13  0.20088
Histology (epithelial/mesenchymal)  34/22  16/22  0.75303
β-catenin mutation (no/yes)  8/45  8/16  0.067697
Main epithelial component (fetal/other*)  49/7  5/21  2.33206e-9
*Other = embryonal, macrotrabecular, crowded fetal.

The best correlation of the discrete classification was observed with the previous classification into C1 and C2 classes, followed by the main epithelial histological component. The correlation with patients' survival is also excellent, as shown by the Kaplan-Meier estimates and the log-rank test. Illustrative Kaplan-Meier curves for cancer-specific survival, using different percentiles to classify the tumors, are given in FIG. 11.

In conclusion, this study shows that the discretization method classifies hepatoblastoma as efficiently as the previously described method.

A similar approach was therefore applied to the analysis of hepatocellular carcinoma.

Analysis of 114 Hepatocellular Carcinomas (HC)

RNA Preparation
RNA was extracted using either Trizol, the RNeasy kit (QIAGEN) or the mirVana kit (Ambion), then quantified and quality-checked using Agilent technology.

For each cDNA preparation, 1 µg of RNA was diluted to a final concentration of 100 ng/µl and reverse transcribed with the Superscript RT kit (Invitrogen) following the manufacturer's protocol. Random primers were added at a final concentration of 30 ng/µl, and the final volume was 20 µl. The cDNA was diluted 1:25, and 5 µl were used for each qPCR reaction. We added 5 µl of 2× Sybr Green Master mix or Taqman Master mix (Applied Biosystems) and specific primers (and probes when using Taqman chemistry) at the concentration indicated by the manufacturer. Each reaction was performed in triplicate. qPCR reactions were run on the Applied Biosystems 7900HT Fast Real-Time PCR System with a 384-well thermo-block, and the conditions were the following:
2 min at 50° C to activate uracil-N-glycosylase (UNG)-mediated erasure of nonspecific reaction products (omit if using the Taqman approach)
10 min at 95° C to activate the polymerase and inactivate the UNG
40 cycles:
15 sec at 95° C (denaturation step)
1 min at 60° C (annealing and extension)
Final dissociation step to verify amplicon specificity (omit if using the Taqman approach)

Quantitative PCR
Real-time RT-PCR was performed for 16 genes on 114 HCC samples using two different technologies:
Sybr Green, as described above for hepatoblastoma (26 samples);
Taqman methodology (88 samples), using primers and probes designed and publicly released by Applied Biosystems.

Examples of primers and probes:
AFP forward primer: GCCAGTGCTGCACTTCTTCA
AFP reverse primer: TGTTTCATCCACCACCAAGCT
AFP Taqman probe: ATGCCAACAGGAGGCCATGCTTCA
RHOT2 forward primer: CCCAGCACCACCATCTTCAC
RHOT2 reverse primer: CCAGAAGGAAGAGGGATGCA
RHOT2 Taqman probe: CAGCTCGCCACCATGGCCG

Each reaction was performed in triplicate for the Sybr Green protocol and in duplicate for the Taqman protocol. qPCR reactions were run on the Applied Biosystems 7900HT Fast Real-Time PCR System with a 384-well thermo-block. Raw data for each gene were normalized to the expression of the RHOT2 gene, providing the deltaCt values that were then used to classify the tumors into subclasses with the discretization method.

The normalized qPCR values (deltaCt) of the 16 genes in the 26 HCC samples analyzed by the Sybr Green approach are given in Table C. The deltaCt values for the 88 HCCs analyzed by the Taqman approach are given in Table D.

Analysis of qPCR Data.
The -deltaCt values for each gene in each sample were used. The cut-offs (or thresholds) selected for each gene using the Taqman method or the SybrGreen method are as follows:

TABLE E
Cut-offs for discretization values
Gene name  Cut-off for Taqman  Cut-off for SybrGreen
AFP  -1.2634010  -2.3753035
ALDH2  4.014143  5.314302
APCS  5.6142907  6.399079
APOC4  -0.7963158  4.656336
AQP9  4.2836011  5.446966
BUB1  -1.2736579  -3.634476
C1S  6.3514679  6.240002
CYP2E1  6.9562419  5.829384
DLG7  -2.335694  -4.614352
DUSP9  -7.979559  -1.8626715
E2F5  -0.4400218  -1.367846
GHR  1.0832632  1.169362
HPD  6.7480328  6.736329
IGSF1  -4.8417785  7.6653982
NLE  -1.6167268  -1.82226
RPL10A  6.2483056  5.731897

For each sample, the relative expression value is determined for each gene of the set of profiled genes. Each value is compared with the cut-off for the corresponding gene and is then discretized according to its position with respect to that cut-off.

The next step consisted in assigning a score to each sample as follows:
1. The average of the discretized values of the 8 proliferation-related genes was determined. The 8 proliferation-related genes are the following: AFP, BUB1, DLG7, DUSP9, E2F5, IGSF1, NLE, and RPL10A.
2. The average of the discretized values of the 8 differentiation-related genes was determined. The 8 differentiation-related genes are the following: ALDH2, APCS, APOC4, AQP9, C1S, CYP2E1, GHR, and HPD.
3. The score for each sample was determined as the ratio between the average of the proliferation-related genes and the average of the differentiation-related genes.

According to this calculation, a score of 2 is the theoretical maximal score for highly proliferating and poorly differentiated tumors (all proliferation-related genes at 2 and all differentiation-related genes at 1, i.e., 2/1), whereas well differentiated and slowly proliferating tumors have a theoretical minimal score of 0.5 (1/2).

Based on the scores assigned to the 114 samples analyzed, cut-offs are identified to separate the samples into relevant subclasses. Three different cut-offs, corresponding to the 30th (0.66), 50th (0.8125) and 67th (0.925) percentiles, have been assessed, leading to 4 different classification methods.
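The normalization, discretization and scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: all function names are hypothetical; ΔCt is taken as the mean Ct of the target gene minus the mean Ct of RHOT2 over replicate wells; a gene is assigned 2 when its ΔCt lies below its cut-off, which is the convention used in the patient X example later in this section (AFP ΔCt of -4.0523 against the Taqman cut-off -1.2634010 gives 2); genes reported as NA are simply left out of the averages; and a score equal to the cut-off is treated as C2, whereas the text only says "above".

```python
from statistics import mean

# Gene groups exactly as listed in the text.
PROLIFERATION = ["AFP", "BUB1", "DLG7", "DUSP9", "E2F5", "IGSF1", "NLE", "RPL10A"]
DIFFERENTIATION = ["ALDH2", "APCS", "APOC4", "AQP9", "C1S", "CYP2E1", "GHR", "HPD"]


def delta_ct(gene_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene (e.g. RHOT2)."""
    return mean(gene_cts) - mean(reference_cts)


def discretize(dct, cutoff):
    """Return 2 when the DCt is below the gene's cut-off (higher expression), else 1.

    Assumed convention, taken from the patient X example (AFP: -4.0523 < -1.2634010 -> 2).
    """
    return 2 if dct < cutoff else 1


def score(discrete):
    """Average proliferation value divided by average differentiation value.

    `discrete` maps gene name -> 1 or 2; genes absent from the dict (e.g. NA) are skipped.
    """
    prolif = mean(discrete[g] for g in PROLIFERATION if g in discrete)
    diff = mean(discrete[g] for g in DIFFERENTIATION if g in discrete)
    return prolif / diff


def classify(value, cutoff=0.925):
    """Two-class call at the 67th-percentile cut-off; ties go to C2 (assumption)."""
    return "C2" if value >= cutoff else "C1"


# Patient X from the worked example. Only AQP9=1 and C1S=2 are legible for the
# differentiation genes; assigning the other values in list order is an assumption
# (the averages, and hence the score, do not depend on that order).
prolif = dict(zip(PROLIFERATION, [2, 1, 2, 2, 2, 1, 2, 1]))
diff = dict(zip(DIFFERENTIATION, [1, 1, 1, 1, 2, 2, 1, 2]))
s = score({**prolif, **diff})
print(round(s, 5), classify(s))
```

Because the score is a ratio of two averages of values in {1, 2}, it is bounded by 0.5 and 2, which is why the theoretical extremes quoted in the text fall out directly.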

TABLE F: Discretized values for 114 HCCs using 3 different thresholds and 4 combinations

Sample  Score  Method 1, 3-class ((1) below q30; (3) above q67)  Method 2, (2): >q30  Method 3, (2): >q67  Method 4, (2): >q50  Survival (years)
HC001  0.6875  2  2  1  1  1  0.07
HC003  0.6875  2  2  1  1  1  3.33
HC004  0.7272727  2  2  1  1  0  11.48
HC006  0.8125  2  2  1  2  1  1.25
HC007  1.4545455  3  2  2  2  1  1.5
HC008  1.0769231  3  2  2  2  1  8.48


HC146  0.9  2  2  2  0  4.33
HC147  0.6666667  2  2  1  0  3.83
HC148  1.1  3  2  2  2  0  3.08
HC149  1.2222222  3  2  2  2  3.42
HC150  0.6666667  2  2  1  0  5.42
HC151  0.6153846  1  1  1  0  2.25
HC152  0.6428571  1  1  1  3.67
HC153  0.6923077  2  2  1  4.83
HC154  1.375  3  2  2  2  2.21
HC155  0.8181818  2  2  2  0  4.1
HC156  1.4  3  2  2  2  2.31
HC157  1  3  2  2  2  3.59
HC159  0.7272727  2  2  2.42
HC161  0.6  1  1  0  4.47
HC162  1.1111111  3  2  2  2  0  3.49
HC163  0.6  1  1  2.21
HC164  0.6428571  1  1  0  4.54
HC165  0.6428571  1  1  0  4.72
HC168  0.6  1  1  0  6
HC169  0.6  1  1  2.78
HC170  0.5625  1  1  0  5.29
HC171  0.8181818  2  2  2  0  4.57
HC172  0.8333333  2  2  2  0  3.9
HC173  0.6428571  1  1  0  4.21
HC176  0.6428571  1  1  1  1  0  4.57
HC177  0.6666667  2  2  0  5.42
HC178  0.7142857  2  2  0  2.5
HC179  0.8181818  2  2  2  0  5.17
HC180  0.8571429  2  2  2  1  3.58
HC181  1  3  2  2  2  0  6.83
HC182  0.5625  1  1  0  3.5
HC183  0.7333333  2  2  1  4.08
HC184  0.9230769  2  2  2  1  2.08
HC185  0.7692308  2  2  0  2.25
HC186  0.9285714  3  2  2  2  1  2.17
HC187  0.6428571  1  1  0  7.67
HC188  0.7142857  2  2  0  4.67
HC189  0.8666667  2  2  2  1  3.25
HC190  0.7619048  2  2  0  5.58
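The four classification methods of Table F can be reproduced from a sample's score with a few comparisons. The sketch below is illustrative only: `table_f_methods` and `percentile` are hypothetical helpers, the default cut-offs are the values stated in the text (0.66, 0.8125 and 0.925 for the 30th, 50th and 67th percentiles), and the rule that a score equal to a cut-off goes to the upper class is an assumption chosen because it matches HC006 (score 0.8125, class 2 with Method 4). The percentile helper shows one way such cut-offs could be derived from a cohort of scores; it uses linear interpolation between order statistics (the same rule as NumPy's default), which the source does not specify.

```python
def percentile(values, q):
    """Percentile with linear interpolation between sorted values (q in 0..100).

    The interpolation rule is an assumption; the source does not name an estimator.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def table_f_methods(score, q30=0.66, q50=0.8125, q67=0.925):
    """Return the classes assigned by Methods 1-4 of Table F for one score.

    Method 1: 1 below q30, 3 at or above q67, otherwise 2.
    Methods 2, 3 and 4: 2 at or above q30, q67 and q50 respectively, otherwise 1.
    """
    m1 = 1 if score < q30 else (3 if score >= q67 else 2)
    m2 = 2 if score >= q30 else 1
    m3 = 2 if score >= q67 else 1
    m4 = 2 if score >= q50 else 1
    return m1, m2, m3, m4


# Rows of Table F whose four method columns are complete in the source.
for sample, s in {"HC001": 0.6875, "HC006": 0.8125, "HC007": 1.4545455, "HC186": 0.9285714}.items():
    print(sample, table_f_methods(s))
```

For these four rows the computed tuples agree with the printed method columns; several other rows in the extracted table have lost one of their "1" entries, so they cannot be compared column by column.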

Samples were separated into the corresponding subgroups, and subsequent analysis was carried out using the 4 classification methods. Survival for each group was determined using Kaplan-Meier estimates and the log-rank test.

Statistical Analysis of Clinical Correlations with the Subclasses for 114 HCCs
A complete table with all clinical and pathological data collected for the 114 HCC patients is given in Table G. The different parameters are summarized as follows:

TABLE H
Clinical and pathological parameters and molecular classification of 114 HCC cases
Characteristics
Etiology (a)
  Alcohol  40 (36%)
  HCV  26 (23%)
  HBV  23 (20%)
  Hemochromatosis  6 (5%)
  NASH  6 (5%)
  Unknown  23 (20%)
Treatment (SR/OLT)  93/21
Chronic viral hepatitis (b)  46 (41%)
Liver cirrhosis  44 (48%)
Tumor characteristics
  Macrovascular invasion  20 (25%)
  Microvascular invasion  47 (50%)
  Mean tumor size, cm (range)  7.9 (1.5-22)
  Multifocality  46 (48%)
Histology
  Edmondson tumor grade (1/2/3/4)  7/35/47/5
  OMS (WHO) tumor differentiation (W/M/P)  51/55/6
Classification with 16 genes by discretization
  40th percentile (C1/C2)  30/84
  50th percentile (C1/C2)  55/59
  67th percentile (C1/C2)  77/37
Mean follow-up, months (range)  43.6 (0.26-146)
Tumor recurrence (b)  43 (40%)
Alive/DOD (b)  75/38
Abbreviations: HCV, hepatitis C virus; HBV, hepatitis B virus; NASH, nonalcoholic steatohepatitis; SR, surgical resection; OLT, orthotopic liver transplantation; W, well differentiated; M, moderately differentiated; P, poorly differentiated; NA, not available; DOD, dead of cancer.
(a) 12 cases have more than one etiological agent, and data were not available for 2 cases.
(b) Data were not available for all cases. Percentages were deduced from available data.

In a second step, the intrinsic parameters of the tumors that correlate with patient survival were analyzed. In this series of tumors, only tumor grade (Edmondson) and vascular invasion were significantly correlated with survival.

TABLE I
Summary of the clinical variables associated with overall survival (Kaplan-Meier curves and log-rank test). This table does not take into account the molecular classification.
Variable  N patients  Log rank  N patients with PH  Log rank
Edmondson tumor grade (1-2, 3-4)  94  0.028  73  0.032
Tumor differentiation, OMS (well / moderately-poorly differentiated)  111  0.406  90  0.647
High proliferation: >10 mitoses in 10 fields at 40x (N/Y)  45  0.054  34  0.402
Macrovascular invasion (N/Y)  79  0.001  59  0.010
Microvascular invasion (N/Y)  92  0.007  72  0.050
Tumor size >10 cm  113  0.298  92  0.314

Classification by Discretization of Continuous Values
The clinico-pathological parameters were compared between the tumor groups using Student's t test and the chi-square test. Survival was analyzed using Kaplan-Meier curves and the log-rank test. Special attention was given to the classification with the 67th percentile. Follow-up was closed at 146 months for overall survival (OS) and at 48 months for disease-free survival (DFS).

TABLE J
Association of the 16-gene classification by discretization with clinical and pathological data (chi-square test). Abbreviations: P33, 33rd percentile; P50, 50th percentile; P67, 67th percentile.

Variable  p-value P33  p-value P50  p-value P67  C1 (P67)  C2 (P67)  Comments
Edmondson tumor grade  0.006  …
High proliferation: >10 mitoses in 10 fields 40x (N/Y)  …
Macrovascular invasion (N/Y)  0.097  0.033  0.008  44/8  16/12  The cases defined as possible are considered negative.
Microvascular invasion (N/Y)  0.071  0.001  0.009  37/26  9/21  The cases defined as possible are considered negative.
Tumor size (<10 / >10 cm)  NS  NS  0.015  57/20  19/18  Different cut-offs assessed: 2, 3, 5 and 10 cm.
Multifocality (N/Y)  NS  NS  NS  35/30  15/16
Macronodules of regeneration  NS  NS  NS  24/9  12/4
Normal liver A0F0-A0F1  NS  NS  NS  48/17  27/7
Cirrhosis AxF4 (N/Y)  NS  NS  NS  31/29  17/15
METAVIR activity score >0 (N/Y)  0.053  0.044  NS  19/32  5/20
METAVIR activity score >1 (N/Y)  NS  0.20  NS  31/20  15/10
METAVIR fibrosis score >0 (N/Y)  0.041  NS  NS  5/48  2/27
METAVIR fibrosis score >1 (N/Y)  NS  NS  NS  19/35  7/22
METAVIR fibrosis score >2 (N/Y)  NS  NS  NS  24/30  8/21


METAVIR fibrosis score >3 (N/Y)  NS  NS  NS  26/28  15/14
Chronic viral hepatitis (N/Y)  0.047  NS  NS  48/29  18/17
HBV (N/Y)  0.075  NS  NS  62/15  27/8
HCV (N/Y)  NS  NS  NS  61/16  25/10
Alcohol (N/Y)  NS  NS  NS  47/30  25/10
Recurrence (N/Y)  NS  NS  NS  41/32  24/11  HCC034 and HCC030 censored
Survival (N/Y)  0.050  0.023  0.031  56/21  19/17  HCC025 and HCC030 censored
DFS (N/Y)  NS  NS  NS  35/42  15/21  HCC025 and HCC030 censored
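Table J reports chi-square p-values for the N/Y counts of each variable in the C1 and C2 classes. As an illustration of that kind of test, the sketch below computes Pearson's chi-square statistic for a 2×2 table without continuity correction; the function name is hypothetical, and the source does not say which variant of the test was used. Applied to the macrovascular invasion counts for P67 (C1: 44/8, C2: 16/12) it gives a p-value of roughly 0.007, close to but not identical to the 0.008 printed in Table J, so it should not be read as a reproduction of the authors' exact computation.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test on a 2x2 table (1 degree of freedom, no continuity correction).

    Layout (rows = classes, columns = N/Y counts):
        C1: a (N), b (Y)
        C2: c (N), d (Y)
    Returns (statistic, p_value).
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Macrovascular invasion (N/Y) for the P67 classes in Table J.
stat, p = chi_square_2x2(44, 8, 16, 12)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The erfc identity avoids a dependency on a statistics library; for tables with small expected counts a continuity-corrected or exact test would usually be preferred.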

In conclusion, these data show significant correlations between the molecular classification using the 3 methods and the following parameters: tumor grade (Edmondson), tumor differentiation (OMS), proliferation rate, vascular invasion and survival. In contrast, the classifications were not correlated with etiological factors (viral hepatitis, alcohol, etc.), with the state of the disease in the adjacent non-tumoral liver, or with tumor recurrence.

The data suggest that classification using the 67th percentile is the most adequate, and it is strongly recommended for classifying HCCs.

Multivariate Analysis
To further determine the efficiency of the molecular classification using the 67th percentile, we performed multivariate analysis with the Cox regression test on two sets of patients for which all data were available:
91 patients who received either surgical resection or orthotopic liver transplantation (OLT);
71 patients who received surgical resection.
Different variables associated with survival in the clinical settings were included in the multivariate analysis: 1) Edmondson grade, 2) microvascular invasion and 3) molecular classification using the 67th percentile.

TABLE K
Multivariate test (Cox regression)
Patients  Variable  HR  95% CI  p-value
91 (surgical resections and OLT)  Molecular classification (P67)  2.534  (1.214-5.289)  0.016
  Edmondson tumor grade (1-2, 3-4)  1.690  (0.747-3.823)  0.205
  Microvascular invasion (N/Y)  2.451  (1.105-5.435)  0.024
71 (surgical resections only)  Molecular classification (P67)  2.646  (1.115-6.278)  0.032
  Edmondson tumor grade (1-2, 3-4)  2.697  (1.103-6.592)  0.026
  Microvascular invasion (N/Y)  1.681  (0.648-4.359)  0.282

Correlation of the Molecular Classifications with Survival
For overall survival (OS) and disease-free survival (DFS), we compared the efficiency of the 3 discretization methods that separate the samples into 2 subclasses. Independent studies were made for patients who received surgical resection and for patients who received orthotopic liver transplantation (OLT). The ability of the 16-gene signature to discriminate between recurrent and non-recurrent tumors was also assessed.

TABLE L
Summary of survival analysis using Kaplan-Meier curves and log-rank test
Analysis  N patients  Classification method  Log rank
OS  113  P33  0.037
OS  113  P50  0.005
OS  113  P67  0.002
DFS  113  P33  0.078
DFS  113  P50  0.019
DFS  113  P67  0.072
Recurrence  108  P33  0.134*
Recurrence  108  P50  0.115*
Recurrence  108  P67  1.000
Analysis of 92 cases that received surgical resection
OS  92  P33  0.032
OS  92  P50  0.009
OS  92  P67  0.013
DFS  92  P40  NS
DFS  92  P50  NS
DFS  92  P67  NS
Recurrence  88  P33  NS
Recurrence  88  P50  NS
Recurrence  88  P67  NS
Abbreviations: OS, overall survival; DFS, disease-free survival.
*There is a trend, but it is not significant and it is lost in the P60 analysis.

The different analyses are illustrated in the Kaplan-Meier plots shown in FIG. 12. The discretization method of classification showed the same efficiency in the analysis of tumors obtained either from surgical resection (also called partial hepatectomy, PH) or from orthotopic liver transplantation (OLT), showing that the clinical management of the tumor had no impact on the classification.

In conclusion, the method described herein is able to classify HCC cases according to tumor grade and patient survival, and represents a powerful tool at diagnosis to stratify tumors according to prognosis and to guide further clinical management of HCC. In particular, it may be an excellent tool for decisions on orthotopic liver transplantation, since the criteria currently used are limited and often poorly informative of the outcome.

Protocol for Applying the Method to a New Sample
The following protocol is designed according to the invention:
1. Extract total RNA from the tumor specimen using well-established technologies.
2. Synthesize cDNA (suggested conditions: 1 µg RNA and 300 ng of random hexamers for a 20 µl reaction).
3. Amplify the selected genes, said genes being in equal number from each of the groups defined as the overexpressed proliferation-related genes group and the downregulated differentiation-related genes group (profiled genes within the group of 2 to 16 genes), together with the reference (invariant) gene, such as for example the RHOT2 gene, using a 1:5 cDNA dilution and either Taqman or SybrGreen qPCR technology.
4. Determine the Delta Ct (DCt) value for each gene.
5. Compare the value with the reference threshold (for HB or for HC) in order to assign a discretized value of "1" or "2".
6. Determine the average of the discretized values in each group, i.e., separately for the selected proliferation-related genes (up to 8) and for the selected differentiation-related genes (up to 8), and determine the ratio of these 2 average values, which is the score of the sample.
7. Compare the result with the reference scores corresponding to the following cut-offs: 30th percentile = 0.6667; 50th percentile = 0.8125; 67th percentile = 0.925.

Example
For patient X having an HC tumor, a Taqman qPCR is performed.
Step one: assignment of discretized values to each selected gene among the proliferation-related genes and the differentiation-related genes. Example: the DCt of AFP is -4.0523. The cut-off for AFP for qPCR using Taqman technology is -1.2634010. Given that -4.0523 is lower than the cut-off, the assigned discretized value is 2.
Step two: determination of the average of the discretized values for the 2 sets of 8 genes:
Proliferation-related genes: AFP=2; BUB1=1; DLG7=2; DUSP9=2; E2F5=2; IGSF1=1; NLE=2; RPL10A=1.
Average of proliferation-related genes: (2+1+2+2+2+1+2+1)/8 = 1.625
Differentiation-related genes: …; AQP9=1; C1S=2; …
Average of differentiation-related genes: (1+1+1+1+2+2+1+2)/8 = 1.375
Step three: calculate the proliferation/differentiation ratio, which is the score. In this example: 1.625/1.375 = 1.18182.
Step four: compare the result with the reference scores (scores below the selected cut-off are assigned to C1, scores above it to C2): 30th percentile = 0.6667; 50th percentile = 0.8125; 67th percentile = 0.925.
Classification based on the value of the ratio (1.18182): as the value is above the 67th percentile, the assigned class is C2.

TABLE A
Sample  AFP  ALDH2  APCS  APOC4  AQP9  BUB1  C1S  CYP2E1

HB  -7.684892  -4.592702  -0.660189  -2.651319  -4.194894  -1.068025  -1.394659  -3.334692
HB100  -7.682724  -3.849128  -0.372566  0.297278  -0.305738  0.65983  -2.572264  -7.352142
HB101  1.801478  -7.157316  -1.166513  -4.924476  -8.067838  6.222865  -5.284734  -11.757699
HB102  -7.761115  -5.696697  -1.044129  -2.374592  -3.447046  2.724363  -3.657616  -5.769417
HB103  2.908026  -2.580629  -2.748625  -2.55635  1.480624  3.891875  -2.819372  0.454623
HB106  0.294848  -7.534485  -1.424535  -5.377043  -7.886612  4.855797  -6.80698  -11.496242
HB107  0.719866  -6.546079  -9.18522  -3.425075  -6.189664  3.901806  -5.609115  -10.6711555
HB11  1.492805  -3.560021  -5.094387  -1.031623  -8.42849  2.086834  -6.166353  -9.043371
HB112  4.155252  -6.486961  -0.154814  -4.48155  -5.634596  3.762347  -7.88579  -8.960815
HB114  6.2971  -3.966456  5.02266  0.604275  3.037682  4.23408  -5.29691  -0.313326
HB118  0.318307  -4.311795  -5.146409  -3.787568  -5.428442  2.329959  -5.284827  -7.342423
HB121  -0.971033  -6.879043  -8.355819  -4.679393  -6.361435  2.329708  -6.559457  -8.87105
HB122  2.188721  -6.220957  -7.7399  -3.410743  -5.745306  3.309004  -6.327656  -8.906339
HB125  2.929931  -4.053616  -4.882212  -2.32494  -3.352398  5.067815  -4.255762  -7.887455
HB126  2.458273  -5.577951  -6.518289  -3.182407  -5.243351  5.270089  -5.814672  -8.188307
HB129  -4.930877  -2.124281  -0.744262  1.154663  -0.846572  0.421372  -2.925458  -4.708874
HB130  -4.86199  -1.139837  -1.398588  0.115559  -1.313951  1.669543  -2.37235  0.175598
HB131  5.545406  -1.714367  -1.045683  2.628822  1.903853  1.972112  -2.306818  0.069456
HB132  2.654369  -3.71955  -6.543987  -3.876868  -4.7099  4.043489  -4.801651  -7.725089
HB136  5.005516  -3.234557  -4.827283  2.471208  -0.502385  -1.945351  -4.324749  -4.844765
HB140  2.835457  -7.041546  -6.88604  -5.561912  -5.089682  4.140594  -6.023758  -10.477228
HB142  5.200474  -4.919616  2.416807  2.058522  -3.396171  1.380591  -5.965126  1.196438
HB145  3.58286  -5.186236  -5.18731  NA  -5.118895  5.58416  -5.786933  -7.880334
HB146  -1.290056  -5.422341  -5.973879  -3.869993  -5.908024  0.982626  -4.124487  -8.751883
HB147  -9.442257  -3.655303  -0.362122  1.179633  -2.349782  -1.51351  -2.756099  0.30832
HB148  -3.566401  -5.382548  -6.721533  -2.380348  -6.951359  1.183916  -4.188648  -7.101147
HB150  2.356994  -5.56181  -5.496186  -4.45536  -5.603247  5.136577  -5.435261  -8.522001
HB153  -2.086302  -4.364035  -4.049735  -1.1908  -4.342186  2.437297  -6.055092  -7.522683
HB155  -1.951256  -5.140738  -7.17357  -0.801318  4.538929  4.038538  -5.939438  3.058475
HB156  -6.523604  -4.658012  -5.112322  -1.499462  -1.13031  1.970226  -4.763811  -8.138508
HB157  -8.747252  -3.193287  -0.914511  0.563787  -0.139273  0.648195  -3.089302  -2.404646
HB160  4.40621  -0.878277  -2.381785  -1.9527  0.770799  4.516203  -2.89522  1.197611
HB162  -1.127062  -5.142195  -6.564426  -2.432348  -5.179601  3.27157  -4.959578  -9.351464
HB165  -1.015428  -1.578048  -1.612095  -1.677494  1.921123  -0.416058  -4.579384  -0.458984
HB167  -7.323435  -5.692388  -6.461153  -2.470512  -4.912208  -0.369976  -4.949694  -10.583324
HB170  -0.980072  -5.786627  -7.265156  -3.690367  -5.952908  1.548967  -6.61768  -8.574004
HB171  2.310988  -5.687635  -7.127181  -3.794631  -5.898635  2.05689  -6.420469  -8.856566

TABLE B-continued

O Vascular Main Epithelial beta-catenin Follow-up Surgery Follow-up invasion Multifocality Histology component Status (months) Outcome Type speoS (years) 18 1.5 17 14166666.67 14 1.1666666.67 NA 6 O.S 7 O.S83333333 42 3.5 6 O.S 49 4.083333333 53 4.416666667 37 3.083333333 33 2.75 32 2.6666666.67 A wt (FAP) 31 2.58.3333333 121 10.08333333 18 1.5 22 1.833333333 13 1.083333333 0.75 NA 11 O.9166666.67 C F 72 6 48 4 35 2.9166666.67 23 19166666.67 Wt 63 5.25 46 3.833333333 2O 1666,6666.67 25 2.083333333 35 2.9166666.67 69 5.75 25 2.083333333 wt (FAP) 15 1.25 24 2 17 14166666.67 M 41 3.4166666.67 91 7.58.3333333 Wt 29 24166666.67 M X O4166666.67 F Wt 55 4.58.3333333 39 3.25 Wt 55 4.58.3333333 68 S. 6666666.67 52 4.333333333 t A NA O.083333333 g O6666666.67 66 96 8 f 21 1.75 F O6666666.67 Wt 120 10 C F 53 4.416666667 NA NA O.S Wt 32 2.6666666.67 63 5.25 30 2.5 36 3 0.75 EM 23 19166666.67 ME NA 16 1.333333333 11 O.9166666.67 O.1666666.67 73 16 1.333333333 1 Wt O.083333333 Wt O.1666666.67 g ME F Wt 32 2.6666666.67 HB72 9.5 O.7916666.67 HB48 0.75 HB102 O.333333333 HB160 NA 14 1.1666666.67 HB172 NA 10 O.833333333 HB99 O.S83333333 HB130 62 S.1666666.67 HB98 s wt (FAP) 30 2.5 HB136 Wt 34 2.833333333 HB16S fE O.333333333 HB1 wt (FAP) 12 1 HB93 EEE 33 2.75 HB129 wt (FAP) S4 4.5 HB33 E F wt (AXIN1) 3.5 R O.2916666.67

REFERENCES

Assou, S., Le Carrour, T., Tondeur, S., Strom, S., Gabelle, A., Marty, S., Nadal, L., Pantesco, V., Reme, T., Hugnot, J. P., et al. (2007). A meta-analysis of human embryonic stem cells transcriptome integrated into a web-based expression atlas. Stem Cells 25, 961-973.
Boyault, S., Rickman, D. S., de Reynies, A., Balabaud, C., Rebouissou, S., Jeannot, E., Herault, A., Saric, J., Belghiti, J., Franco, D., et al. (2007). Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets. Hepatology 45, 42-52.
Finegold, M. J., Lopez-Terrada, D. H., Bowen, J., Washington, M. K., and Qualman, S. J. (2007). Protocol for the examination of specimens from pediatric patients with hepatoblastoma. Arch Pathol Lab Med 131, 520-529.
Fodde, R., and Brabletz, T. (2007). Wnt/beta-catenin signaling in cancer stemness and malignant behavior. Curr Opin Cell Biol 19, 150-158.
Glinsky, G. V., Berezovska, O., and Glinskii, A. B. (2005). Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest 115, 1503-1521.
Hirschman, B. A., Pollock, B. H., and Tomlinson, G. E. (2005). The spectrum of APC mutations in children with hepatoblastoma from familial adenomatous polyposis kindreds. J Pediatr 147, 263-266.
Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264.
Lee, J. S., Heo, J., Libbrecht, L., Chu, I. S., Kaposi-Novak, P., Calvisi, D. F., Mikaelyan, A., Roberts, L. R., Demetris, A. J., Sun, Z., et al. (2006). A novel prognostic subtype of human hepatocellular carcinoma derived from hepatic progenitor cells. Nat Med 12, 410-416.
Lustgarten, J. L., et al. (2008). Improving classification performance with discretization on biomedical datasets. AMIA 2008 Symposium Proceedings, 445-449.
McLin, V. A., Rankin, S. A., and Zorn, A. M. (2007). Repression of Wnt/β-catenin signaling in the anterior endoderm is essential for liver and pancreas development. Development 134, 2207-2217.
Perilongo, G., Shafford, E., and Plaschkes, J. (2000). SIOPEL trials using preoperative chemotherapy in hepatoblastoma. Lancet Oncol 1, 94-100.
Rowland, J. M. (2002). Hepatoblastoma: assessment of criteria for histologic classification. Med Pediatr Oncol 39, 478-483.
Schnater, J. M., Kohler, S. E., Lamers, W. H., von Schweinitz, D., and Aronson, D. C. (2003). Where do we stand with hepatoblastoma? A review. Cancer 98, 668-678.
Taniguchi, K., Roberts, L. R., Aderca, I. N., Dong, X., Qian, C., Murphy, L. M., Nagorney, D. M., Burgart, L. J., Roche, P. C., Smith, D. I., et al. (2002). Mutational spectrum of beta-catenin, AXIN1, and AXIN2 in hepatocellular carcinomas and hepatoblastomas. Oncogene 21, 4863-4871.
Wei, Y., Fabre, M., Branchereau, S., Gauthier, F., Perilongo, G., and Buendia, M. A. (2000). Activation of beta-catenin in epithelial and mesenchymal hepatoblastomas. Oncogene 19, 498-504.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 140

<210> SEQ ID NO 1
<211> LENGTH: 2014
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: AFP mRNA
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (31)..(1854)

<400> SEQUENCE: 1

ccactgccaa taacaaaata actagcaacc atg aag tgg gtg gaa tca att ttt 54 Met Lys Trp Val Glu Ser Ile Phe 1 5

tta att ttc cta cta aat ttt act gaa tcc aga aca ctg cat aga aat 102 Leu Ile Phe Leu Leu Asn Phe Thr Glu Ser Arg Thr Leu His Arg Asn 10 15 20

gaa tat gga ata gct tcc ata ttg gat tct tac caa tgt act gca gag 150 Glu Tyr Gly Ile Ala Ser Ile Leu Asp Ser Tyr Gln Cys Thr Ala Glu 25 30 35 40

ata agt tta gct gac ctg gct acc ata ttt ttt gcc cag ttt gtt caa 198 Ile Ser Leu Ala Asp Leu Ala Thr Ile Phe Phe Ala Gln Phe Val Gln 45 50 55

gaa gcc act tac aag gaa gta agc aaa atg gtg aaa gat gca ttg act 246 Glu Ala Thr Tyr Lys Glu Val Ser Lys Met Val Lys Asp Ala Leu Thr 60 65 70

gca att gag aaa ccc act gga gat gaa cag tct tca ggg tgt tta gaa 294 Ala Ile Glu Lys Pro Thr Gly Asp Glu Gln Ser Ser Gly Cys Leu Glu 75 80 85


caa gca ttg gca aag cga agc tgc ggc ctc ttc cag aaa cta gga gaa Gln Ala Leu Ala Lys Arg Ser Cys Gly Leu Phe Gln Lys Leu Gly Glu 410 415 420

tat tac tta caa aat gcg ttt ctc gtt gct tac aca aag aaa gcc ccc 1350 Tyr Tyr Leu Gln Asn Ala Phe Leu Val Ala Tyr Thr Lys Lys Ala Pro 425 430 435 440

cag ctg acc tcg tcg gag ctg atg gcc atc acc aga aaa atg gca gcc 1398 Gln Leu Thr Ser Ser Glu Leu Met Ala Ile Thr Arg Lys Met Ala Ala 445 450 455

aca gca gcc act tgt tgc caa ctc agt gag gac aaa cta ttg gcc tgt 1446 Thr Ala Ala Thr Cys Cys Gln Leu Ser Glu Asp Lys Leu Leu Ala Cys 460 465 470

ggc gag gga gcg gac att att atc gga cac tta tgt atc aga cat gaa 1494 Gly Glu Gly Ala Asp Ile Ile Ile Gly His Leu Cys Ile Arg His Glu 480 485

atg act cca gta aac cct ggt gtt ggc cag tgc tgc act tct tca tat 1542 Met Thr Pro Val Asn Pro Gly Val Gly Gln Cys Cys Thr Ser Ser Tyr 490 495 500

gcc aac agg agg cca tgc ttc agc agc ttg gtg gtg gat gaa aca tat 1590 Ala Asn Arg Arg Pro Cys Phe Ser Ser Leu Val Val Asp Glu Thr Tyr 505 510 515 520

gtc cct cct gca ttc tct gat gac aag ttc att ttc cat aag gat ctg 1638 Val Pro Pro Ala Phe Ser Asp Asp Lys Phe Ile Phe His Lys Asp Leu 525 530 535

tgc caa gct cag ggt gta gcg ctg caa aca atg aag caa gag ttt ctc 1686 Cys Gln Ala Gln Gly Val Ala Leu Gln Thr Met Lys Gln Glu Phe Leu 540 545 550

att aac ctt gtg aag caa aag cca caa ata aca gag gaa caa ctt gag 1734 Ile Asn Leu Val Lys Gln Lys Pro Gln Ile Thr Glu Glu Gln Leu Glu 555 560 565

gct gtc att gca gat ttc tca ggc ctg ttg gag aaa to tc caa ggc 1782 Ala Val Ile Ala Asp Phe Ser Gly Leu Leu Glu Lys Cys Cys Gln Gly 570 575 580

cag gaa cag gaa gtc tgc ttt gct gaa gag gga caa aaa ctg att tca 1830 Gln Glu Gln Glu Val Cys Phe Ala Glu Glu Gly Gln Lys Leu Ile Ser 585 590 595 600

aaa act cgt gct gct ttg gga gtt taaattactt caggggaaga gaagacaaaa 1884 Lys Thr Arg Ala Ala Leu Gly Val 605

cgagtctttc attcggtgtg aacttttctc tttaatttta actgatttaa cactttttgt 1944
gaattaatga aatgataaag acttttatgt gagatttcct tatcacagaa ataaaatatc 2004
tccaaatgtt 2014
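As a consistency check on how the listing is organized: the CDS LOCATION in each mRNA entry is a 1-based, inclusive nucleotide range, so its length divided by three gives the number of codons it spans. The Python sketch below compares that count with the LENGTH of the matching protein entry; the helper `cds_codons` and the `ENTRIES` table are illustrative names, not part of any sequence-listing standard, and the coordinates and lengths are copied from SEQ ID NOs 1, 2 and 5 to 10 of this listing.

```python
# Sketch: relate each CDS LOCATION (1-based, inclusive) to the length of the
# matching protein entry. Values are copied from the listing; names are illustrative.

def cds_codons(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive CDS range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# (gene, CDS start, CDS end, LENGTH of the matching PRT entry)
ENTRIES = [
    ("AFP", 31, 1854, 608),    # SEQ ID NO 1 / SEQ ID NO 2
    ("APCS", 97, 765, 223),    # SEQ ID NO 5 / SEQ ID NO 6
    ("APOC4", 5, 385, 127),    # SEQ ID NO 7 / SEQ ID NO 8
    ("AQP9", 186, 1070, 295),  # SEQ ID NO 9 / SEQ ID NO 10
]

for gene, start, end, protein_length in ENTRIES:
    codons = cds_codons(start, end)
    # Equal counts mean the range holds only sense codons (no stop codon).
    print(f"{gene}: {codons} codons vs LENGTH {protein_length} -> "
          f"{'match' if codons == protein_length else 'mismatch'}")
```

For all four genes the counts agree, which is consistent with each annotated range ending at the last sense codon; in the AFP entry above, for example, the stop codon `taa` follows `gtt` at position 1854.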

<210> SEQ ID NO 2
<211> LENGTH: 608
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: AFP

<400> SEQUENCE: 2

Met Lys Trp Val Glu Ser Ile Phe Leu Ile Phe Leu Leu Asn Phe Thr 1 5 10 15

Glu Ser Arg Thr Leu His Arg Asn Glu Tyr Gly Ile Ala Ser Ile Leu 25 30

Asp Ser Tyr Gln Cys Thr Ala Glu Ile Ser Leu Ala Asp Leu Ala Thr 35 40 45

Ile Phe Phe Ala Gln Phe Val Gln Glu Ala Thr Tyr Lys Glu Val Ser 50 55 60

Lys Met Val Lys Asp Ala Leu Thr Ala Ile Glu Lys Pro Thr Gly Asp 65 70

Glu Gln Ser Ser Gly Cys Leu Glu Asn Gln Leu Pro Ala Phe Leu Glu 85 90 95

Glu Leu His Glu Glu Ile Leu Glu Lys Gly His Ser Asp 105 110

Ser Gln Ser Glu Gly Arg His Asn Phe Leu Ala His 115 120 125

Lys Pro Thr Pro Ser Ile Pro Leu Phe Gln Val Pro Glu Pro 130 135 140

Val Thr Ser Glu Glu Glu Asp Arg Glu Thr Phe Met Asn 145 155 160

Phe Ile Glu Ala Arg Arg His Pro Phe Leu Tyr Ala Pro 165

Thr Ile Leu Leu Trp Ala Arg Tyr Asp Ile Ile Pro Ser 180 185 190

Ala Glu Asn Val Glu Phe Gln Thr Lys Ala Ala Thr 195

Val Thr Glu Leu Arg Glu Ser Ser Leu Leu Asn Gln His Ala 210 215 220

Ala Val Met Asn Phe Gly Thr Arg Thr Phe Gln Ala Ile Thr Val 225 230 235 240

Thr Leu Ser Gln Phe Thr Val Asn Phe Thr Glu Ile Gln 245 250 255

Leu Val Leu Asp Val Ala His Val His Glu His Cys Arg Gly 260 265 270

Val Leu Asp Leu Gln Asp Gly Glu Lys Ile Met Ser Ile 285

Ser Gln Gln Asp Thr Leu Ser Asn Ile Thr Glu 290 295 300

Leu Thr Thr Leu Glu Arg Gly Gln Ile Ile His Ala Glu Asn Asp 305 310 315 320

Glu Pro Glu Gly Leu Ser Pro Asn Leu Asn Arg Phe Leu Gly Asp 325 330 335

Arg Asp Phe Asn Gln Phe Ser Ser Gly Glu Lys Asn Ile Phe Leu Ala 340 345 350

Ser Phe Val His Glu Ser Arg Arg His Pro Gln Leu Ala Val Ser 355 360 365

Val Ile Leu Arg Val Ala Lys Gly Tyr Gln Glu Leu Leu Glu 370 375

Phe Gln Thr Glu Asn Pro Leu Glu Gln Asp Gly Glu Glu Glu 385 390 395 400

Leu Gln Ile Gln Glu Ser Gln Ala Leu Ala Lys Arg Ser Cys 405 415

Gly Leu Phe Gln Lys Leu Gly Glu Tyr Tyr Leu Gln Asn Ala Phe Leu 420 425 430

Val Ala Tyr Thr Lys Lys Ala Pro Gln Leu Thr Ser Ser Glu Leu Met 435 440 445

Ala Ile Thr Arg Lys Met Ala Ala Thr Ala Ala Thr Cys Cys Gln Leu 450 455 460

Ser Glu Asp Lys Leu Leu Ala Cys Gly Glu Gly Ala Asp Ile Ile Ile 465 470 475 480

tgg gtc aac tgc tat gat gtg ttt gga gcc cag tca ccc ttt ggt gC 1496 Trp Val Asn Cys Tyr Asp Val Phe Gly Ala Gln Ser Pro Phe Gly Gly 470 480

tac aag atg tcg ggg agt gc cgg gag ttg ggc gag tac ggg ctg cag 1544 Tyr Lys Met Ser Gly Ser Gly Arg Glu Leu Gly Glu Tyr Gly Leu Gln 485 490 495 500

gca tac act gaa gtg aaa act gtc aca gttcaaa gtg cct cag aag aac 1592 Ala Tyr Thr Glu Val Lys Thr Val Thr Val Lys Val Pro Gln Lys Asn 505 510 515

tca taagaatcat gcaagcttcc tccctcagcc attgatggaa agttcagcaa 1645 Ser
gatcagcaac aaaaccaaga aaaatgatcc ttgcgtgctg aatatctgaa aagagaaatt
tttcctacaa aatctcttgg gtcaagaaag ttctagaatt tgaattgata aacatggtgg 1765
gttggctgag ggtaagagta tatgaggaac cttttaaacg acaacaatac tgctagcttt 1825

caggatgatt tttaaaaaat agattcaaat gtgttatcct ctctctgaaa cgcttcctat 1885
aactcgagtt tataggggaa gaaaaagcta ttgtttacaa ttatatcacc attaaggcaa 1945
ctgctacacc ctgctttgta ttctgggcta agattcatta aaaactagct gctctt 2001

<210> SEQ ID NO 4
<211> LENGTH: 517
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: ALDH2

<400> SEQUENCE: 4

Met Leu Arg Ala Ala Ala Arg Phe Gly Pro Arg Leu Gly Arg Arg Leu 1 5 10 15

Leu Ser Ala Ala Ala Thr Gln Ala Val Pro Ala Pro Asn Gln Gln Pro 25 30

Glu Val Phe Cys Asn Gln Ile Phe Ile Asn Asn Glu Trp His Asp Ala 35 40 45

Val Ser Arg Lys Thr Phe Pro Thr Val Asn Pro Ser Thr Gly Glu Val 50 55 60

Ile Gln Val Ala Glu Gly Asp Lys Glu Asp Val Asp Ala Val 65 70 75 80

Ala Ala Arg Ala Ala Phe Gln Leu Gly Ser Pro Trp Arg Arg Met 85 90 95

Asp Ala Ser His Arg Gly Arg Leu Leu Asn Arg Leu Ala Asp Leu Ile 105 110

Glu Arg Asp Arg Thr Tyr Leu Ala Ala Leu Glu Thr Leu Asp Asn Gly 115 120 125

Pro Val Ile Ser Tyr Leu Val Asp Leu Asp Met Val Leu Lys 130 135 140

Cys Leu Arg Tyr Tyr Ala Gly Trp Ala Asp Lys His Gly Lys Thr 145 150 155 160

Ile Pro Ile Asp Gly Asp Phe Phe Ser Tyr Thr Arg His Glu Pro Val 165 170 175

Gly Val Gly Gln Ile Ile Pro Trp Asn Phe Pro Leu Leu Met Gln 180 185 190

Ala Trp Lys Leu Gly Pro Ala Leu Ala Thr Gly Asn Val Val Val Met 195 205

Val Ala Glu Gln Thr Pro Leu Thr Ala Leu Tyr Val Ala Asn Leu 210 215 220

Ile Glu Ala Gly Phe Pro Pro Gly Val Val Asn Ile Val Pro Gly 225 230 235 240

Phe Gly Pro Thr Ala Gly Ala Ala Ile Ala Ser His Glu Asp Val Asp 245 250 255

Val Ala Phe Thr Gly Ser Thr Glu Ile Gly Arg Val Ile Gln Val 260 265 270

Ala Ala Gly Ser Ser Asn Leu Lys Arg Val Thr Leu Glu Leu Gly Gly 285

Ser Pro Asn Ile Ile Met Ser Asp Ala Asp Met Asp Trp Ala Val 290 295 300

Glu Gln Ala His Phe Ala Leu Phe Phe Asn Gln Gly Gln 305 310 315 320

Ala Gly Ser Arg Thr Phe Val Gln Glu Asp Ile Asp Glu Phe Val 325 330 335

Glu Arg Ser Val Ala Arg Ala Ser Arg Val Val Gly Asn Pro Phe 340 345 350

Asp Ser Lys Thr Glu Gln Gly Pro Gln Val Asp Glu Thr Gln Phe Lys 355 360 365

Ile Leu Gly Tyr Ile Asn Thr Gly Gln Glu Gly Ala Lys Leu 370 375 380

Leu Gly Gly Gly Ile Ala Ala Asp Arg Gly Phe Ile Gln Pro 385 390 395 400

Thr Val Phe Gly Asp Val Gln Asp Gly Met Thr Ile Ala Glu Glu 405 410 415

Ile Phe Gly Pro Val Met Gln Ile Leu Phe Thr Ile Glu Glu 425 430

Val Val Gly Arg Ala Asn Asn Ser Thr Gly Leu Ala Ala Ala Val 435 440 445

Phe Thr Asp Leu Asp Lys Ala Asn Leu Ser Gln Ala Leu Gln 450 455 460

Ala Gly Thr Val Trp Val Asn Cys Tyr Asp Val Phe Gly Ala Gln Ser 465 470 480

Pro Phe Gly Gly Tyr Lys Met Ser Gly Ser Gly Arg Glu Leu Gly Glu 485 490 495

Tyr Gly Leu Gln Ala Tyr Thr Glu Val Lys Thr Val Thr Val Lys Val 500 505 510

Pro Gln Lys Asn Ser 515

<210> SEQ ID NO 5
<211> LENGTH: 928
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: APCS mRNA
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (97)..(765)

<400> SEQUENCE: 5

gggcatgaat atcagacgct agggggacag cactgttgtt gtctgctacc ctcatcctgg 60

tcactgcttc tgctataaca gccctaggcc aggaat atg aac aag ccg ctg ctt 114 Met Asn Lys Pro Leu Leu 1 5

tgg atc tct gtc ctc acc agc ctc ctg gaa gcc ttt gct cac aca gac 162 Trp Ile Ser Val Leu Thr Ser Leu Leu Glu Ala Phe Ala His Thr Asp 15 20

ctc agt ggg aag gtg ttt gta ttt cct aga gaa tct gtt act gat cat 210 Leu Ser Gly Lys Val Phe Val Phe Pro Arg Glu Ser Val Thr Asp His 25 30 35

gta aac ttg atc aca ccg ctg gag aag cct cta cag aac ttt acc ttg 258 Val Asn Leu Ile Thr Pro Leu Glu Lys Pro Leu Gln Asn Phe Thr Leu 40 45 50

tgt ttt cga gcc tat agt gat ctc tct cgt gcc tac agc ctc ttc tcc 306 Cys Phe Arg Ala Tyr Ser Asp Leu Ser Arg Ala Tyr Ser Leu Phe Ser 55 60 65 70

tac aat acc caa ggc agg gat aat gag cta cta gtt tat aaa gaa aga 354 Tyr Asn Thr Gln Gly Arg Asp Asn Glu Leu Leu Val Tyr Lys Glu Arg 80 85

gtt gga gag tat agt cta tac att gga aga cac aaa gtt aca tcc aaa 402 Val Gly Glu Tyr Ser Leu Tyr Ile Gly Arg His Lys Val Thr Ser Lys 90 95 100

gtt atc gaa aag ttc ccg gct cca gtg cac atc tgt gtg agc tgg gag 450 Val Ile Glu Lys Phe Pro Ala Pro Val His Ile Cys Val Ser Trp Glu 105 110 115

tcc tca tca ggt att gaa ttt tgg atc aat ggg aca cct ttg gtg 498 Ser Ser Ser Gly Ile Glu Phe Trp Ile Asn Gly Thr Pro Leu Val 120 125 130

aaa aag ggt ctg cga ggt tac ttt gta gaa gct cag ccc aag att 546 Lys Lys Gly Leu Arg Gly Tyr Phe Val Glu Ala Gln Pro Lys Ile 135 145 150

gtc ctg ggg cag gaa gat tcc tat ggg ggc aag ttt gat agg agc 594 Val Leu Gly Gln Glu Asp Ser Tyr Gly Gly Lys Phe Asp Arg Ser 155 160 165

cag tcc ttt gtg gga att ggg gat ttg tac atg tgg gac tct gtg 642 Gln Ser Phe Val Gly Ile Gly Asp Leu Tyr Met Trp Asp Ser Val 170 175 180

ctg ccc cca gaa aat ctg tct gcc tat cag ggt acc cct ctc cct 690 Leu Pro Pro Glu Asn Leu Ser Ala Tyr Gln Gly Thr Pro Leu Pro 185 190 195

gcc aat atc ctg gac tgg cag gct ctg aac tat gaa atc aga gga tat 738 Ala Asn Ile Leu Asp Trp Gln Ala Leu Asn Tyr Glu Ile Arg Gly Tyr 200 205 210

gtc atc atc aaa ccc ttg gtg tgg gtc tgaggtcttg actcaacgag Val Ile Ile Lys Pro Leu Val Trp Val 215 220

agcacttgaa aatgaaatga ctgtctaaga gatctggtca aagcaactgg atactagatc 845
ttacatctgc agctctttct tctttgaatt tcctatctgt atgtctgcct aattaaaaaa 905
atatatattg tattatgcta cct 928

<210> SEQ ID NO 6
<211> LENGTH: 223
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: APCS

<400> SEQUENCE: 6

Met Asn Lys Pro Leu Leu Trp Ile Ser Val Leu Thr Ser Leu Leu Glu 1 5 15

Ala Phe Ala His Thr Asp Leu Ser Gly Lys Val Phe Val Phe Pro Arg 25

Glu Ser Val Thr Asp His Val Asn Leu Ile Thr Pro Leu Glu Lys Pro 35 40 45

Leu Gln Asn Phe Thr Leu Cys Phe Arg Ala Tyr Ser Asp Leu Ser Arg 50 55 60

Ala Tyr Ser Leu Phe Ser Tyr Asn Thr Gln Gly Arg Asp Asn Glu Leu 65 70 75 80

Leu Val Tyr Lys Glu Arg Val Gly Glu Tyr Ser Leu Tyr Ile Gly Arg 85 90 95

His Lys Val Thr Ser Lys Val Ile Glu Lys Phe Pro Ala Pro Val His 100 105 110

Ile Cys Val Ser Trp Glu Ser Ser Ser Gly Ile Ala Glu Phe Trp Ile 115 120 125

Asn Gly Thr Pro Leu Val Lys Lys Gly Leu Arg Gln Gly Tyr Phe Val 130 135 140

Glu Ala Gln Pro Lys Ile Val Leu Gly Gln Glu Gln Asp Ser Tyr Gly 145 150 155 160

Gly Lys Phe Asp Arg Ser Gln Ser Phe Val Gly Glu Ile Gly Asp Leu 165 170 175

Tyr Met Trp Asp Ser Val Leu Pro Pro Glu Asn Ile Leu Ser Ala Tyr 180 185 190

Gln Gly Thr Pro Leu Pro Ala Asn Ile Leu Asp Trp Gln Ala Leu Asn 195 200 205

Tyr Glu Ile Arg Gly Tyr Val Ile Ile Lys Pro Leu Val Trp Val 210 215 220

<210> SEQ ID NO 7
<211> LENGTH: 577
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: APOC4 mRNA
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (5)..(385)

<400> SEQUENCE: 7

agaa atg tcc ctc ctc aga aac agg ctc cag gcc ctg cct gcc ctg to 49 Met Ser Leu Leu Arg Asn Arg Leu Gln Ala Leu Pro Ala Leu Cys 1 5 10 15

ctc tgc gtg ctg gtc ctg gcc tgc att ggg gca to cag cca gag gcc 97 Leu Cys Val Leu Val Leu Ala Cys Ile Gly Ala Cys Gln Pro Glu Ala 20 25 30

cag gaa gga acc ctg agc ccc cca cca aag cta aag atg agt cgc tgg 145 Gln Glu Gly Thr Leu Ser Pro Pro Pro Lys Leu Lys Met Ser Arg Trp 35 40 45

agc ctg gtg agg gC agg atg aag gag ctg ctg gag aca gtg gttgaac 193 Ser Leu Val Arg Gly Arg Met Lys Glu Leu Leu Glu Thr Val Val Asn 50 55 60

agg acc aga gac ggg tgg caa tgg ttc tgg agc ccg agc acc ttc cgg 241 Arg Thr Arg Asp Gly Trp Gln Trp Phe Trp Ser Pro Ser Thr Phe Arg 65 70 75

ggc ttc atg cag acc tac tat gac gac cac ctg agg gac ctg ggt ccg 289 Gly Phe Met Gln Thr Tyr Tyr Asp Asp His Leu Arg Asp Leu Gly Pro 80 85 90 95

ctc acc aag gcc tgg ttc ctc gaa tcc aaa gac agc ctc ttg aag aag 337 Leu Thr Lys Ala Trp Phe Leu Glu Ser Lys Asp Ser Leu Leu Lys Lys 100 105 110

acc cac agc ctg tgc ccc agg ctt gtc tgt ggg gac aag gac cag ggt 385 Thr His Ser Leu Cys Pro Arg Leu Val Cys Gly Asp Lys Asp Gln Gly 115 120 125

taaaatgttc ataaaagcca ggtgtggttg tggcgggtgc ctgtagtccc agctactcag 445
gaggctgagg taggatgatg gcttgagccc aggagttcga gaccagcctg ggcaacacag 505
cgagatctct tgggggtaaa acaaaaagaa aaaaaaaagt tcatacttct ccaataaata 565
aagttct cacc td 577

<210> SEQ ID NO 8
<211> LENGTH: 127
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: APOC4

<400> SEQUENCE: 8

Met Ser Leu Leu Arg Asn Arg Leu Gln Ala Leu Pro Ala Leu Cys Leu 1 5 10 15

Cys Val Leu Val Leu Ala Cys Ile Gly Ala Cys Gln Pro Glu Ala Gln 20 25 30

Glu Gly Thr Leu Ser Pro Pro Pro Lys Leu Lys Met Ser Arg Trp Ser 35 40 45

Leu Val Arg Gly Arg Met Lys Glu Leu Leu Glu Thr Val Val Asn Arg 50 55 60

Thr Arg Asp Gly Trp Gln Trp Phe Trp Ser Pro Ser Thr Phe Arg Gly 65 70 75 80

Phe Met Gln Thr Tyr Tyr Asp Asp His Leu Arg Asp Leu Gly Pro Leu 85 90 95

Thr Lys Ala Trp Phe Leu Glu Ser Lys Asp Ser Leu Leu Lys Lys Thr 100 105 110

His Ser Leu Cys Pro Arg Leu Val Cys Gly Asp Lys Asp Gln Gly 115 120 125

<210> SEQ ID NO 9
<211> LENGTH: 2801
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: AQP9 mRNA
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (186)..(1070)

<400> SEQUENCE: 9

ccaccagaag acgattaagc cacagcctct aattggaacg gcatttgtac agtcagagac 60
tcttaccaga catctccagg aatctgtgag ccattgtcaa aacgtccatt ttcatctggc 120
tgtgaaagtg aggaccacaa caggtaggta ttggtagaaa caggagtcct cagagaagcc 180

ccaag atg cag cct gag gga gca gaa aag gga aaa agc ttc aag cag aga 230 Met Gln Pro Glu Gly Ala Glu Lys Gly Lys Ser Phe Lys Gln Arg 1 5 10 15

ctg gtc ttg aag agc agc tta gcg aaa gaa acc ctc tct gag ttc ttg 278 Leu Val Leu Lys Ser Ser Leu Ala Lys Glu Thr Leu Ser Glu Phe Leu 20 25 30

ggc acg ttc atc ttg att gtc ctt gga tgt ggc tgt gtt gcc caa gct 326 Gly Thr Phe Ile Leu Ile Val Leu Gly Cys Gly Cys Val Ala Gln Ala 35 40 45

att ctc agt ca gga cgt ttt gga ggg gtc atc act atc aat gtt gga 374 Ile Leu Ser Arg Gly Arg Phe Gly Gly Val Ile Thr Ile Asn Val Gly 50 55 60

ttt tca atg gca gtt gca atg gcc att tat gtg gct ggc ggt gtc tct 422 Phe Ser Met Ala Val Ala Met Ala Ile Tyr Val Ala Gly Gly Val Ser 65 70 75

tcaaatgtat tttcctaatt gcccacttga gaacagacat ttgacaagtt atatcaacga 1880
ctgtgcttgt ccattatttt acacatgccc tagaagccaa aactgaaagc cactggatcc 1940
tggtctagct gaatcttcag agtgggaggt ctccaaaaag atattacctt attgggctta 2000
acaattcaca aggcactttc acacccatta tctaatttaa tcctcataat gactatgtga
ggcaaatgcc acattgccca tttttcagat aaagaaacaa aatcttaggg aagataagtt 2120
gagttgtcca agagcacact gaaagttgaa tgttatctaa tgcattcctc tacctttcag 2180
aagatcagta gctggctgac aatctttgcc aaatcttcct tgctagccag aagtggaatt 2240
ggcagcttct agaatatgta cacctctgga caaaatgttc ctcaatctta agatacaaag 2300
accctcattg tctgggtcta ttcccacact tactgagtac agatgaagga aagtggtagc 2360
aatttaatca taactttcat ttgctgaaaa acattatgag aaggcctccc ttcctaagcc 2420
acctctggtc ttgctaagtc ttgatcttgc ttcctgccag caccaaacat tacattcagg 2480
ggatttcctc tggctcagtc ttttcccctt gaagttctct aatagatgtt acttttgaca 2540
aaagatcgcc tatgagttac aagcaccagg ggatgctcta catcaaggga tgcaccttca 2600
gtcaaactgt caaaaagccc agaattccca aaggcattag gtttcccaac tgctttgtgc 2660
tgatatcaga acagcagaaa ttaaatgtga aatgtttctg atgacttatg ttctacaatc 2720
tatggacata cgggattttt ttttcttgct ttgaagctac ctggatattt cctatttgaa 2780
ataaaattgt tcggtcattg t

<210> SEQ ID NO 10
<211> LENGTH: 295
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: AQP9

<400> SEQUENCE: 10

Met Gln Pro Glu Gly Ala Glu Lys Gly Lys Ser Phe Lys Gln Arg Leu 1 5 10 15

Val Leu Lys Ser Ser Leu Ala Lys Glu Thr Leu Ser Glu Phe Leu Gly 25

Thr Phe Ile Leu Ile Val Leu Gly Cys Gly Cys Val Ala Gln Ala Ile 35 40 45

Leu Ser Arg Gly Arg Phe Gly Gly Val Ile Thr Ile Asn Val Gly Phe 50 55 60

Ser Met Ala Val Ala Met Ala Ile Tyr Val Ala Gly Gly Val Ser Gly 65 70 75 80

Gly His Ile Asn Pro Ala Val Ser Leu Ala Met Leu Phe Gly Arg 85 90 95

Met Trp Phe Lys Leu Pro Phe Tyr Val Gly Ala Gln Phe Leu Gly 105 110

Ala Phe Val Gly Ala Ala Thr Val Phe Gly Ile Tyr Asp Gly Leu 115 120 125

Met Ser Phe Ala Gly Gly Lys Leu Leu Ile Val Gly Glu Asn Ala Thr 130 135 140

Ala His Ile Phe Ala Thr Tyr Pro Ala Pro Tyr Leu Ser Leu Ala Asn 145 150 155 160

Ala Phe Ala Asp Gln Val Val Ala Thr Met Ile Leu Leu Ile Ile Val 165 170 175

Phe Ala Ile Phe Asp Ser Arg Asn Leu Gly Ala Pro Arg Gly Leu Glu 180 185 190

Pro Ile Ala Ile Gly Leu Leu Ile Ile Val Ile Ala Ser Ser Leu Gly 195 200

Leu Asn Ser Gly Cys Ala Met Asn Pro Ala Arg Asp Leu Ser Pro Arg 210 215

Leu Phe Thr Ala Leu Ala Gly Trp Gly Phe Glu Val Phe Arg Ala Gly 225 230 235 240

Asn Asn Phe Trp Trp Ile Pro Val Val Gly Pro Leu Val Gly Ala Val 245 250 255

Ile Gly Gly Leu Ile Val Leu Val Ile Glu Ile His His Pro Glu 260 265 270

Pro Asp Ser Val Phe Thr Glu Gln Ser Glu Asp Lys Pro Glu Lys 275 280 285

Glu Leu Ser Val Ile Met 290 295
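Each nucleotide row pairs its codons with a three-letter translation, and that pairing can be checked mechanically. The Python sketch below assumes the standard genetic code; `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical helper names, not part of any sequence-listing tool. It translates the first six codons of the AQP9 coding sequence in SEQ ID NO 9 (`atg cag cct gag gga gca`), which should reproduce the first six residues of SEQ ID NO 10 (Met Gln Pro Glu Gly Ala).

```python
from itertools import product

# Standard genetic code: amino acids for codons enumerated in t, c, a, g order
# (first base varies slowest), which gives the usual 64-letter string.
BASES = "tcag"
AA_ONE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_ONE)
}

# One-letter to three-letter codes, matching the style of the PRT entries.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(codon_row: str) -> list[str]:
    """Translate space-separated codons into three-letter amino-acid codes."""
    return [THREE_LETTER[CODON_TABLE[c]] for c in codon_row.lower().split()]


# First six codons of the AQP9 CDS (SEQ ID NO 9).
print(translate("atg cag cct gag gga gca"))
```

Running the same helper over a full codon row is a quick way to spot OCR damage in a listing like this one, since a misread base often yields a residue that disagrees with the printed translation.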

<210> SEQ ID NO 11
<211> LENGTH: 3445
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: BUB1 mRNA
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (58)..(3312)

<400> SEQUENCE: 11

cggcggcttc tagtttgcgg ttcaggtttg gccgctgccg gccagcgtcc tctggcc 57

atg gac acc ccg gaa aat gtc ctt cag atg ctt gaa gcc cac atg cag 105 Met Asp Thr Pro Glu Asn Val Leu Gln Met Leu Glu Ala His Met Gln 1 5 10 15

agc tac aag ggc aat gac cct ctt ggt gaa tgg gaa aga ata cag 153 Ser Tyr Lys Gly Asn Asp Pro Leu Gly Glu Trp Glu Arg Ile Gln 20 25

tgg gta gaa gag aat ttt cct gag aat aaa gaa tac ttg act tta Trp Val Glu Glu Asn Phe Pro Glu Asn Lys Glu Tyr Leu Thr Leu 35 40 45

cta gaa cat tta atg aag gaa ttt tta gat aag aag aaa tac cac aat 249 Leu Glu His Leu Met Lys Glu Phe Leu Asp Lys Lys Lys Tyr His Asn 50 55 60

gac cca aga ttc atc agt tat tgt tta aaa ttt gct gag tac aac agt 297 Asp Pro Arg Phe Ile Ser Tyr Cys Leu Lys Phe Ala Glu Tyr Asn Ser 65 70 75 80

gac ctc cat caa ttt ttt gag ttt ctg tac aac cat ggg gga acc 345 Asp Leu His Gln Phe Phe Glu Phe Leu Tyr Asn His Gly Gly Thr 85 90 95

ctg tca tcc cct ctg tac att gcc tgg gcg ggg cat ctg gaa gcc caa 393 Leu Ser Ser Pro Leu Tyr Ile Ala Trp Ala Gly His Leu Glu Ala Gln 100 105 110

gga gag ctg cag cat gcc gct gtc ctt cag aga gga att caa aac 441 Gly Glu Leu Gln His Ala Ser Ala Val Leu Gln Arg Gly Ile Gln Asn 115 120 125

cag gct gaa ccc aga gag ttc ctg caa caa caa tac agg tta ttt cag 489 Gln Ala Glu Pro Arg Glu Phe Leu Gln Gln Gln Tyr Arg Leu Phe Gln 130 135 140

aca cgc ctc act gaa acc cat ttg cca gct caa gct aga acc tca gaa 537 Thr Arg Leu Thr Glu Thr His Leu Pro Ala Gln Ala Arg Thr Ser Glu 145 150 155 160

cct ctg cat aat gtt cag gtt tta aat caa atg ata aca tca aaa tca 585

cct gat att tct gat gac aaa gat gaa tgg caa tct cta gat caa aat 1545 Pro Asp Ile Ser Asp Asp Lys Asp Glu Trp Gln Ser Leu Asp Gln Asn 485 490 495

gaa gat gca ttt gaa gcc cag ttt caa aaa aat gta agg tca tct ggg 1593 Glu Asp Ala Phe Glu Ala Gln Phe Gln Lys Asn Val Arg Ser Ser Gly 500 505 510

gct tgg gga gtc aat aag atc atc tct tct ttg tca tct gct ttt cat 1641 Ala Trp Gly Val Asn Lys Ile Ile Ser Ser Leu Ser Ser Ala Phe His 515 520 525

gtg ttt gaa gat gga aac aaa gaa aat tat gga tta cca cag cct aaa 1689 Val Phe Glu Asp Gly Asn Lys Glu Asn Tyr Gly Leu Pro Gln Pro Lys 530 535 540

aat aaa ccc aca gga gcc agg acc ttt gga gaa cgc tct gtc agc aga 1737 Asn Lys Pro Thr Gly Ala Arg Thr Phe Gly Glu Arg Ser Val Ser Arg 545 550 555 560

ctt cct tca aaa cca aag gag gaa gtg cct cat gct gaa gag ttt ttg Leu Pro Ser Lys Pro Lys Glu Glu Val Pro His Ala Glu Glu Phe Leu 565 570 575

gat gac tca act gta tgg ggt att cgc tgc aac aaa acc ctg gca ccc 1833 Asp Asp Ser Thr Val Trp Gly Ile Arg Cys Asn Lys Thr Leu Ala Pro 580 585 590

agt cct aag agc cca gga gac ttc aca tct gct gca caa ctt gcg tct 1881 Ser Pro Lys Ser Pro Gly Asp Phe Thr Ser Ala Ala Gln Leu Ala Ser 595 600 605

aca cca ttc cac aag ctt cca gtg gag tca gtg cac att tta gaa gat 1929 Thr Pro Phe His Lys Leu Pro Val Glu Ser Val His Ile Leu Glu Asp 610 615 620

gaa aat gtg gta gca cag tgt acc cag act ttg gat tect 1977 Lys Glu Asn Val Val Ala Gln Cys Thr Gln Thr Leu Asp Ser 625 630 635 640

tgt gag gaa aac atg gtg gtg cct tca agg gat gga aaa ttc agt cca 2025 Cys Glu Glu Asn Met Val Val Pro Ser Arg Asp Gly Lys Phe Ser Pro 645 650 655

att caa gag aaa agc cca aaa cag gcc ttg tcg tct cac atg tat tca Ile Gln Glu Lys Ser Pro Lys Gln Ala Leu Ser Ser His Met Tyr Ser 660 665 670

gca tcc tta ctt cgt ctg agc cag cct gct gca ggt ggg gta ctt acc 2121 Ala Ser Leu Leu Arg Leu Ser Gln Pro Ala Ala Gly Gly Val Leu Thr 675 680 685

gag gca gag ttg ggc gtt gag gct tgc aga ctc aca gac act gac 2169 Glu Ala Glu Leu Gly Val Glu Ala Cys Arg Leu Thr Asp Thr Asp 690 695 700

gct gcc att gca gaa gat cca cca gat gct att gct ggg ctc caa gca 2217 Ala Ala Ile Ala Glu Asp Pro Pro Asp Ala Ile Ala Gly Leu Gln Ala 705 710 715 720

gaa tgg atg cag atg tca ctt ggg act gtt gat gct cca aac ttc 2265 Glu Trp Met Gln Met Ser Ser Leu Gly Thr Val Asp Ala Pro Asn Phe 725 730 735

att gtt ggg aac cca tgg gat gat aag att ttc aaa ctt tta tct 2313 Ile Val Gly Asn Pro Trp Asp Asp Lys Leu Ile Phe Lys Leu Leu Ser 740 745 750

ggg ctt tct aaa cca gtg tcc cca aat act ttt gaa tgg caa 2361 Gly Leu Ser Lys Pro Val Ser Ser Pro Asn Thr Phe Glu Trp Gln 755 760 765

aaa ctt cca gcc atc aag ccc act gaa ttt caa ttg ggt tct 2409 Lys Leu Pro Ala Ile Lys Pro Thr Glu Phe Gln Leu Gly Ser 770 775 780

aag ctg gtc tat gtc cat cac ctt ctt gga gaa gga gcc ttt gcc cag 2457 Lys Leu Val Tyr Val His His Leu Leu Gly Glu Gly Ala Phe Ala Gln 785 790 795 800

gtg tac gaa gct acc cag gga gat ctg aat gat gct aaa aat aaa cag 2505 Val Tyr Glu Ala Thr Gln Gly Asp Leu Asn Asp Ala Lys Asn Lys Gln 805 810 815

aaa ttt gtt tta aag gtc caa aag cct gcc aac ccc tgg gaa ttc tac 2553 Lys Phe Val Leu Lys Val Gln Lys Pro Ala Asn Pro Trp Glu Phe Tyr 820 825 830

att ggg acc cag ttg atg gaa aga cta aag cca tct atg cag cac atg Ile Gly Thr Gln Leu Met Glu Arg Leu Lys Pro Ser Met Gln His Met 835 840 845

ttt atg aag ttc tatt tct gcc cac tta ttc cag aat ggc agt gta tta 2649 Phe Met Lys Phe Tyr Ser Ala His Leu Phe Gln Asn Gly Ser Val Leu 850 855 860

gta gga gag ctc tact agc gga aca tta tta aat gcc att aac ctc 2697 Val Gly Glu Leu Tyr Ser Gly Thr Leu Leu Asn Ala Ile Asn Leu 865 870 875 880

tat aaa aat acc cct gaa gtg atg cct caa ggt ctt gtc atc tct 2745 Tyr Lys Asn Thr Pro Glu Val Met Pro Gln Gly Leu Val Ile Ser 885 890 895

ttt gct atg aga atg ctt atg att gag caa gtg cat gac tgt gaa 2793 Phe Ala Met Arg Met Leu Met Ile Glu Gln Val His Asp Cys Glu 900 905 910

atc att cat gga gac att cca gac aat ttc ata ctt gga aac gga 2841 Ile Ile His Gly Asp Ile Pro Asp Asn Phe Ile Leu Gly Asn Gly 915 920 925

ttt ttg gaa cag gat gat gaa gat gat tta tct gct ggc ttg gca ctg 2889 Phe Leu Glu Gln Asp Asp Glu Asp Asp Leu Ser Ala Gly Leu Ala Leu 930 935 940

att gac ctg ggt cag ata gat atg aaa ctt ttt cca aaa gga act 2937 Ile Asp Leu Gly Gln Ser Ile Asp Met Lys Leu Phe Pro Lys Gly Thr 945 950 955 960

ata ttc aca gca aag gaa aca tct ggt ttt cag gtt gag atg 2985 Ile Phe Thr Ala Lys Glu Thr Ser Gly Phe Gln Val Glu Met 965 970 975

ctc agc aac aaa cca tgg aac tac cag atc gat tac ttt ggg gtt gct Leu Ser Asn Lys Pro Trp Asn Tyr Gln Ile Asp Tyr Phe Gly Val Ala 980 985 990

gca aca gta tat tgc atg ctc ttt ggc act tac atg aaa gtg aaa aat Ala Thr Val Tyr Cys Met Leu Phe Gly Thr Tyr Met Lys Val Lys Asn 995 1000

gaa gga gga gag tgt aag cct gaa ggt ctt ttt aga agg ctt cct 3126 Glu Gly Gly Glu Cys Lys Pro Glu Gly Leu Phe Arg Arg Leu Pro 1010 1015

cat gat atg tgg aat gaa ttt ttt cat gtt a ttg aat att 3171 His Asp Met Trp Asn Glu Phe Phe His Val M Leu Asn Ile

cca tgt cat cat ctt cca tct ttg gat ttg t agg caa aag 3216 Pro Cys His His Leu Pro Ser Leu Asp Leu I Arg Gln Lys

ctg aaa gta ttt caa caa cac tat act aac a att agg gcc 3261 Leu Lys Val Phe Gln Gln His Tyr Thr Asn Lys Ile Arg Ala

cta aat agg cta att gta ctg ctc tta gaa tgt aag cgt tca 3306 Leu Asn Arg Leu Ile Val Leu Leu Leu Glu Cys Lys Arg Ser

cga taaaatttgg atatagacag tccttaaaaa tcacactgta aatatgaatc 3362 Arg
tgctcacttt aaacctgttt ttttttcatt tattgtttat gtaaatgttt gttaaaaata 3422