Articles

Transposon mutagenesis identifies genes driving hepatocellular carcinoma in a chronic hepatitis B mouse model

Emilie A Bard-Chapeau1, Anh-Tuan Nguyen1, Alistair G Rust2, Ahmed Sayadi1, Philip Lee3, Belinda Q Chua1, Lee-Sun New4, Johann de Jong5, Jerrold M Ward1, Christopher K Y Chin1, Valerie Chew6, Han Chong Toh7, Jean-Pierre Abastado6, Touati Benoukraf8, Richie Soong8, Frederic A Bard1, Adam J Dupuy9, Randy L Johnson10, George K Radda3, Eric Chun Yong Chan4, Lodewyk F A Wessels5, David J Adams2, Nancy A Jenkins1,11,12 & Neal G Copeland1,11,12

The most common risk factor for developing hepatocellular carcinoma (HCC) is chronic infection with hepatitis B virus (HBV). To better understand the evolutionary forces driving HCC, we performed a near-saturating transposon mutagenesis screen in a mouse HBV model of HCC. This screen identified 21 candidate early stage drivers and a very large number (2,860) of candidate later stage drivers that were enriched for genes that are mutated, deregulated or functioning in signaling pathways important for human HCC, with a striking 1,199 genes being linked to cellular metabolic processes. Our study provides a comprehensive overview of the genetic landscape of HCC.

© 2014 Nature America, Inc. All rights reserved.

Nearly 500,000 people are diagnosed with HCC each year, and their overall 5-year survival rate is below 12%. The highest incidence of HCC is in regions in which infection with HBV is endemic, and men are two to four times more likely to develop HCC than women. HCC related to infection with HBV has also become the fastest-rising cause of cancer-related death in the United States during the past two decades. Although the use of emerging sequencing and genomics technologies has identified many mutated and/or differentially expressed genes in HCC, these techniques have also uncovered a surprising amount of intratumor and intertumor heterogeneity1–5. For these reasons, and because important DNA mutations can be concealed among the large number of passenger mutations present in these tumors, it has been difficult to identify the complete complement of driver genes for HCC. Epigenetic silencing of tumor suppressor genes also frequently occurs in tumors6,7. This fact, combined with recent studies showing that there may be thousands of haploinsufficient tumor suppressor genes8, makes the identification of all driver genes for HCC even more difficult. Published reports from the Encyclopedia of DNA Elements (ENCODE) project have also identified millions of functional elements, many of which are transcription factor binding sites that regulate the expression of genes often located hundreds of kilobases away9. Nearly 70% of disease-associated SNPs identified in genome-wide association studies map to these distal enhancers9, raising the possibility that noncoding mutations in these distal elements might also substantially contribute to cancer and potentially explaining why some tumors have few or no mutated cancer genes10,11, even after extensive genome characterization. Tumors with a paucity of mutated genes might also have mutations in very infrequently mutated genes that we do not yet have the statistical power to detect.

One method for identifying these missing cancer genes, as well as to validate the hundreds of candidate cancer genes already described, is through comparative genomics involving transposon-based insertional mutagenesis12. It has recently become possible to mobilize the Sleeping Beauty (SB) transposon in essentially any mouse tissue at high enough frequencies to induce virtually any kind of cancer13–15. Mutagenic SB transposons carry a strong promoter for activating proto-oncogenes and transcriptional stop cassettes for inactivating tumor suppressor genes, and the transposons therefore tag cancer genes in tumor cells. Human tumor genomes are complex, with multiple operative mutagenic processes. By contrast, transposons tag cancer genes directly, thus facilitating their identification.

Here we sought to obtain a comprehensive list of genes that are functionally necessary to trigger HCC by performing a large-scale SB transposon mutagenesis screen13,14. Because the major etiology of

1Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Biopolis, Singapore. 2Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. 3Clinical Imaging Research Centre, National University of Singapore, Centre for Translation Medicine, Singapore Bioimaging Consortium, Singapore. 4Department of Pharmacy, Faculty of Science, National University of Singapore, Singapore. 5Department of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands. 6Singapore Immunology Network (SIgN), A*STAR, Biopolis, Singapore. 7National Cancer Centre, Singapore. 8Cancer Science Institute of Singapore, National University of Singapore, Singapore. 9Department of Anatomy and Cell Biology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA. 10Department of Biochemistry and Molecular Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA. 11Present address: The Methodist Hospital Research Institute, Houston, Texas, USA. 12These authors contributed equally to this work. Correspondence should be addressed to N.G.C. ([email protected]). Received 21 May; accepted 8 November; published online 8 December 2013; doi:10.1038/ng.2847

24 VOLUME 46 | NUMBER 1 | JANUARY 2014 Nature Genetics A r t i c l e s

human HCC is chronic infection with HBV16, we mobilized SB in the livers (liver-SB)17 of transgenic mice predisposed to develop HCC as a result of expression of the toxic HBV surface antigen (HBsAg) in their livers (called liver-SB/HBsAg mice)18. By using this mouse model, we aimed to identify genes that could cooperate with HBV-induced liver inflammation in the induction of HCC, which is similar to that occurring in most human HCC.

RESULTS

HBV-associated HCC mouse model for mutagenesis screening

Liver-SB/HBsAg transgenic mice develop chronic liver inflammation with associated reactive hyperplasia (Fig. 1a and Supplementary Fig. 1), hepatocytomegaly and ground glass hepatocytes (Fig. 1b), similarly to human chronic hepatitis B and the previously characterized HBsAg mice18. Liver-SB/HBsAg modifications lead to the appearance of preneoplastic foci, which are visible from 19.7 weeks of age, followed by hepatocellular adenoma and trabecular HCC (Supplementary Fig. 2). SB induced a cooperative tumorigenic effect with HBsAg, as liver-SB/HBsAg mice displayed reduced survival (Fig. 1c). We also noted a tendency toward larger numbers of tumors (Supplementary Fig. 2) and more advanced-stage disease compared to HBsAg mice alone (Fig. 1d).

To identify the genes mutated by SB that cooperate with HBsAg-associated inflammation in tumor induction, we PCR amplified and sequenced the transposon insertion sites from 250 tumors harvested from 34 mice19, which yielded 328,687 sequence reads and identified an average of 1,315 unique transposon insertion sites per tumor. By comparing the location of the transposon insertions in all tumors, we found a few tumors that were genetically related. We removed these tumors, leaving 228 genetically unrelated tumors, which we subsequently used for downstream analysis (Supplementary Figs. 3 and 4).

Large-scale screen identifies common insertion sites

We then screened for common insertion sites (CISs), which are regions in cancer genomes that contain a higher density of transposon insertions than predicted by chance and are therefore likely to contain a cancer gene (CISs on chromosome 1 could not be conclusively identified because of local transposon hopping, and we therefore excluded CISs on this chromosome). A gene-centric CIS-calling method (gCIS) that looks for a higher density of transposon insertions within the coding regions of all RefSeq genes than predicted by chance20 identified 2,525 gCIS genes (Supplementary Table 1a), whereas a non–gene centric Gaussian kernel convolution (GKC) method21 that looks for a higher density of insertions within fixed kernel widths of 15–240 kb (ref. 22) identified 2,041 CISs containing 2,103 genes (Supplementary Table 1b). There was an 83% overlap between the CISs identified by these two methods (P < 2 × 10−16, χ2 with Yates correction), and together they identified 2,871 CIS loci containing 2,881 genes (Supplementary Table 1c and Supplementary Fig. 5a). In vitro transposition cell culture assays have suggested that SB is a random insertional mutagen and that the only requirement for insertion is a TA dinucleotide12. Although the GKC method scans the entire cancer genome for CISs, most CISs were located within or in close proximity to genes, providing additional evidence that SB targets the coding regions of genes that confer a selective advantage for tumor growth. gCISs not identified by GKC often contained very large (>300 kb) or small (<10 kb) genes and were probably missed in part because of the use of fixed kernel widths (Supplementary Fig. 5b,c).

To estimate the genetic coverage of this screen, we randomly selected subgroups of 10–220 tumors from the 228 total tumors and then used GKC to identify the CISs for each subgroup. We then plotted the number of tumors in each subgroup against the genomic base pairs covered by all the CISs identified for each subgroup. The curve plateaued before reaching 100 tumors, indicating that the screen was approaching saturation with as few as 100 tumors (Fig. 2a,b). Adding more tumors merely identified additional CISs of lower frequency (Supplementary Figs. 6 and 7). Consistent with this notion of saturation, this screen identified most of the CIS genes found previously in much smaller-scale HCC transposon screens performed in p53 mutant17 and Sav1-deficient mice (Supplementary Fig. 8). To our knowledge, this is the first saturating transposon screen for cancer genes reported for mice.

Nearly 68% of the CIS genes were mutated in <20% of the tumors, and each tumor contained 387 mutated CIS genes on average (Fig. 2c,d). These findings are indicative of extensive intratumor

[Figure 1 panels: (a,b) histology micrographs; (c) survival (%) versus age (weeks) for female and male mice; (d) percentage of male mice with HCC and HCA, HCA only or no tumor for the HBsAg, liver-SB and liver-SB/HBsAg groups.]

Figure 1 Liver-SB/HBsAg mice have chronic liver inflammation and accelerated formation of HCC. (a) Preneoplastic hepatocellular focus (large arrow) of hyperplastic hepatocytes, with inflammatory cells (small arrows), in a 22.8-week-old liver-SB/HBsAg male mouse. (b) Hepatocytes with endoplasmic reticulum inclusions, so-called ground glass hepatocytes, as are seen in human HBV-induced hepatitis. The sample shown is from a 20.4-week-old liver-SB/HBsAg male mouse. (c) Kaplan-Meier survival curves for male and female mice of all four combinations of genotypes: liver-SB/HBsAg (SB/HBV), liver-SB (SB), HBsAg transgene (HBV) and littermate control mice carrying an inactive transposon (no transposase) and no HBsAg transgene (control). The median survival for females was 79.1 weeks for liver-SB/HBsAg mice and 103.7 weeks for liver-SB mice (log-rank test P < 10−4). The median survival for males was 70.4 weeks for liver-SB/HBsAg mice, 94.9 weeks for liver-SB mice and 88.4 weeks for HBsAg mice (log-rank test P < 10−4). (d) Summary of histopathology performed on livers of moribund male mice of various phenotypes. We observed more HCC and hepatocellular adenoma (HCA) in the liver-SB/HBsAg class than in the two classes liver-SB and HBsAg combined. Although a trend is visible, it was not significant (P = 0.086, one-sided Fisher’s exact test on a contingency table). The overall penetrance for hepatic tumorigenesis in liver-SB/HBV mice was 77%.
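
The kernel-density idea behind the GKC method described in the Results can be illustrated with a minimal sketch. This is not the published GKC implementation: the function names, the toy insertion coordinates, the 30-kb kernel width and the peak threshold are all assumptions made for illustration, and a real analysis would derive the threshold from a null model of random insertions.

```python
import math

def smoothed_density(insertions, grid, width):
    """Sum a Gaussian kernel (standard deviation `width`, in bp) centered on
    every insertion and evaluate the result at each grid position."""
    return [
        sum(math.exp(-0.5 * ((x - pos) / width) ** 2) for pos in insertions)
        for x in grid
    ]

def call_peaks(grid, density, threshold):
    """Return grid positions that are local maxima with density >= threshold."""
    peaks = []
    for i in range(1, len(grid) - 1):
        if (density[i] >= threshold
                and density[i] > density[i - 1]
                and density[i] >= density[i + 1]):
            peaks.append(grid[i])
    return peaks

# Hypothetical insertion coordinates (bp) on one chromosome: a tight cluster
# near 500 kb plus scattered background insertions.
toy_insertions = [100_000, 250_000, 495_000, 498_000, 500_000, 502_000,
                  505_000, 760_000, 900_000]
toy_grid = list(range(0, 1_000_001, 5_000))  # evaluate the density every 5 kb
toy_density = smoothed_density(toy_insertions, toy_grid, width=30_000)
toy_peaks = call_peaks(toy_grid, toy_density, threshold=3.0)  # assumed cutoff
print(toy_peaks)
```

In this toy case only the cluster near 500 kb exceeds the assumed threshold; isolated insertions contribute a density of about 1 and are ignored. Repeating the scan with several kernel widths would mimic the 15–240 kb range mentioned in the text.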


[Figure 2a–d plots: (a,b) genomic bases covered versus number of tumor samples; (c) tumors with the CIS (%) for the 2,871 CISs sorted by frequency of occurrence in tumors; (d) number of CIS insertions per tumor (mean = 386.8 CIS insertions per tumor).]

[Figure 2e heat map: for each of the 21 driver genes (Adk, Dpyd, Pard3, Nfia, Zbtb20, Magi1, Iqgap2, Snd1, Ankrd17, Gsk3b, Mll5, Ghr, Wac, Pten, Man2a1, Setd2, Arid1a, Rtl1 Chr12 locus, Zfp106, Fam105b and Sav1), the presence or absence of an SB insertion in each tumor, the fraction of insertions on the negative or positive DNA strand, the average sequencing read count and the percentage of tumors with an insertion in the CIS; the lower track shows the number of insertions in driver genes per tumor.]

Figure 2 Identification of driver genes for HCC in the liver-SB/HBV screen. (a,b) CISs were called using the GKC method from the indicated numbers of randomly chosen liver-SB/HBV tumors. The resulting CIS genomic loci were overlapped with the CIS loci from the 228 tumors. The numbers of tumor samples used were plotted against the genomic bases covered. For each combination of tumors, the median values and 25th and 75th percentiles were plotted. (a) Near saturation of the screen is seen with a theoretical asymptote (in orange) at 248,023,943 bases with an increasing number of samples. The percentage saturation obtained with 100 samples is 75.4%. (b) Plot showing the data generated from randomized samples; the straightness of the line indicates no saturation. (c) Distribution of all 2,871 CISs according to the percentage of time they are mutated in tumors. When a CIS gene was identified using both gCIS and GKC, the average percentage was used. Only 79 CISs were mutated in more than 50% of the tumors, whereas 638 CISs were mutated in more than 25% of the tumors. (d) Number of transposon insertions found in CIS loci in each tumor (represented as dots). The mean (horizontal line) is 387 ± 118 (s.d.) CISs targeted per tumor. (e) Clustering in all 228 tumors of the 21 HCC driver genes with highest sequencing read counts and frequencies of occurrence in tumors. No statistically significant genetic correlations were found among the 21 driver genes.
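
The subsampling procedure behind Figure 2a,b can be sketched as follows. This is only an illustration under stated assumptions: `covered_bases` is a hypothetical toy stand-in for CIS calling (a fixed-size bin counts as a CIS when at least two tumors in the subgroup hit it), not the GKC method, and the bin size, tumor data and number of repeats are invented.

```python
import random
from statistics import median

def covered_bases(tumors, min_tumors=2, bin_size=10_000):
    """Toy stand-in for CIS calling: a genomic bin counts as a 'CIS' when at
    least `min_tumors` tumors carry an insertion in it. Returns bases covered."""
    counts = {}
    for bins in tumors:
        for b in bins:
            counts[b] = counts.get(b, 0) + 1
    return bin_size * sum(1 for c in counts.values() if c >= min_tumors)

def saturation_curve(tumors, sizes, repeats=20, seed=0):
    """For each subgroup size, draw random subgroups of tumors and report the
    median number of covered bases across the repeats."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        draws = [covered_bases(rng.sample(tumors, n)) for _ in range(repeats)]
        curve[n] = median(draws)
    return curve

# Hypothetical data: 60 tumors, each hitting 5 of 40 recurrently targeted bins.
data_rng = random.Random(1)
toy_tumors = [set(data_rng.sample(range(40), 5)) for _ in range(60)]
toy_curve = saturation_curve(toy_tumors, sizes=[5, 10, 20, 40, 60])
for n, bases in toy_curve.items():
    print(n, bases)
```

When the set of recurrently targeted regions is finite, the curve flattens as the subgroup size grows, which is the behavior the caption describes as approaching a theoretical asymptote.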

heterogeneity that likely resulted from branching tumor evolution, similar to that reported for several human cancers23, combined with the large mutational load induced by SB. In this HCC model, every liver cell contains ~350 copies of the transposon13, which, when induced to transpose, can transpose over and over again. In addition, because the transposon insertion sites are PCR amplified before being deep sequenced, insertional mutations present in a small number of cells in the tumor can be identified. It is this ability to sample the extensive intratumor heterogeneity at such a deep level that we believe made it possible to approach saturation with only 100 tumors.

General features of HCC CIS genes

The transposon13 contains a promoter that is used to deregulate oncogenes and stop cassettes in both transcriptional orientations to inactivate tumor suppressor genes. Transposon insertions at oncogenes therefore tend to be located at the 5′ end of the gene in the same transcriptional orientation, whereas insertions in tumor suppressor genes are usually located throughout the gene in either orientation. Insertion site patterns are thus often predictive of gene function. For the genes identified here and in other transposon screens performed in solid tumors13–15,17,19,22,24, the majority are predicted to function as tumor suppressor genes. Most genes also have insertions only in one allele, suggesting that they may function as haploinsufficient tumor suppressor genes, although it remains possible that the other allele is silenced by an epigenetic or other mutational event. This haploinsufficiency is consistent with recent studies performed in human cells and tumors, which identified hundreds of haploinsufficient genes that are able to drive cellular proliferation. These genes are postulated to operate in cooperative networks to maximize proliferative fitness25, which supports other studies indicating that even partial inactivation of tumor suppressor genes can contribute to tumorigenesis8.

Branching tumor evolution complicates efforts to implement personalized medicine and suggests that targeted therapies might be directed to genes that are mutated at the trunk of the evolutionary tree. Transposons provide powerful tools for identifying trunk genes, as insertions in trunk genes are more likely to be associated with a higher number of sequencing reads. Despite the splinkerette PCR method preventing a perfect indication of read quantification, previous studies have shown that CIS genes with high read depth are more likely to be highly penetrant, as is seen with APC22. To identify potential trunk genes for HCC, we again screened for CISs, but this time we used only the 25,197 insertions that were represented by at least four sequencing reads each. Among the 524 CISs identified by this approach (Supplementary Table 1d) were 21 outlier CISs that showed the highest sequencing read counts and frequencies of occurrence (Fig. 2e and Supplementary Fig. 9a). Six genes (Arid1a (refs. 10,11), Gsk3b (ref. 26), Iqgap2 (ref. 27), Magi1 (ref. 28), Pten29 and Sav1 (refs. 30,31)) have known tumor suppressive roles in HCC, and Snd1 has been described as an oncogene that promotes HCC angiogenesis32,33. Two genes (Zbtb20 (refs. 34,35) and Ankrd17 (refs. 36,37)) are involved in hepatocyte differentiation and maturation, and Zbtb20 is a transcription factor that represses the expression of α-fetoprotein, which is a biomarker widely used for HCC surveillance. The Setd2 tumor suppressor gene38 and the putative tumor suppressor genes Adk39, Dpyd40,41, Kmt2e (also known as Mll5) (ref. 42) and Nfia43,44 were also identified as potential driver genes for HCC, although there


is no published evidence they are involved in HCC. Six driver genes also regulate hepatic metabolism, with Pten45, Gsk3b (ref. 46), Adk47,48, Zbtb20 (ref. 49) and Ghr50,51 regulating lipid and glucose metabolism and Dpyd controlling pyrimidine catabolism in hepatocytes. Seven driver genes (Mll5, Setd2, Wac, Arid1a, Nfia, Snd1 and Zbtb20) are transcription factors, cofactors or chromatin remodelers, highlighting the importance of the modulation of transcriptional programs in HCC. One driver CIS that we named Rtl1 Chr12 locus (Fig. 2e) is the merger of several CISs at the same imprinted locus52 (Rian, Rtl1 and Sfp865 (also known as 6430526N21Rik) and a microRNA cluster), which was targeted in 38.4% of the tumors. The majority (92%) of the transposon insertions at this CIS are located on the negative strand (Supplementary Fig. 9b), indicating a potential oncogenic loss of imprinting and transcriptional activation of Rtl1 or one or more of the microRNAs that are located downstream. Notably, these microRNAs are among the most strongly upregulated microRNAs in mouse and human HCC53.

CIS gene comparison with human data and other indications

We next asked whether the 2,881 HCC CIS genes are specific for HCC by comparing them to the genes identified in transposon screens for colorectal cancer (CRC)22 and pancreatic adenocarcinoma (PDAC)19,24. These comparisons showed that >60% of the HCC CIS genes are also mutated in CRC and PDAC (P < 2 × 10−16) (Supplementary Fig. 10). This is not surprising, as many cancer genes are known to function in more than one cancer cell type. HCC CIS genes are also significantly mutated in human HCC, as shown by the very significant overlap with the genes somatically mutated in human HCC identified in two whole-genome HCC sequencing studies10,11 and in a genome-wide analysis of HBV integrations54 (Table 1). In addition, we found a significant enrichment among the genes listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) database, which reports all of the somatic mutations identified in human HCC, and the genes listed in the Cancer Gene Census database, which aims to catalog all of the driver genes in human cancers (Table 1). HCC CIS genes thus have important roles in human HCC and provide additional validation for the genes identified in human HCC.

The expression of HCC CIS genes is also specifically deregulated in human HCC. Among 2,189 HCC CIS genes contained in a large microarray data set obtained from 223 HBV-positive patients with HCC4 (Supplementary Fig. 11a), 45% were significantly deregulated in human HCC (absolute fold change of >1.5 and adjusted P < 0.00001) (Supplementary Fig. 11b and Supplementary Table 2) compared to 29% misregulated genes on the array that are not CIS genes (P < 2 × 10−16). Differentially expressed HCC CIS genes also displayed significantly lower P values, and more of these genes were upregulated than downregulated, although this difference was not large (Supplementary Fig. 11c,d). Comparison of the HCC microarray data with other CIS gene lists from PDAC19,24 and CRC22 screens showed the highest enrichment for HCC CIS genes (Supplementary Fig. 11e), indicating better specificity when the data were from the same tumor type. Overall, 42.5% of the HCC CIS genes are mutated or misexpressed in human HCC (Fig. 3 and Supplementary Table 3). We subsequently confirmed the deregulation of 30 HCC CIS genes by real-time PCR using a microfluidic dynamic array containing 18 HBV-positive human tumors and 9 liver-SB/HBsAg tumors (Supplementary Fig. 11f and Supplementary Table 4). The direction of the expression change (upregulated or downregulated) was positively correlated between human and mouse tumors (Supplementary Fig. 11g).

Transposons drive HCC through conserved pathways

HCC CIS genes are also enriched in the major cancer signaling pathways that are known to be important for human HCC, such as the Ras-Erk (also called Mapk), p53, Akt, Wnt and Tgf-β–Bmp pathways, together with the recently identified hepato-tumor suppressor Hippo pathway30 (Fig. 4a, Supplementary Table 5 and Supplementary Fig. 12). Another known HCC signaling pathway, Il-6–Stat3, was less enriched, possibly because the tumor microenvironment locally secretes Il-6, thus activating this pathway in hepatocytes. A similar analysis for PDAC19,24 and CRC22 CIS genes showed that many of these signaling pathways are also enriched for mutations in these genes. Notably, the Hippo pathway was more highly mutated in HCC, whereas the Wnt pathway, which is critical for CRC22, was more highly mutated in CRC (Supplementary Fig. 12a). Transposons also prominently targeted the Hippo pathway in tumors generated in a Sav1-deficient background (Online Methods and Supplementary Table 1e). Sav1 activates Mst1, which results in the downstream activation of Hippo signaling55. Together these results suggest that

Table 1 Enrichment of liver-SB/HBV CIS genes among the genes mutated or misregulated in human cancer

| Comparison | Organism | Number of genes in the data set (mouse chromosome 1 excluded) | Number of overlapping CIS genes | Coverage of the human data set (%) | Enrichment P |
|---|---|---|---|---|---|
| Human HCC somatic mutations (ref. 11) | Human | 858 | 177 | 20.6 | 9.53 × 10−14 |
| Human HCC somatic mutations (ref. 10) | Human | 189 | 52 | 27.5 | 4.31 × 10−10 |
| COSMIC database for somatic mutations in human HCC | Human | 499 | 83 | 16.6 | 0.00423 |
| Human HBV insertions (ref. 54) | Human | 103 | 23 | 22.3 | 0.00340 |
| CGC database (somatic mutations, all cancers) | Human | 468 | 120 | 25.6 | <2 × 10−16 |
| Genes misexpressed in human HCC (ref. 4) | Human | 3,784 | 987 | 26.1 | <2 × 10−16 |

The enrichment P values were calculated using a two-by-two contingency table and χ2 test with Yates correction.

[Figure 3 Venn diagram: of the 2,881 CIS genes, 987 are misexpressed in human HCC and 376 carry somatic mutations in HCC, with 139 in both sets.]

Figure 3 Many liver-SB/HBV CIS genes are deregulated or mutated in human cancer. Repartition of the 42.5% liver-SB/HBV CIS genes found to be either mutated or misexpressed in human HCC. A total of 1,224 CISs from the liver-SB/HBV screen were associated with a gene that is mutated or misexpressed in human HCC.
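
The enrichment test named in the Table 1 footnote (a two-by-two contingency table with a Yates-corrected χ2 test) can be sketched in a few lines. The function name is hypothetical, and the background of 20,000 genes used in the example is an assumption: the gene universe is not stated in this section, so the printed P value is illustrative and is not expected to reproduce the value in Table 1.

```python
import math

def yates_chi2(a, b, c, d):
    """Chi-square statistic with Yates continuity correction for the 2x2 table
    [[a, b], [c, d]], and its P value for 1 degree of freedom."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 d.f.
    return chi2, p_value

# First row of Table 1: 858 genes in the human data set, 177 of them CIS genes,
# out of 2,881 CIS genes overall.
universe = 20_000            # ASSUMED background gene count (not from the paper)
in_both = 177
human_only = 858 - in_both
cis_only = 2_881 - in_both
neither = universe - in_both - human_only - cis_only

chi2, p = yates_chi2(in_both, human_only, cis_only, neither)
print(f"chi2 = {chi2:.1f}, P = {p:.3g}")
```

The coverage column of Table 1 is simply the overlap divided by the data set size, for example 177 / 858 = 20.6%.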


transposon-induced mutations in Hippo pathway genes cooperate with Sav1 deficiency to further deregulate Hippo signaling and are consistent with the results from a transposon screen performed in Apc mutant mice, which identified a large number of CRC CIS genes that targeted the Wnt pathway22. Liver-SB/Sav1-deficient mice also developed multiple large liver tumors nearly 2 months earlier than liver-SB/HBsAg mice, providing additional evidence for an oncogenic role for deregulated Hippo signaling in HCC.

HCC CIS genes target metabolic events

When we classified the HCC CIS genes according to their gene ontology, we noticed a striking over-representation of genes that are linked to cellular metabolic processes (P = 1.4 × 10−60, 1,199 genes) (Fig. 4b and Supplementary Table 6). We detected a less striking enrichment for metabolic processes for CRC and PDAC CIS genes (Supplementary Fig. 13), whereas most other ontologies, such as transcription or cell cycle, displayed similar enrichments among all three tumor types. When we focused the gene classification on metabolic processes, we found 47 metabolic categories that were targeted by the transposon during HCC tumor development as compared to only 9 and 10 metabolic categories for CRC and PDAC CIS genes, respectively (Supplementary Fig. 14). When we looked for enrichment for disease genes annotated in Ingenuity Pathway Analysis, we also found a highly significant enrichment for HCC CIS genes in

[Figure 4a pathway diagrams: Wnt pathway targeted in 96.9% of tumors; p53 pathway targeted in 95.6% of tumors; Akt pathway targeted in 96.5% of tumors; Ras-Erk pathway targeted in 91.7% of tumors; Hippo pathway targeted in 91.2% of tumors; Tgf-β–Bmp pathway targeted in 87.2% of tumors; Il-6–Stat3 pathway targeted in 51.3% of tumors. Each diagram labels individual pathway genes with the percentage of tumors carrying an insertion at that gene.]

[Figure 4b bar chart: −log P (axis 0–60) for each biological process, with the percentage of HCC CIS genes in each: cellular metabolic process, 42.1%; chromatin modification, 3.1%; post-translational modification, 8.9%; transcription, 12.9%; organelle organization, 8.4%; embryonic development, 5.2%; protein ubiquitination, 1.0%; cell communication, 5.7%; vesicle-mediated transport, 3.7%; protein translation, 1.1%; intracellular signaling cascade, 6.2%; apoptosis, 3.5%; endoplasmic reticulum stress, 0.4%; liver development, 0.6%; cell cycle, 5.4%; multicellular organism growth, 0.8%; vasculature development, 1.9%; response to oxidative stress, 0.7%; cell migration, 1.7%.]

Figure 4 Liver-SB/HBV CIS genes drive tumorigenesis through conserved cancer signaling pathways. (a) HCC CIS genes are enriched for genes in major oncogenic canonical signaling pathways. The percentages represent the fraction of tumors with a transposon insertion at the gene locus. Most tumors displayed insertions in genes in these canonical pathways. (b) Gene ontology analysis performed with DAVID bioinformatics for biological processes. P values for enrichment are represented on a −log scale. The percentages indicate the proportion of HCC CIS genes found associated with a specific biological process.
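
The two kinds of percentage reported in Figure 4a (tumors with an insertion at a given gene, and tumors with an insertion in at least one gene of a pathway) can be computed as in the sketch below. The tumor identifiers and their insertion sets are invented for illustration, and the three-gene set is a small assumed subset of Wnt pathway members rather than the gene list used in the paper.

```python
def percent_tumors_with_gene(tumor_genes, gene):
    """Percentage of tumors carrying an insertion at `gene`."""
    hit = sum(1 for genes in tumor_genes.values() if gene in genes)
    return 100.0 * hit / len(tumor_genes)

def percent_tumors_hit(tumor_genes, gene_set):
    """Percentage of tumors with an insertion in at least one gene of `gene_set`."""
    hit = sum(1 for genes in tumor_genes.values() if genes & gene_set)
    return 100.0 * hit / len(tumor_genes)

# Hypothetical insertion calls: tumor id -> set of CIS genes hit in that tumor.
toy_tumors = {
    "T1": {"Pten", "Gsk3b"},
    "T2": {"Sav1"},
    "T3": {"Apc", "Pten"},
    "T4": {"Yap1"},
}
wnt_subset = {"Apc", "Gsk3b", "Ctnnb1"}  # assumed illustrative subset

print(percent_tumors_with_gene(toy_tumors, "Pten"))
print(percent_tumors_hit(toy_tumors, wnt_subset))
```

Because a tumor counts once no matter how many pathway genes it hits, the pathway-level percentage can be far higher than any single-gene percentage, which is consistent with pathways being targeted in most tumors while individual genes are hit much less often.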


[Figure 5 pathway map. Input data sets: 56 metabolites with significant variation in liver-SB/HBV HCC; 3,784 genes misexpressed in human HCC (ref. 4); 2,881 liver-SB/HBV CIS genes. Legend: upregulated in human HCC; downregulated in human HCC; CIS genes marked with asterisks.]

Figure 5 Mapping of genomic and metabonomic data to metabolic pathways disrupted in HCC. To understand the metabolic pathways that are disrupted in HCC at the genetic, mRNA and metabolite levels, we used data from the liver-SB/HBV screen (Supplementary Table 1c), human expression data4 and metabolomic results (Supplementary Table 7). Metabolites are written with larger font size, and genes are written with smaller font size. Red or blue fonts indicate that the gene expression or metabolite amount is increased or decreased, respectively, in HCC tissue as compared to adjacent nontumor tissue. The CIS genes that are listed in Supplementary Table 1c are labeled with asterisks. This map focuses on glycolysis, the TCA cycle, glutaminolysis and the pentose phosphate (PP) pathway. Despite the increased glucose uptake (Glut1 and Glut2 (Glut1/2)) and consumption, the tumor cells predominantly expressed the pyruvate kinase M2 isoform (Pkm2), which converts phosphoenolpyruvate (PEP) to pyruvate less efficiently than Pkm1 does70. This promotes the accumulation and shuttling of glycolytic intermediates such as glucose-6-phosphate (G6P), fructose-6-phosphate (F6P) and fructose-1,6-bisphosphate (FBP) to the PP pathway and macromolecular synthesis69. Thus, this termination of the glycolysis metabolic pathway, together with dysregulation of Pcx and Pdk1 and Pdk2 (Pdk1/2), would prevent pyruvate from entering the TCA cycle. This would also lead to a truncated TCA cycle and insufficient glucose-dependent citrate production. However, most of the TCA-cycle gene products (Idh1, Idh2 (Idh1/2), Sdha, Fh and Ogdh) and corresponding intermediates are upregulated in tumor cells, except for the downregulation of the aconitase gene family (Aco1 and Aco2 (Aco1/2)), suggesting the TCA cycle is at least partially activated. Our results imply an activation of glutaminolysis (Cat2, Gls and Glud1 and Glud2 (Glud1/2)) that converts glutamine to α-ketoglutarate to replenish TCA-cycle intermediates. Effective maintenance of citrate synthesis is possible through reductive carboxylation of glutamine-derived α-ketoglutarate by Idh1/2 (ref. 73). These adaptations around glycolysis and the TCA cycle could allow rapid generation of both ATP for bioenergetics and important metabolites for biosynthesis. In addition, the breakdown of citrate by Acly could constitute a primary source of acetyl-CoA (ac-CoA) for fatty acid and lipid synthesis. Moreover, upregulation of Ldha could convert glutamine-derived pyruvate to lactate that is excreted by the Mct4 transporter outside the cells68.

metabolic diseases (P = 4 × 10−6, 265 genes); genetic association to cancer was also highly evident (P = 6 × 10−10, 431 genes). Overall, genes associated with protein, carbohydrate, lipid and nucleic acid metabolism were highly mutated by SB. Several other processes were also targeted, including oxidoreduction and metabolism of ATP, organophosphate, hormones, isoprenoid and vitamins (Supplementary Table 6).

To determine whether similar genetic changes occur in human HCC, we measured metabolic gene expression levels in 18 HBV-positive human HCC tumors and 9 liver-SB/HBsAg tumors using quantitative RT-PCR (qRT-PCR). The deregulation of key metabolic genes was positively correlated in these two species, and similar gene expression signatures were found for most genes (Supplementary Fig. 15). The classification of HCC CIS genes has thus highlighted the importance and specificity of targeting metabolic genes in HCC. The fact that normal hepatocytes display important general metabolic functions that most likely need to be reorganized during malignant transformation may explain the reprogramming by SB of many aspects of hepatocyte metabolism.

Metabolic profiling and mapping of mouse HCC tumors

To better understand how genetic changes in metabolic genes affect hepatic tumor metabolism, we performed nontargeted metabolic profiling on eight food-restricted liver-SB/HBsAg animals. We used chemical derivatization and gas chromatography coupled to time-of-flight mass spectrometry (GC/TOFMS) to identify the differences in metabotypes between malignant (tumor mass >3 mm) and adjacent normal liver tissue. We verified the quality of the acquired metabolomics data by principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) (Supplementary Fig. 16). Even with eight matched samples, statistical analyses using t test and Welch test were able to identify 56 annotated metabolites that were significantly changed in tumors compared to normal adjacent tissues (Supplementary Fig. 17). Consistent with the transposon data, we noticed significant alterations corresponding to specific metabolic pathways. For instance, carbohydrate metabolism was affected through increased levels of glucose and fructose and reduced levels of ribitol, ribonic acid and allonic acid. The levels of several amino acids, including isoleucine, valine, asparagine, tyrosine, methionine, serine, leucine, phenylalanine, threonine and alanine, were also significantly (P < 0.01) downregulated in tumors, which is consistent with data from human tumors56. Moreover, we found components of nucleic acids, including ribose, pyridine, uracil and uridine, at higher levels in tumors. We then integrated the genetic (Supplementary Table 1c), expression4 and metabolic results (Supplementary Table 7) in order to identify potential correlations. We noticed a marked consistency

Nature Genetics VOLUME 46 | NUMBER 1 | JANUARY 2014 29 A r t i c l e s

[Figure 6, panels a–e. (a) MRI scans at the NT, DT and MT stages. (b) Anatomical scan with lactate and alanine maps; scale bars, 1 cm. (c) Hyperpolarized 13C spectrum with labeled metabolite peaks. (d) Time course for a tumor >3 mm; x axis, time (s); y axis, normalized 13C signal. (e) Metabolic flux (units per s) for k pyr→lac, k pyr→mal, k pyr→ala, k pyr→asp and k pyr→bic in nontumor tissue, tumors <2 mm and tumors >3 mm.]

Figure 6 Quantitative changes in pyruvate metabolism can be detected by hyperpolarized carbon-13 magnetic resonance spectroscopy in vivo. (a) T2-weighted magnetic resonance imaging (MRI) scans illustrate the development of HCC at different stages in liver-SB/HBsAg animals. The anatomy appears normal in the pretumor stage (NT) at 2–5 months of age. However, at approximately 8 months of age, tumors (white arrows) began to appear within the normal liver (white arrowhead). This is referred to as the developing tumor (DT) stage. Massive tumors (MT) were detected by MRI approximately 2 months later (black arrows). (b) Live in vivo imaging provided the mapping of lactate and alanine production in HCC after hyperpolarized pyruvate injection (blue, green, yellow and red represent the amounts of molecules from lowest to highest). (c) Representation of the in vivo–measured hyperpolarized carbon-13 spectra in the liver. Metabolites that were detected include [1-13C]lactate (183.0 p.p.m.), [1-13C]malate (181.5 p.p.m.), [1-13C]pyruvate hydrate (179.1 p.p.m.), [4-13C]aspartate (177.8 p.p.m.), [1-13C]alanine (176.4 p.p.m.), [4-13C]oxaloacetate (175.6 p.p.m.), [1-13C]aspartate (175.0 p.p.m.), [1-13C]pyruvate (170.8 p.p.m.) and [1-13C]bicarbonate (160.8 p.p.m.). The pyruvate peak was truncated to better illustrate the downstream metabolite peaks. (d) Representative time course depicting the simultaneous production of downstream metabolites after infusion of hyperpolarized [1-13C]pyruvate in a mouse bearing a large liver tumor. (e) Quantitative changes in pyruvate metabolism can be detected by hyperpolarized 13C magnetic resonance spectroscopy in live mice in vivo. Rate of exchange (k) of 13C label from [1-13C]pyruvate (pyr) to [1-13C]lactate (lac), [1-13C]malate (mal), [1-13C]alanine (ala), [1-13C]aspartate (asp) and [1-13C]bicarbonate (bic) after [1-13C]pyruvate infusion into mice at different tumor development stages. Analysis of variance tests were performed for each metabolic flux result. At the false discovery rate (FDR)-corrected significance level of 0.05, no differences between group means were found. However, explicitly testing for a trend by computing Spearman correlation showed that the metabolic flux of pyr→ala significantly increased with tumor size (ρ = 0.51, FDR-corrected P = 0.022). Although a strong trend was also observed for the metabolic flux of pyr→lac, this did not reach statistical significance (ρ = 0.43, FDR-corrected P = 0.052). The number of animals in groups is indicated above each bar. Error bars, s.e.m.
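The trend test described in the Figure 6 caption (Spearman correlation between tumor stage and each metabolic flux, followed by FDR correction) can be sketched as below. This is a minimal illustration and not the authors' analysis code. The flux values are synthetic, the stage coding (0 = nontumor, 1 = tumors <2 mm, 2 = tumors >3 mm) is an assumption, and the P values are obtained by permutation because the caption does not say how significance was computed. The Benjamini–Hochberg step is assumed as the FDR procedure; the Online Methods name it for the expression analysis, not for this test.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for Spearman rho (assumed approach)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Synthetic example: three animals per stage, three hypothetical fluxes.
stage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
fluxes = {
    "pyr->ala": [0.020, 0.030, 0.025, 0.040, 0.035, 0.050, 0.060, 0.055, 0.070],
    "pyr->lac": [0.050, 0.080, 0.060, 0.070, 0.090, 0.065, 0.100, 0.075, 0.110],
    "pyr->bic": [0.030, 0.020, 0.040, 0.035, 0.025, 0.030, 0.020, 0.045, 0.030],
}
names = list(fluxes)
rhos = [spearman(stage, fluxes[n]) for n in names]
raw_p = [permutation_p(stage, fluxes[n]) for n in names]
adj_p = benjamini_hochberg(raw_p)
for n, r, p in zip(names, rhos, adj_p):
    print(f"{n}: rho = {r:.2f}, FDR-adjusted P = {p:.3f}")
```

Treating stage as an ordinal variable is what lets a rank correlation detect a monotonic trend that a comparison of group means can miss, which matches the contrast the caption draws between the analysis of variance and the Spearman result.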

between genetic alterations, mRNA levels and actual amounts of metabolites, suggesting a probable control of metabolic changes by genetic modifications, at least to glycolysis, the tricarboxylic acid (TCA) cycle, glutaminolysis, the pentose phosphate pathway (Fig. 5) and redox regulation (Supplementary Fig. 18). Collectively these metabolic alterations control bioenergetics, biosynthesis and redox regulation, which are known to optimize tumor cell metabolism for proliferation and survival.

We also tested our interpretation of the integrated data using biochemical assays and in vivo live metabolic imaging57. We found that intrahepatic lactate dehydrogenase (Ldh), alanine transaminase (Gpt, also known as Alt) and glutaminase (Gls) enzymatic activities were consistently more strongly activated in large hepatic tumors (Supplementary Fig. 19), supporting the enhanced conversion of pyruvate to alanine and lactate and the inclusion of glutamine perhaps as an anaplerotic carbon source to supplement the increased metabolic fuel requirement in larger tumors58. Metabolic imaging with hyperpolarized pyruvate57 also showed a major conversion of pyruvate into alanine and lactate (Fig. 6), whereas its conversion into bicarbonate and other TCA intermediates was unchanged, indicating a shift toward aerobic glycolysis and anabolic processes relative to oxidative phosphorylation. We note, however, that pyruvate is only one of the fuel sources in HCC, and other processes generating ATP, such as lipid catabolism, have indeed been found to be significantly increased in HCC as well56,59. Overall these experimental findings were consistent with our predictions that were based on the integration of multiple data sets.

DISCUSSION
To our knowledge, we report the first saturating transposon screen for any type of cancer. In total, we identified 2,881 genes that were positively selected during HCC development, and 42.5% of these genes were also mutated or transcriptionally altered in human HCC. The human HCC genome contains a large spectrum of mutations, including DNA amplifications, deletions, rearrangements, point mutations and loss of heterozygosity, in addition to epigenetic changes, that together lead to tumor development. This has made it difficult to discriminate between true driver and passenger mutations for HCC6. SB mutagenesis thus provides an important comparative genomics tool for identifying and validating the driver genes for HCC. We also found that many genes mutated by SB encode proteins that function in cancer signaling pathways that are known to be important for human HCC. In addition, we found that multiple genes from the same signaling pathway are often mutated in the same tumor. This is consistent with recent reports describing the Darwinian evolutionary processes that are operative in cancer cells60,61 and the recently recognized importance of convergent evolution to tumor development, in which the same gene or signaling pathway is mutated multiple times


in different branches of the tumor evolutionary tree because of high selective pressure62. These results support the suggestions of others that it might be better to therapeutically target the signaling pathways themselves rather than the individually mutated genes.

HCC is thought to arise from the hepatocyte, a highly differentiated cell that has programmed specialized functions. The transposon screen identified a surprisingly high enrichment for genes that are implicated in metabolic processes. This high enrichment seemed specific for HCC, as it was not found in transposon screens for other types of cancer15,19,22,63. One of the major functions of hepatocytes is the control of metabolic homeostasis, including protein synthesis, transformation of carbohydrates and synthesis of cholesterol, bile and phospholipids. Our metabolomics study is also consistent with assays performed using human samples of serum, urine64,65 or liver tissue56,59,66, demonstrating that these metabolic processes are indeed altered in transformed human hepatocytes. Using an integrative approach involving functional genomics and metabolomics, we further mapped the genetic, transcriptomic and metabolomic results to core metabolic processes that are commonly affected in cancer67. This approach helped uncover molecular mechanisms that are likely altered in HCC. This disrupted metabolism included the increased uptake of glucose and glutamine, which is implied by the targeting of the membrane transporters Glut1, Glut2 (also known as Slc2a2) and Slc7a2 (also known as Cat2) by SB, in addition to their upregulated expression in human tumors. As human HCC tissues also show elevation of glucose and glutamine levels66,68, our data support that they may be genetically selected sources fueling tumor HCC cell metabolism. We also identified a shift toward aerobic glycolysis and anabolic processes relative to oxidative phosphorylation in liver tumors. We observed similar deregulation of metabolic genes in human HCC, suggesting that anabolic pathways, in conjunction with aerobic glycolysis (Warburg effect)69, contribute to the enhanced glucose metabolism in HCC. Notably, recent in vitro studies showed that heterogeneous nuclear ribonucleoproteins could induce alternative splicing and thereby result in a high PKM2-to-PKM1 expression ratio. This change could be important for the promotion of cell proliferation and aerobic glycolysis in tumors70,71. In line with these studies, we found not only that SB targets Hnrnpa1 but also that mouse and human HCCs significantly overexpress this gene proportionally to the expression of PKM2. This finding suggests that glucose is being shunted through the pentose phosphate pathway toward the biosynthesis of molecules, including nucleotides70, and supports the increased lactate production (aerobic glycolysis) that we observed in our in vivo metabolic imaging. Tumor cell proliferation diverts citrate toward fatty acid synthesis, as was suggested in the SB screen. Because the TCA cycle is crucial for generating intermediate metabolites to ensure cell viability, anaplerotic pathways must be activated. Indeed, our results show glutaminolysis as one such possible carbon source, with SB mutations in many genes regulating the glutaminolytic pathway72,73.

Metabolic adaptations that are associated with glucose and glutamine pathways in cancerous hepatocytes could also enhance the antioxidant defenses through NADPH-dependent glutathione and thioredoxin systems in order to maintain cellular redox balance. In this way, changes in glucose and glutamine pathways could support three basic needs for dividing tumor cells, namely bioenergetics, biosynthesis and redox regulation. Inhibiting glycolysis and/or glutaminolysis might therefore be effective in restricting liver tumor formation and progression, as has been suggested for other cancer types74. Genes identified by our transposon screen have thus provided a broader and deeper understanding of hepatocellular cancer genomics and metabolism, as well as offering potential molecular and targets for cancer therapy, such as in signaling and metabolic pathways.

METHODS
Methods and any associated references are available in the online version of the paper.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We acknowledge K. Reifenberg at Johannes Gutenberg University, Germany, for giving us the HBsAg mouse strain (originated from F. Chisari). We also thank K. Rogers and S. Rogers at the Institute of Molecular and Cell Biology Histopathology core for their necropsy and histotechnology assistance. We thank P. Cheok, N. Lim and D. Chen for their help with mouse breeding and monitoring. This work was supported by the Biomedical Research Council, Agency for Science, Technology and Research, Singapore and the Cancer Prevention Research Institute of Texas (CPRIT). A.G.R. and D.J.A. are supported by the Wellcome Trust and Cancer Research UK. N.A.J. and N.G.C. are both CPRIT Scholars in Cancer Research.

AUTHOR CONTRIBUTIONS
E.A.B.-C. performed the majority of experiments, designed experiments, analyzed data and wrote the manuscript. A.-T.N. performed experiments, analyzed data and wrote the manuscript. A.G.R. and D.J.A. sequenced the samples and processed and analyzed data. A.S., C.K.Y.C., T.B., R.S. and F.A.B. performed computational analyses. B.Q.C. executed experiments. E.C.Y.C. and L.-S.N. performed the metabolomics study. P.L. and G.K.R. carried out in vivo metabolic imaging and metabolic assays. J.M.W. analyzed mouse hepatic pathology. J.-P.A., V.C. and H.C.T. provided and processed human patient samples. A.J.D. helped analyze data, and R.L.J. carried out the liver-SB screen in the Sav1 mutant background. J.d.J. and L.F.A.W. revised statistics and performed analyses. N.G.C. and N.A.J. designed the study, analyzed the data and wrote the manuscript. All authors commented on and edited the final manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Chen, X. et al. Gene expression patterns in human liver cancers. Mol. Biol. Cell 13, 1929–1939 (2002).
2. Imbeaud, S., Ladeiro, Y. & Zucman-Rossi, J. Identification of novel oncogenes and tumor suppressors in hepatocellular carcinoma. Semin. Liver Dis. 30, 75–86 (2010).
3. Mas, V.R. et al. Genes involved in viral carcinogenesis and tumor initiation in hepatitis C virus–induced hepatocellular carcinoma. Mol. Med. 15, 85–94 (2009).
4. Roessler, S. et al. A unique metastasis gene signature enables prediction of tumor relapse in early-stage hepatocellular carcinoma patients. Cancer Res. 70, 10202–10212 (2010).
5. Wurmbach, E. et al. Genome-wide molecular profiles of HCV-induced dysplasia and hepatocellular carcinoma. Hepatology 45, 938–947 (2007).
6. Herath, N.I., Leggett, B.A. & MacDonald, G.A. Review of genetic and epigenetic alterations in hepatocarcinogenesis. J. Gastroenterol. Hepatol. 21, 15–21 (2006).
7. Herceg, Z. & Paliwal, A. Epigenetic mechanisms in hepatocellular carcinoma: how environmental factors influence the epigenome. Mutat. Res. 727, 55–61 (2011).
8. Berger, A.H., Knudson, A.G. & Pandolfi, P.P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).
9. ENCODE Project Consortium. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
10. Fujimoto, A. et al. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat. Genet. 44, 760–764 (2012).
11. Guichard, C. et al. Integrated analysis of somatic mutations and focal copy-number changes identifies key genes and pathways in hepatocellular carcinoma. Nat. Genet. 44, 694–698 (2012).
12. Copeland, N.G. & Jenkins, N.A. Harnessing transposons for cancer gene discovery. Nat. Rev. Cancer 10, 696–706 (2010).
13. Dupuy, A.J., Akagi, K., Largaespada, D.A., Copeland, N.G. & Jenkins, N.A. Mammalian mutagenesis using a highly mobile somatic Sleeping Beauty transposon system. Nature 436, 221–226 (2005).
14. Dupuy, A.J. et al. A modified sleeping beauty transposon system that can be used to model a wide variety of human cancers in mice. Cancer Res. 69, 8150–8156 (2009).


15. Rad, R. et al. PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice. Science 330, 1104–1107 (2010).
16. Yang, J.D. & Roberts, L.R. Hepatocellular carcinoma: a global view. Nat. Rev. Gastroenterol. Hepatol. 7, 448–458 (2010).
17. Keng, V.W. et al. A conditional transposon-based insertional mutagenesis screen for genes associated with mouse hepatocellular carcinoma. Nat. Biotechnol. 27, 264–274 (2009).
18. Chisari, F.V. et al. Molecular pathogenesis of hepatocellular carcinoma in hepatitis B virus transgenic mice. Cell 59, 1145–1156 (1989).
19. Mann, K.M. et al. Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma. Proc. Natl. Acad. Sci. USA 109, 5934–5941 (2012).
20. Brett, B.T. et al. Novel molecular and computational methods improve the accuracy of insertion site analysis in Sleeping Beauty–induced tumors. PLoS ONE 6, e24668 (2011).
21. de Ridder, J., Uren, A., Kool, J., Reinders, M. & Wessels, L. Detecting statistically significant common insertion sites in retroviral insertional mutagenesis screens. PLoS Comput. Biol. 2, e166 (2006).
22. March, H.N. et al. Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis. Nat. Genet. 43, 1202–1209 (2011).
23. Swanton, C. Intratumor heterogeneity: evolution through space and time. Cancer Res. 72, 4875–4882 (2012).
24. Pérez-Mancera, P.A. et al. The deubiquitinase USP9X suppresses pancreatic ductal adenocarcinoma. Nature 486, 266–270 (2012).
25. Solimini, N.L. et al. Recurrent hemizygous deletions in cancers may optimize proliferative potential. Science 337, 104–109 (2012).
26. Whittaker, S., Marais, R. & Zhu, A.X. The role of signaling pathways in the development and treatment of hepatocellular carcinoma. Oncogene 29, 4989–5005 (2010).
27. Schmidt, V.A., Chiariello, C.S., Capilla, E., Miller, F. & Bahou, W.F. Development of hepatocellular carcinoma in Iqgap2-deficient mice is IQGAP1 dependent. Mol. Cell. Biol. 28, 1489–1502 (2008).
28. Zhang, G., Liu, T. & Wang, Z. Downregulation of MAGI1 associates with poor prognosis of hepatocellular carcinoma. J. Invest. Surg. 25, 93–99 (2012).
29. Horie, Y. et al. Hepatocyte-specific Pten deficiency results in steatohepatitis and hepatocellular carcinomas. J. Clin. Invest. 113, 1774–1783 (2004).
30. Lu, L. et al. Hippo signaling is a potent in vivo growth and tumor suppressor pathway in the mammalian liver. Proc. Natl. Acad. Sci. USA 107, 1437–1442 (2010).
31. Zheng, T., Wang, J., Jiang, H. & Liu, L. Hippo signaling in oval cells and hepatocarcinogenesis. Cancer Lett. 302, 91–99 (2011).
32. Santhekadur, P.K. et al. Multifunction protein staphylococcal nuclease domain containing 1 (SND1) promotes tumor angiogenesis in human hepatocellular carcinoma through novel pathway that involves nuclear factor κB and miR-221. J. Biol. Chem. 287, 13952–13958 (2012).
33. Yoo, B.K. et al. Increased RNA-induced silencing complex (RISC) activity contributes to hepatocellular carcinoma. Hepatology 53, 1538–1548 (2011).
34. Kojima, K. et al. MicroRNA122 is a key regulator of α-fetoprotein expression and influences the aggressiveness of hepatocellular carcinoma. Nat. Commun. 2, 338 (2011).
35. Xie, Z. et al. Zinc finger protein ZBTB20 is a key repressor of α-fetoprotein gene transcription in liver. Proc. Natl. Acad. Sci. USA 105, 10859–10864 (2008).
36. Flaim, C.J., Chien, S. & Bhatia, S.N. An extracellular matrix microarray for probing cellular differentiation. Nat. Methods 2, 119–125 (2005).
37. Watt, A.J. et al. A gene trap integration provides an early in situ marker for hepatic specification of the foregut endoderm. Mech. Dev. 100, 205–215 (2001).
38. Duns, G. et al. Histone methyltransferase gene SETD2 is a novel tumor suppressor gene in clear cell renal cell carcinoma. Cancer Res. 70, 4287–4291 (2010).
39. Jackson, R.C., Morris, H.P. & Weber, G. Adenosine deaminase and adenosine kinase in rat hepatomas and kidney tumours. Br. J. Cancer 37, 701–713 (1978).
40. Nii, A. et al. Significance of dihydropyrimidine dehydrogenase and thymidylate synthase mRNA expressions in hepatocellular carcinoma. Hepatol. Res. 39, 274–281 (2009).
41. Queener, S.F., Morris, H.P. & Weber, G. Dihydrouracil dehydrogenase activity in normal, differentiating and regnerating liver and in hepatomas. Cancer Res. 31, 1004–1009 (1971).
42. Cheng, F. et al. Camptothecin-induced downregulation of MLL5 contributes to the activation of tumor suppressor p53. Oncogene 30, 3599–3611 (2011).
43. Bernard, F. et al. Alterations of NFIA in chronic malignant myeloid diseases. Leukemia 23, 583–585 (2009).
44. Johnson, M.R., Look, A.T., DeClue, J.E., Valentine, M.B. & Lowy, D.R. Inactivation of the NF1 gene in human melanoma and neuroblastoma cell lines without impaired regulation of GTP.Ras. Proc. Natl. Acad. Sci. USA 90, 5539–5543 (1993).
45. Vinciguerra, M. & Foti, M. PTEN at the crossroad of metabolic diseases and cancer in the liver. Ann. Hepatol. 7, 192–199 (2008).
46. Cohen, P. & Frame, S. The renaissance of GSK3. Nat. Rev. Mol. Cell Biol. 2, 769–776 (2001).
47. Bjursell, M.K. et al. Adenosine kinase deficiency disrupts the methionine cycle and causes hypermethioninemia, encephalopathy, and abnormal liver function. Am. J. Hum. Genet. 89, 507–515 (2011).
48. Boison, D. et al. Neonatal hepatic steatosis by disruption of the adenosine kinase gene. Proc. Natl. Acad. Sci. USA 99, 6985–6990 (2002).
49. Sutherland, A.P. et al. Zinc finger protein Zbtb20 is essential for postnatal survival and glucose homeostasis. Mol. Cell. Biol. 29, 2804–2815 (2009).
50. Fan, Y. et al. Liver-specific deletion of the growth hormone receptor reveals essential role of growth hormone signaling in hepatic lipid metabolism. J. Biol. Chem. 284, 19937–19944 (2009).
51. Mavalli, M.D. et al. Distinct growth hormone receptor signaling modes regulate skeletal muscle development and insulin sensitivity in mice. J. Clin. Invest. 120, 4007–4020 (2010).
52. da Rocha, S.T., Edwards, C.A., Ito, M., Ogata, T. & Ferguson-Smith, A.C. Genomic imprinting at the mammalian Dlk1-Dio3 domain. Trends Genet. 24, 306–316 (2008).
53. Luk, J.M. et al. DLK1–DIO3 genomic imprinted microRNA cluster at 14q32.2 defines a stemlike subtype of hepatocellular carcinoma associated with poor survival. J. Biol. Chem. 286, 30706–30713 (2011).
54. Sung, W.K. et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat. Genet. 44, 765–769 (2012).
55. Edgar, B.A. From cell structure to transcription: Hippo forges a new path. Cell 124, 267–273 (2006).
56. Budhu, A. et al. Integrated metabolite and gene expression profiles identify lipid biomarkers associated with progression of hepatocellular carcinoma and patient outcomes. Gastroenterology 144, 1066–1075 (2013).
57. Lee, P. et al. In vivo hyperpolarized carbon-13 magnetic resonance spectroscopy reveals increased pyruvate carboxylase flux in an insulin resistant mouse model. Hepatology 57, 515–524 (2013).
58. DeBerardinis, R.J. & Cheng, T. Q’s next: the diverse functions of glutamine in metabolism, cell biology and cancer. Oncogene 29, 313–324 (2010).
59. Beyoğlu, D. et al. Tissue metabolomics of hepatocellular carcinoma: tumor energy metabolism and the role of transcriptomic classification. Hepatology 58, 229–238 (2013).
60. Anderson, K. et al. Genetic variegation of clonal architecture and propagating cells in leukaemia. Nature 469, 356–361 (2011).
61. Hou, Y. et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148, 873–885 (2012).
62. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
63. Berquam-Vrieze, K.E. et al. Cell of origin strongly influences genetic selection in a mouse model of T-ALL. Blood 118, 4646–4656 (2011).
64. Gao, H. et al. Application of 1H NMR-based metabonomics in the study of metabolic profiling of human hepatocellular carcinoma and liver cirrhosis. Cancer Sci. 100, 782–785 (2009).
65. Wang, B. et al. Metabonomic profiles discriminate hepatocellular carcinoma from liver cirrhosis by ultraperformance liquid chromatography-mass spectrometry. J. Proteome Res. 11, 1217–1227 (2012).
66. Yang, Y. et al. Metabonomic studies of human hepatocellular carcinoma using high-resolution magic-angle spinning 1H NMR spectroscopy in conjunction with multivariate data analysis. J. Proteome Res. 6, 2605–2614 (2007).
67. Ward, P.S. & Thompson, C.B. Metabolic reprogramming: a cancer hallmark even Warburg did not anticipate. Cancer Cell 21, 297–308 (2012).
68. Chen, T. et al. Serum and urine metabolite profiling reveals potential biomarkers of human hepatocellular carcinoma. Mol. Cell. Proteomics 10, M110.004945 (2011).
69. Cairns, R.A., Harris, I.S. & Mak, T.W. Regulation of cancer cell metabolism. Nat. Rev. Cancer 11, 85–95 (2011).
70. Christofk, H.R. et al. The M2 splice isoform of pyruvate kinase is important for cancer metabolism and tumour growth. Nature 452, 230–233 (2008).
71. David, C.J., Chen, M., Assanah, M., Canoll, P. & Manley, J.L. HnRNP proteins controlled by c-Myc deregulate pyruvate kinase mRNA splicing in cancer. Nature 463, 364–368 (2010).
72. DeBerardinis, R.J. et al. Beyond aerobic glycolysis: transformed cells can engage in glutamine metabolism that exceeds the requirement for protein and nucleotide synthesis. Proc. Natl. Acad. Sci. USA 104, 19345–19350 (2007).
73. Wise, D.R. et al. Hypoxia promotes isocitrate dehydrogenase–dependent carboxylation of α-ketoglutarate to citrate to support cell growth and viability. Proc. Natl. Acad. Sci. USA 108, 19611–19616 (2011).
74. Vander Heiden, M.G. Targeting cancer metabolism: a therapeutic window opens. Nat. Rev. Drug Discov. 10, 671–684 (2011).

ONLINE METHODS
Mice and histology. We used the following alleles to generate a mouse model of HBV-associated hepatocellular carcinoma: HBsAg transgenic18, Alb-Cre75, T2/Onc2 (6113)13 or T2Onc3 (12740) and Rosa26-lsl-SB11 (ref. 14). The resulting cohorts were on a mixed B6.129 genetic background. Alb-Cre/+; T2Onc2/+; Rosa26-lsl-SB11/+; HBsAg/+ mice were used for the SB screen. Alb-Cre/+; T2Onc3/+; Rosa26-lsl-SB11/+; HBsAg/+ mice were used for metabolic imaging. All animals were monitored on a biweekly basis in accordance with Institutional Animal Care and Use Committee of A*STAR guidelines. Mice of both genders were used for experiments unless stated otherwise. Full necropsies were performed. For DNA extraction, liver tumors were measured and snap frozen. For histology, livers were 10% formalin fixed and paraffin embedded. 5-µm sections were processed for haematoxylin and eosin staining or orcein staining. All slides were reviewed by our veterinary pathologist (J.M.W.). Histological classification of hepatic lesions and tumors was as previously reported18,76.

Identification of transposon insertion sites. Identification of transposon insertion sites was performed using splinkerette PCR to produce barcoded PCR products that were pooled and sequenced on the 454 GS-Titanium (Roche) platform. Reads from sequenced tumors were mapped to the mouse genome assembly NCBI m37 and merged together to identify nonredundant SB insertion sites. Cloning and mapping of the transposon insertion sites were performed as previously described19,22. 328,687 nonredundant insertion sites (Supplementary Data Set) were used to identify CISs using a gene-centric method20 and a GKC statistical framework method21.

GKC method for CIS identification. Any insertion sites on the transposon donor mouse chromosome 1 for tumors derived from T2/Onc2 were excluded from CIS analysis. The likelihood of ‘local hopping’ of the transposon is increased where the transposon array is located77. This phenomenon can substantially increase the background level of transposon insertion events, thereby complicating CIS analysis. 328,687 nonredundant insertion sites were used to identify CISs using a GKC statistical framework21. The previous GKC analysis approach22 was enhanced by using multiple kernel scales (widths of 15,000, 30,000, 50,000, 75,000, 120,000 and 240,000 nucleotides). CISs predicted across multiple scales and overlapping in their genomic locations were clustered together, such that the CIS with the smallest genomic ‘footprint’ was reported as the representative CIS. For highly significant CISs with narrow spatial distributions of insertion sites, the 15,000 kernel is typically the scale on which CISs are identified. The P value for each CIS was adjusted by chromosome, and a cutoff of P < 0.05 was used.

Pruning genetically similar tumors from the same animals. We initially sequenced 250 tumors from 34 mice and called CISs using the GKC method on all tumors (with a nonstringent P value significance threshold to capture many loci). Using the boundaries of the CIS loci, we obtained genomic positions that defined 2,110 bins on the genome. We scored every tumor for every bin as having an insertion in the bin or not. We computed the Spearman correlation between all tumors and performed hierarchical clustering (complete linkage). When tumors from the same animal clustered together as pairs in the dendrogram, we retained only the tumor that had the largest total number of insertions. The aim of the pruning is to ensure that there is at least a single tumor from another animal that is the most similar to a tumor from a given animal. In theory, pairs of clustered tumors from the same animal can remain after the pruning step if, for example, three tumors originally clustered together and only one was removed. To resolve this problem, the pruning step can be repeated iteratively. After the pruning step, we tested whether the difference between the median Spearman correlation among the 228 remaining same-animal tumors and the median Spearman correlation among the non–same animal tumors was statistically significant. If this was not the case, we terminated the pruning process. On our data set, a single pruning round was sufficient. We used the remaining 228 tumor samples further to call CISs with both GKC and gCIS methods.

Analysis of screen saturation. To determine whether the SB screen had reached saturation and, if it had, what number of samples was sufficient to reach saturation, the GKC CIS calls from all 228 samples were analyzed using the ACT software package78. ACT considers genomic locations generated by multiple samples for specific biological phenomenon under study (for example, chromatin immunoprecipitation sequencing (ChIP-seq) peaks) to determine the saturation of a screen. The program considers the various combinations in which samples can be added so that the increase in base-pair coverage is a range of values that is based on all the samples. The results can be depicted as a series of box plots showing the increase in coverage, where the box plot at each position n on the x axis shows the coverage values of all combinations of n samples. Box plots that approach a horizontal asymptote indicate that the coverage has reached saturation.

For the GKC CISs generated by all 228 samples, the insertion sites that contributed to CISs were extracted, resulting in a set of approximately 131,000 sites. The insertion sites were then selected per sample, and pseudokernels of 7,500 nucleotides either side of each insertion were applied to mimic GKC kernels of 15,000 nucleotides. Overlapping kernels within each sample were merged into continuous genomic regions. These 228 modified insertion files were then analyzed using ACT. For each combination of samples, the median values and 25th and 75th percentiles were plotted using the ggplot2 visualization package for the R statistical analysis platform. As a control, the 228 samples were reanalyzed where the same number of insertion sites per sample was selected at random across the mouse genome (excluding chromosome 1, the donor chromosome). The pseudo–15,000-nucleotide kernels were applied. Figure 2a shows the saturation plot for all 228 samples, clearly indicating a ‘knee’ in the profile as the graph asymptotes with an increasing number of samples. Visual inspection of the graph indicates that, below 50 samples, the screen does not appear to have reached saturation. From 100 samples upwards, there appears to be sufficient coverage to report saturation. Conversely, the plot for the randomized samples is virtually a straight line (Supplementary Fig. 4a), indicating no saturation. Although the analysis does not produce a clear-cut asymptote, this is to be expected because of the type of data under consideration. ACT was designed to analyze such data as ChIP-seq arrays for predicting transcription factor binding sites. In these scenarios, ChIP-seq replicates should ideally report the same key binding sites and genomic locations. Hence, across multiple samples, the same locations should be reported. For SB screens, however, although insertions in the same gene will be found from different samples, the locations of the insertion sites will not overlap perfectly, even with the addition of the 15,000 nucleotide pseudokernels. Hence, each sample will introduce new regions such that the overall coverage will continue to increase even if the screen has truly reached a ‘saturation’ point. Also, not all samples will contribute to all CISs. Different combinations of samples will therefore result in varying coverages, causing the coverage profile not to asymptote perfectly.

Cross-species oncogenomics analysis. Gene expression assays, conducted on 223 HBV-positive human patients diagnosed with HCC, were used to compare tumors and paired nontumor tissues. The microarray data are publicly available4 and downloadable at the Gene Expression Omnibus (GEO) database with accession number GSE14520. Human hepatocellular carcinoma expression profiles were downloaded from the GEO database (GSE14520). Only the 223 paired samples hybridized to the HG-U133A platform were selected. Then, CEL files were robust multiarray average (RMA) normalized, and differential gene expression between normal and tumor cells was computed using the Affy and limma packages implemented in Bioconductor. As a result, adjusted P values (FDR-based P value adjustment according to Benjamini and Hochberg) and fold changes were obtained for each probe set. Our mouse candidate cancer genes were converted to human genes using three different databases of homologous genes (MGI, HomoloGene and Ensembl)79–81. The human genes obtained were then mapped to the list of Affymetrix platform probe sets. For genes detected with more than one probe set, the probe set with best adjusted P value was considered as the gene representative. The threshold of significance used was the absolute value of fold change more than 1.5 and adjusted P < 0.00001.

Smaller-scale liver mutagenesis screen in the Sav1-deficient background. This study was performed in a Sav1-deficient background with Alb-Cre/+; T2Onc2/+; Rosa26-lsl-SB11/+; Sav1flox/flox animals. The Sav1 conditional

allele was previously described30. A total of 136 large tumors from 31 animals were used for analysis. Most animals were euthanized at 8–9 months of age, when they had multiple large tumors. The protocol for genomic DNA extraction, barcoded amplification and sequencing on an Illumina Hi-Seq2000 platform was previously described20. A total of 6,251,915 reads were mapped to the mouse genome (mm9) and filtered out when they were represented by very few sequence reads20. The gene-centric CIS identification method (gCIS) identified 73 common insertion sites (Supplementary Table 1e).

Patient samples. Resected tumor samples were obtained from patients undergoing curative resection for HCC (n = 172). Operations took place between 1991 and 2009 at the National Cancer Centre, Singapore. All samples were collected in accordance with the requirements of the local Singapore Ethical Committees, and informed consent was obtained from all subjects. The patient demographics and clinical characteristics have been described in a previous study82. Total RNA was isolated using TRIzol (Invitrogen) following the manufacturer’s instructions, and the RNA concentration was quantified by NanoDrop (Thermo Scientific). RNA samples were reverse transcribed to cDNA using the SuperScript III cDNA Synthesis Kit (Invitrogen) in 10-µl reactions containing 1 µg total RNA and oligo(dT) primers according to the manufacturer’s instructions.

Gene expression analysis from patient samples. Specific primers targeting genes of interest were designed using Primer Express software version 3.0. The primer sequences used for specific target amplification (STA preamplification reaction), as well as those used for qRT-PCR, are listed in Supplementary Table 4. For the STA reaction, each cDNA sample was preamplified with 200 nM pooled STA primer mix and TaqMan PreAmp Master Mix (Applied Biosystems) in a 5-µl reaction, which was run for 14 cycles according to the manufacturer’s protocol. To remove unincorporated primers, each sample was treated with Exonuclease I (ExoI) (Fermentas) by incubation at 37 °C for 30 min. For inactivation, the mix was incubated at 80 °C for 15 min in a second step. At the end of the ExoI treatment, the reactions were diluted 1:5 in Tris–ethylenediaminetetraacetic acid (TE) buffer (pH 8.0) before use for qRT-PCR. The Fluidigm BioMark real-time PCR system and 48.48 Microfluidic Dynamic Arrays were employed for high-throughput qRT-PCR analysis83. As the volume per inlet is 5 µl, 6 µl per inlet was prepared to allow for overage. For the samples, 2.7 µl of each STA and ExoI-treated sample was mixed with 20× DNA Binding Dye Sample Loading Reagent (Fluidigm) and 2× SsoFast EvaGreen SuperMix with Low ROX (Bio-Rad). For the gene expression assays, 0.3 µl of mixed primer pairs (100 µM) was added with 2× Assay Loading Reagent (Fluidigm) after the addition of 1× TE buffer to a 6-µl volume. Before loading the samples and assays into the inlets, the chip was primed in the NanoFlex 4-IFC Controller. The samples and assays were then loaded into the inlets of the dynamic array. After loading and mixing of the samples and assays into the chip by the IFC Controller, PCR was run with the following reaction conditions: 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 60 s. Global threshold and linear baseline correction were automatically calculated for the entire chip. The melting curve analysis and threshold cycle (Ct) were provided by Fluidigm Real-Time PCR Analysis software version 3. Fold changes in expression of the genes of interest between liver tumors and adjacent nontumor samples were determined using the comparative Ct method84 following the formula: $2^{-\Delta C_t(\mathrm{tumor})}/2^{-\Delta C_t(\mathrm{nontumor})}$. GUSB and Gusb were used as the internal control genes in the human and mouse samples, respectively. The −∆Ct data obtained from this calculation were then used to generate the heatmap, as well as supervised hierarchical clustering of tumor and adjacent nontumor samples by dChip software. In addition, statistical analysis was performed by a standard t test to identify significant differentially expressed genes (P ≤ 0.05) between the two studied groups for each species.

Pathway and gene ontology analyses. Gene annotation enrichment analyses were completed with DAVID Bioinformatics v6.7 (ref. 85). The list of all human genes was used as the default background. For gene ontology analyses, GOTERM and PANTHER were used. For pathway analyses, KEGG was selected. We obtained enrichment of genes that display a genetic association with cancer through DAVID using GENETIC_ASSOCIATION_DB_DISEASE and OMIM.

Metabolomics experiments. Chemicals for metabolomics experiments. HPLC-grade chloroform and methanol were purchased from Tedia Company Inc. HPLC-grade toluene was purchased from JT Baker. MSTFA (N-methyl-N-(trimethylsilyl)trifluoroacetamide) with 1% TMCS (trimethylchlorosilane) and methoxyamine hydrochloride in pyridine (MOX) were purchased from Pierce. Urease of Sigma type III, alkane standard mixture (C10 to C40) and sodium sulfate (anhydrous) were obtained from Sigma Aldrich. MilliQ water was obtained from Millipore. All other chemicals were of analytical grade.

Mouse liver tissue sample preparation. Each control and tumor liver tissue was weighed accurately and transferred to labeled 15-ml glass centrifuge tubes. Monophasic extraction solvent consisting of chloroform, methanol and water (2:5:2 (vol/vol/vol)) was added to each sample such that the liver tissue concentration of each sample was 20 mg ml⁻¹. The tissue-solvent mixtures were homogenized by ultrasonicating at ambient temperature (24–28 °C) for 90 min followed by mill mixing using stainless steel beads for another 15–20 min. The homogenates were centrifuged at 3,000 r.p.m. at ambient temperature for 3 min, and 800 µl of each supernatant was transferred to a clean 15-ml glass tube. Quality-control samples (n = 3) were prepared by pooling equal amounts of supernatant from four control and four tumor mouse liver tissues. These samples were dried under a gentle flow of nitrogen at 50 °C for 60 min in the Turbovap LV (Caliper Life Sciences). Then, 100 µl of toluene (kept anhydrous with sodium sulfate) was added to each of the tissue extracts, vortexed for 1 min and dried at 50 °C for 45 min using Turbovap LV in order to eliminate any trace of water that might interfere with the GC/MS analysis. The dried metabolic extract was derivatized first with 40 µl of MOX (20 mg ml⁻¹) at ambient temperature for 16 h. Subsequently, 60 µl of MSTFA with 1% TMCS was added to the mixture and incubated for 30 min at 70 °C to form the trimethylsilyl derivatives. The mixtures were cooled, and 90 µl of each derivatized sample was transferred into a vial and subjected to GC/TOFMS analysis.

GC/TOFMS analysis. The GC/TOFMS of the derivatized liver samples was performed using the Pegasus 4D GC × GC/TOFMS system (LECO Corporation). The chromatography separation was conducted in one-dimensional GC mode in which a 30-m DB-1 GC column with an internal diameter of 250 µm and film thickness of 0.25 µm (Agilent Technologies) was used. Helium was used as the carrier gas at a flow rate of 1.5 ml per min. An injection volume of 1 µl was used, and the injector split ratio was set to 1:5. The injector, transfer line and ion source temperatures were maintained at 220, 200 and 250 °C, respectively, throughout each analysis. The oven temperature was programmed at 70 °C for 0.2 min, increased at 10 °C per min to 270 °C, where it was held for 10 min, and then further increased at 40 °C per min to 310 °C, where it was held for 9 min. The MS was operated in electron ionization mode (70 eV), and the detector voltage was set at 1,800 V. Data acquisition was performed in the full scan mode with a mass-to-charge ratio (m/z) of 40–600 and an acquisition rate of 20 spectra s⁻¹.

Metabolic data preprocessing. Each chromatogram obtained from GC/TOFMS analysis was processed for baseline correction, noise reduction, smoothing, peak deconvolution, analyte alignment, preliminary analyte identification and area calculation using a data processing method created using ChromaTOF software version 4.21 (LECO Corporation). Only peaks with a signal-to-noise ratio greater than 210 were used for further analysis. The area of each peak was calculated using unique mass. Preliminary metabolite identifications were performed for peaks with a similarity index of more than 60%, and these peaks were assigned putative metabolite identities based on the National Institute of Standards and Technology (NIST) library, the LECO/Fiehn Metabolomics library (LECO Corporation) and internally compiled spectral libraries. The resulting data table subsequent to data preprocessing was then exported to an Excel spreadsheet. For certain samples, missing values in their respective data tables were replaced by either integrating the baseline at retention times for which peaks were confirmed to be missing or manually integrating peaks that were confirmed to have their retention times shifted. The total area normalization for each sample was performed by dividing the integrated area of each analyte by the sum of all peak areas of analytes present in the sample. The normalization aided in correcting variations due to the amount of liver collected, sample preparation and analysis.

Chemometric data analysis. Normalized data were exported to SIMCA-P+ (version 12.0, Umetrics) to perform PCA to identify clustering trends, as well as to detect and exclude outliers. Before PCA analysis, the GC/TOFMS data
were mean centered and unit-variance scaled. A DModX plot was calculated to check for any outliers. Quality-control samples were also analyzed in the PCA analysis to ensure that the data acquisition for GC/TOFMS metabolic profiling was reproducible for all samples. After an initial overview of the GC/TOFMS data using PCA analysis, the data were subjected to supervised data analysis using PLS-DA, in which a model was built and used to identify marker metabolites that accounted for the differentiation in the two groups consisting of liver tumor tissues and normal matched liver tissues. To validate and investigate overfitting of the data in the PLS-DA model, permutation tests with 100 iterations were carried out using SIMCA-P+. This permutation test compared the goodness of fit of the original model with the goodness of fit of several models that were based on data for which the order of the Y observations was randomly permuted while the X matrix was kept intact.

Marker metabolite screening. The criteria for the selection of marker metabolites were variable importance values (VIP) >1.0 and t test (Welch’s correction) P < 0.05. The fold change of each marker metabolite detected within the control and tumor groups was calculated. Fold change values of less than 1 indicated a higher level of marker metabolite in the normal tissue. Hierarchical clustering was performed using algorithms within dChip software according to established methods86. Pearson correlation subtracted from unity was used as the distance metric using the centroid linkage method, which provides bounded distances in the range (0, 2). The significance threshold for function enrichment was P < 0.01.

Ex vivo biochemical enzyme activity assays. For each enzyme activity assay, 100 mg of liver tissue was homogenized in 200 µl of ice-cold 100 mM Tris-HCl buffer and then centrifuged for 10 min at 13,000g to remove insoluble material. All assays were based on a continuous spectrophotometric rate determination. Hepatic ALT, AST (aspartate aminotransferase, also known as SGOT) and GLS enzyme activities were measured with commercially available enzymatic assay kits (Biovision, catalog #752-100, #753-100; Sigma-Aldrich, EC 3.5.1.2).

Magnetic resonance spectroscopy (MRS) and metabolic imaging. This technology is partially described in a recent publication57.

Pyruvate polarization and dissolution (metabolic imaging). 40 mg of [1-13C]pyruvic acid (99% carbon-13–labeled, part # 677175, Sigma Aldrich) doped with 15 mM trityl-radical (OX063, GE Healthcare) and 3 µl of gadoterate meglumine (10 mM, Dotarem, Guerbet) was hyperpolarized in a polarizer with 45 min of microwave irradiation. The sample was subsequently dissolved in a pressurized and heated alkaline solution containing 100 mg l⁻¹ ethylenediaminetetraacetic acid to yield a solution of 80 mM hyperpolarized sodium [1-13C]pyruvate with a polarization of 45% and physiological temperature and pH87.

Metabolic flux in the liver measured by hyperpolarized 13C MRS. Mice were positioned in a 9.4 T horizontal-bore MR scanner interfaced to a DD2 console (Varian Medical Systems) and inserted into a dual-tuned (1H/13C) mouse volume coil (diameter, 39 mm). Correct positioning was confirmed by the acquisition of an axial proton gradient-echo image: echo time/repetition time (TE/TR), 8.0/100.0 ms; matrix size, 128 × 128; field of view (FOV), 30 × 30 mm; slice thickness, 1.0 mm; excitation flip angle, 30°. A respiratory-gated shim was used to reduce the proton line width to approximately 160 Hz. Immediately before injection, a respiratory-gated 13C MR pulse-acquire spectroscopy sequence was initiated. 250–350 µl (0.5 mmol per kg body weight) of hyperpolarized pyruvate was intravenously injected over 3 s into the anesthetized mouse. To localize signal over the liver, a 4- to 6-mm slice excitation was applied. Sixty individual liver spectra were acquired over 1 min after injection (TR, 1 s; excitation flip angle, 25°; sweep width, 4,000 Hz; acquired points, 2,048; frequency centered on the pyruvate resonance).

MRS data analysis. Liver 13C MR spectra were analyzed using the AMARES algorithm as implemented in the jMRUI software package. Spectra were baseline and direct-current offset corrected based on the last half of acquired points. Peaks corresponding to [1-13C]pyruvate and its metabolic derivatives [1-13C]lactate, [1-13C]malate, [1-13C]pyruvate hydrate, [1-13C]alanine, [4-13C]oxaloacetate, [1-13C]aspartate and [1-13C]bicarbonate were fitted with prior knowledge assuming a Lorentzian line shape, peak frequencies, relative phases and line widths. Quantified peak areas were plotted against time in Excel (Microsoft). The rate of exchange of the 13C label from the hyperpolarized pyruvate to each of its downstream metabolites was calculated with a kinetic model designed specifically to assess hyperpolarized pyruvate metabolism using the fitted peak areas as input data. The model accounts for many of the variables in the hyperpolarized experiment, including the rate of injection, the initial polarization level of the hyperpolarized pyruvate and the rate of decay of each of the hyperpolarized compounds. First, the change in [1-13C]pyruvate signal over the 60-s acquisition time was fitted to the integrated [1-13C]pyruvate peak area data using equation (1):

$$
M_{\mathrm{pyr}}(t) =
\begin{cases}
\dfrac{\mathrm{rate}_{\mathrm{inj}}}{k_{\mathrm{pyr}}}\left(1 - e^{-k_{\mathrm{pyr}}(t - t_{\mathrm{arrival}})}\right), & t_{\mathrm{arrival}} \le t < t_{\mathrm{end}} \\[2ex]
M_{\mathrm{pyr}}(t_{\mathrm{end}})\, e^{-k_{\mathrm{pyr}}(t - t_{\mathrm{end}})}, & t \ge t_{\mathrm{end}}
\end{cases}
\tag{1}
$$

In this equation, $M_{\mathrm{pyr}}(t)$ represents the [1-13C]pyruvate peak area as a function of time. This equation fits the parameters $k_{\mathrm{pyr}}$, the rate constant for pyruvate signal decay (s⁻¹), $\mathrm{rate}_{\mathrm{inj}}$, the pyruvate arrival rate (arbitrary units (AU) s⁻¹), $t_{\mathrm{arrival}}$, the pyruvate arrival time (s), and $t_{\mathrm{end}}$, the time correlating with the end of the injection (s). These parameters were then used in equation (2) along with the dynamic [1-13C]malate and [1-13C]aspartate data to calculate $k_{\mathrm{pyr}\to x}$, the rate constant for the exchange of pyruvate to each of its metabolites (s⁻¹), and $k_x$, the rate constant for signal decay of each metabolite (s⁻¹), which was assumed to consist of metabolite T1 decay and signal loss from the low-flip-angle radiofrequency pulses. In equation (2), $t' = t - t_{\mathrm{delay}}$, where $t_{\mathrm{delay}}$ represents the delay between pyruvate arrival and metabolite appearance caused by the traversal of the pyruvate through the cardiopulmonary circulation before arrival at the hepatic arteries:

$$
M_x(t') =
\begin{cases}
\dfrac{k_{\mathrm{pyr}\to x}\,\mathrm{rate}_{\mathrm{inj}}}{k_{\mathrm{pyr}} - k_x}\left[\dfrac{1 - e^{-k_x(t' - t_{\mathrm{arrival}})}}{k_x} - \dfrac{1 - e^{-k_{\mathrm{pyr}}(t' - t_{\mathrm{arrival}})}}{k_{\mathrm{pyr}}}\right], & t_{\mathrm{arrival}} \le t' < t_{\mathrm{end}} \\[2ex]
\dfrac{M_{\mathrm{pyr}}(t_{\mathrm{end}})\,k_{\mathrm{pyr}\to x}}{k_{\mathrm{pyr}} - k_x}\left(e^{-k_x(t' - t_{\mathrm{end}})} - e^{-k_{\mathrm{pyr}}(t' - t_{\mathrm{end}})}\right) + M_x(t_{\mathrm{end}})\,e^{-k_x(t' - t_{\mathrm{end}})}, & t' \ge t_{\mathrm{end}}
\end{cases}
\tag{2}
$$

Metabolic imaging. A two-dimensional gradient-echo chemical shift imaging pulse sequence was implemented in a 9.4 T MRI scanner. Imaging began about 16 s after the infusion of hyperpolarized pyruvate to ensure that phase encoding started only once all metabolites had begun to form. Axial images were then acquired over a time course of 2 min with TR = 120 ms, TE = 0.6 ms, flip angle = 5°, spectral width = 80 p.p.m., spectral points = 256, FOV = 25.6 × 25.6 mm², matrix size = 16 × 16 and slice thickness = 5 mm. Post-processing with Matlab picked up the peak of each metabolite in the spectra, filled up its k space and zero filled to 32 × 32, followed by two-dimensional Fourier transform. A reference proton image was acquired with matrix size = 128 × 128, TR = 100 ms, TE = 1.28 ms and slice thickness = 1 mm.

75. Postic, C. & Magnuson, M.A. DNA excision in liver by an albumin-Cre transgene occurs progressively with age. Genesis 26, 149–150 (2000).
76. Thoolen, B. et al. Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicol. Pathol. 38, 5S–81S (2010).
77. Morrow, M., Samanta, A., Kioussis, D., Brady, H.J. & Williams, O. TEL-AML1 preleukemic activity requires the DNA binding domain of AML1 and the dimerization and corepressor binding domains of TEL. Oncogene 26, 4404–4414 (2007).
78. Jee, J. et al. ACT: aggregation and correlation toolbox for analyses of genome tracks. Bioinformatics 27, 1152–1154 (2011).
79. Eppig, J.T. et al. Mouse genome informatics (MGI) resources for pathology and toxicology. Toxicol. Pathol. 35, 456–457 (2007).
80. Kasprzyk, A. BioMart: driving a paradigm change in biological data management. Database (Oxford) 2011, bar049 (2011).
81. Sayers, E.W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 40, D13–D25 (2012).
82. Chew, V. et al. Chemokine-driven lymphocyte infiltration: an early intratumoural event determining long-term survival in resectable hepatocellular carcinoma. Gut 61, 427–438 (2012).
83. Spurgeon, S.L., Jones, R.C. & Ramakrishnan, R. High throughput gene expression measurement with real time PCR in a microfluidic dynamic array. PLoS One 3, e1662 (2008).
84. Schmittgen, T.D. & Livak, K.J. Analyzing real-time PCR data by the comparative C(T) method. Nat. Protoc. 3, 1101–1108 (2008).
85. Huang, W., Sherman, B.T. & Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
86. Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868 (1998).
87. Schroeder, M.A. et al. Measuring intracellular pH in the heart using hyperpolarized carbon dioxide and bicarbonate: a 13C and 31P magnetic resonance spectroscopy study. Cardiovasc. Res. 86, 82–91 (2010).
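
Illustrative sketch of the pyruvate kinetic model. The closed-form expressions given as equations (1) and (2) under "MRS data analysis" can be evaluated directly, which may help when checking a fit by hand. The following is a minimal Python sketch under stated assumptions, not the authors' fitting code: the function and argument names (m_pyr, m_x, k_pyr_x, t_delay and so on) are hypothetical, the signal is assumed to be zero before pyruvate arrival (an interval the equations do not define), and the expression for equation (2) requires k_pyr and k_x to differ.

```python
import math


def m_pyr(t, rate_inj, k_pyr, t_arrival, t_end):
    """Equation (1): pyruvate peak area as a function of time t (s)."""
    if t < t_arrival:
        return 0.0  # assumption: no signal before pyruvate arrives
    if t < t_end:
        # Build-up while pyruvate is still arriving.
        return rate_inj / k_pyr * (1.0 - math.exp(-k_pyr * (t - t_arrival)))
    # After the injection ends: exponential decay from the value at t_end.
    m_end = rate_inj / k_pyr * (1.0 - math.exp(-k_pyr * (t_end - t_arrival)))
    return m_end * math.exp(-k_pyr * (t - t_end))


def m_x(t, rate_inj, k_pyr, t_arrival, t_end, k_pyr_x, k_x, t_delay=0.0):
    """Equation (2): metabolite peak area, evaluated at t' = t - t_delay."""
    if k_pyr == k_x:
        raise ValueError("the closed form requires k_pyr != k_x")
    tp = t - t_delay
    if tp < t_arrival:
        return 0.0  # assumption: no metabolite signal before arrival
    scale = k_pyr_x / (k_pyr - k_x)

    def during_injection(time):
        # First case of equation (2).
        tau = time - t_arrival
        return scale * rate_inj * (
            (1.0 - math.exp(-k_x * tau)) / k_x
            - (1.0 - math.exp(-k_pyr * tau)) / k_pyr
        )

    if tp < t_end:
        return during_injection(tp)
    # Second case of equation (2): conversion of decaying pyruvate plus
    # decay of the metabolite signal that was present at t_end.
    s = tp - t_end
    m_pyr_end = m_pyr(t_end, rate_inj, k_pyr, t_arrival, t_end)
    return (
        scale * m_pyr_end * (math.exp(-k_x * s) - math.exp(-k_pyr * s))
        + during_injection(t_end) * math.exp(-k_x * s)
    )


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(rate_inj=10.0, k_pyr=0.05, t_arrival=2.0, t_end=5.0)
curve = [m_x(float(t), k_pyr_x=0.02, k_x=0.03, **params) for t in range(60)]
```

With these expressions both curves are continuous at t_end, and the metabolite curve satisfies dMx/dt = k_pyr_x * Mpyr - k_x * Mx, the two-pool exchange relation that equations (1) and (2) appear to solve. In practice the parameters would be estimated by least-squares fitting to the measured peak areas, which this sketch does not attempt.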

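
Illustrative sketch of the comparative Ct calculation. The fold-change formula given under "Gene expression analysis from patient samples", 2^(−∆Ct(tumor))/2^(−∆Ct(nontumor)), can be written out in a few lines. This is a minimal Python sketch, not the authors' analysis pipeline: the function names and the Ct values are hypothetical, and ∆Ct is assumed to be the target-gene Ct minus the internal-control Ct (GUSB or Gusb in the text) for the same sample, as in the standard comparative Ct method.

```python
def delta_ct(ct_target, ct_reference):
    """Delta-Ct: target-gene Ct minus internal-control Ct for one sample."""
    return ct_target - ct_reference


def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_nontumor, ct_ref_nontumor):
    """Tumor versus nontumor fold change: 2^-dCt(tumor) / 2^-dCt(nontumor)."""
    d_tumor = delta_ct(ct_target_tumor, ct_ref_tumor)
    d_nontumor = delta_ct(ct_target_nontumor, ct_ref_nontumor)
    return 2.0 ** (-d_tumor) / 2.0 ** (-d_nontumor)


# Hypothetical Ct values: the target gene crosses threshold 4 cycles after
# the control in tumor and 6 cycles after it in adjacent nontumor tissue.
fc = fold_change(22.0, 18.0, 25.0, 19.0)
print(fc)  # 4.0
```

A value above 1 indicates higher expression in the tumor than in the paired nontumor sample; the −∆Ct values themselves are what the text describes as the input to the heatmap and clustering.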
Nature Genetics doi:10.1038/ng.2847