Published OnlineFirst February 8, 2016; DOI: 10.1158/0008-5472.CAN-15-0338

Cancer Research

Integrated Systems and Technologies

Genomic Landscape of Somatic Alterations in Esophageal Squamous Cell Carcinoma and Gastric Cancer

Nan Hu1, Mitsutaka Kadota2, Huaitian Liu2, Christian C. Abnet1, Hua Su1, Hailong Wu2, Neal D. Freedman1, Howard H. Yang2, Chaoyu Wang1, Chunhua Yan2, Lemin Wang1, Sheryl Gere2, Amy Hutchinson1,3, Guohong Song2, Yuan Wang4, Ti Ding4, You-Lin Qiao5, Jill Koshiol1, Sanford M. Dawsey1, Carol Giffen6, Alisa M. Goldstein1, Philip R. Taylor1, and Maxwell P. Lee2

Abstract

Gastric cancer and esophageal cancer are the second and sixth leading causes of cancer-related death worldwide. Multiple genomic alterations underlying gastric cancer and esophageal squamous cell carcinoma (ESCC) have been identified, but the full spectrum of genomic structural variations and mutations has yet to be uncovered. Here, we report the results of whole-genome sequencing of 30 samples comprising tumor and blood from 15 patients, four of whom presented with ESCC, seven with gastric cardia adenocarcinoma (GCA), and four with gastric noncardia adenocarcinoma. Analyses revealed that an A>C mutation was common in GCA, and in addition to the preferential nucleotide sequence of A located 5 prime to the mutation as noted in previous studies, we found enrichment of T in the 5 prime base. The A>C mutations in GCA suggested that oxidation of guanine may be a potential mechanism underlying cancer mutagenesis. Furthermore, we identified genes with mutations in gastric cancer and ESCC, including well-known cancer genes, TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2, and potentially novel cancer-associated genes, KISS1R, AMH, MNX1, WNK2, and PRKRIR. Finally, we identified recurrent chromosome alterations in at least 30% of tumors in genes, including MACROD2, FHIT, and PARK2, that were often intragenic deletions. These structural alterations were validated using The Cancer Genome Atlas dataset. Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation profile underlying gastric cancer and ESCC development. Cancer Res; 76(7); 1714–23. ©2016 AACR.

1Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, Maryland. 2Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland. 3Cancer Genomics Research Laboratory, Leidos, Gaithersburg, Maryland. 4Shanxi Cancer Hospital, Taiyuan, Shanxi, PR China. 5Cancer Institute, Chinese Academy of Medical Sciences, Beijing, PR China. 6Information Management Services, Inc., Silver Spring, Maryland.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Corresponding Authors: Philip R. Taylor, Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rm 6E444 MSC 90892, Rockville, MD 90892-9769. Phone: 240-276-7235; Fax: 240-276-7832; E-mail: [email protected]; and Maxwell P. Lee, [email protected]

©2016 American Association for Cancer Research.

Introduction

Gastric cancer and esophageal cancer cause an estimated 783,000 and 407,000 deaths, respectively, each year, and represent the second and sixth leading causes of cancer-related death worldwide (1). In China, gastric cardia adenocarcinoma (GCA) and esophageal squamous cell carcinoma (ESCC) occur together in the Taihang Mountains of north central China, including Shanxi and Henan Provinces, at some of the highest rates reported for any cancer (2), and historically over 20% of all deaths in this region have been attributed to these diseases (3). However, the cause of the high rates and geographical overlap of these two anatomically adjacent but histologically distinct tumors has not been determined. Gastric cancers in this area occur primarily in the uppermost portion of the stomach and are referred to as GCA, whereas those in the remainder of the stomach are referred to as gastric noncardia adenocarcinoma (GNCA). In addition to being anatomically adjacent, GCA and ESCC share many of the same etiologic risk factors, and before the widespread use of endoscopy and biopsy, they were diagnosed as a single disease referred to as "esophageal cancer" (4). The reason for the high rates of GCA and ESCC in this geographic area and their relation to each other remains unclear, but there are almost certainly common etiologically important environmental exposures, and a recent genome-wide association study of germline DNA found that the same SNPs in PLCE1 had the strongest associations with risk for both GCA and ESCC (5). This led to our concurrent examination of these two cancers plus GNCA in the current study.

Recent advances in next-generation sequencing technology have revolutionized how we study cancer genomes. The identification of IDH1/2 mutations, initially in glioma (6, 7) and more recently in many other cancers such as AML (8), has transformed our understanding of cancer by relating mutations to metabolic control and epigenetic regulation (9). IDH1/2 encodes isocitrate dehydrogenases (IDH), which convert

1714 Cancer Res; 76(7) April 1, 2016

Downloaded from cancerres.aacrjournals.org on September 26, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst February 8, 2016; DOI: 10.1158/0008-5472.CAN-15-0338

Somatic Alterations in Esophageal and Gastric Cancer

isocitrate to 2-oxoglutarate. But mutant IDHs produce 2-hydroxyglutarate, which inhibits the methyl cytosine hydroxylase TET2 as well as H3K36 demethylases, thereby changing the global epigenetic landscape. Whole genome sequencing (WGS) is particularly useful for elucidating complex genomic changes, including translocations, inversions, tandem duplications, and large deletions. The importance of these structural changes has been well documented in the case of BCR-ABL in leukemia, TMPRSS2-ERG in prostate cancer (10), and EML4-ALK in lung cancer (11).

For gastric adenocarcinoma and ESCC, several publications have reported genomic scale analyses of cancers using exome or WGS technology. Wang and colleagues performed exome sequencing of 22 gastric cancer samples and identified frequent mutations in ARID1A (12). Mutations in ARID1A were particularly frequent in gastric cancers with microsatellite instability (MSI; 83%) or with Epstein-Barr virus (EBV) infection (73%). Exome sequencing of 15 gastric adenocarcinomas and their matched normal DNAs by Zang and colleagues also identified frequent mutations of ARID1A (13). In addition, they found that 5% of gastric cancers contained FAT4 mutations. A study by Agrawal and colleagues reported exomic sequencing of 11 esophageal adenocarcinomas (EAC) and 12 ESCCs and found frequent NOTCH1 mutations in ESCC (14). Nagarajan and colleagues performed WGS analysis for two gastric cancer samples and found three mutational signatures (15). Dulak and colleagues did exome sequencing for 149 EAC tumor-normal pairs and WGS for 15 EACs and matched normals. They found a high prevalence of A>C transversions at AA dinucleotides (16). A recent study by Wang and colleagues analyzed 100 tumor–normal pairs of gastric cancer with WGS and identified MUC6, CTNNA2, GLI3, RNF43, and RHOA as significantly mutated driver genes. They found that RHOA mutations are specific for diffuse-type tumors (17). Recently, the International Cancer Genome Consortium research team published a study involving WGS of 17 ESCC cases and exome sequencing of 71 ESCC cases, which identified two novel cancer driver genes, ADAM29 and FAM135B (18). Lin and colleagues performed exome sequencing on 20 ESCC cases and targeted sequencing of 139 ESCC cases and identified several mutated genes that were previously unknown, including FAT1, FAT2, ZNF750, and KMT2D (19), and Gao and colleagues did exome sequencing on 113 ESCC cases and noted that histone modifier genes were frequently mutated, including KMT2D, KMT2C, KDM6A, EP300, and CREBBP (20). Zhang and colleagues recently reported exome sequencing of 90 ESCCs and WGS of 14 ESCCs and found an APOBEC-mediated mutational signature in 47% of tumors (21).

Despite this progress, the full spectrum of genomic alterations in gastric cancer and ESCCs, particularly genomic structural variations (SV) and intergenic and intronic mutations, remains to be characterized. To discover and characterize genomic alterations in GCA, GNCA, and ESCC, we analyzed whole-genome sequences of tumors and matched blood DNA samples generated by Complete Genomics Inc. (CGI). We present here our findings of a novel mutation substitution pattern, driver mutations, and recurrent SVs. Complete characterization of the genomic landscape of these cancers will hopefully provide new strategies for early diagnosis and therapy for these deadly diseases.

Materials and Methods

Study population

This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital, the Cancer Institute and Hospital of the Chinese Academy of Medical Sciences (CICAMS, Shanxi, PR China), and the U.S. National Cancer Institute (NCI, Bethesda, MD). Fifteen cases were analyzed with WGS. These cases came from a larger study sample and were selected on the basis of a high-quality, sufficiently large amount of DNA (requiring at least 10 μg) available for WGS, and patients being deceased. Three cases with ESCC, seven cases with GCA, and four cases with GNCA diagnosed between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, were recruited to participate in this study. One ESCC case from Yaocun Commune Hospital in Linxian, Henan Province was also recruited. None of the cases had therapy before their surgical resection. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (e.g., detailed family history of cancer), and clinical information. Only deceased cases were selected for study. Clinical data are described in Supplementary Table S1.

Biological specimen collection and processing

Venous blood (10 mL) was taken from each case before surgery, and germline DNA from whole blood was extracted and purified using the standard phenol/chloroform method. Tumors obtained during surgery were snap-frozen in liquid nitrogen and stored at −130°C until used. The specimens were chosen for this study based on two criteria: (i) histologic diagnosis of ESCC or gastric cancer confirmed by pathologists at the Shanxi Cancer Hospital or CICAMS, and the NCI; (ii) availability of high purity tumor tissue (>75%).

Tissue DNA isolation

DNA from frozen tumors was extracted using AllPrep DNA/RNA/ Mini Kit (Qiagen, Inc.). DNA was dissolved in 100 μL Buffer BE. Concentrations of DNA were measured with the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) according to the manufacturer's instructions. DNAs were run on 0.7% agarose gel (UltraPure Agarose Powder, Invitrogen) in 1× TAE buffer to identify high-molecular weight genomic DNA (>20 Kb single band) and photographed using AlphaImager EC system (Biosciences, Inc.). DNA was quantified using Quant-iT PicoGreen Kit (Invitrogen).

Whole genome sequencing

Of note, 10 μg DNA was sent to CGI for WGS. CGI delivered data for SVs, DNA copy number variations, and single nucleotide variations (SNV). A detailed description of WGS data can be found on the company's website, http://www.completegenomics.com/. A summary of the WGS data is given in Supplementary Table S2. We focused on somatic alterations, that is, tumor-specific changes present in tumor but absent in blood. We used the cgatools methods CallDiff and JunctionsDiff to generate somatic SNVs and somatic SVs, respectively. Somatic copy number alterations (CNA) data came from CGI reports. These three types of somatic changes for each patient are shown in Circos plots in Fig. 1.

Genotyping with the Affymetrix SNP5.0 array

We performed genotyping experiments for 12 samples using the Affymetrix GeneChip Human Mapping 5.0 arrays. Briefly,


Hu et al.

Figure 1. Circos plots of somatic single nucleotide variants, copy number alterations, and structural variations in the 15 cancer genomes. The inner ring displays SVs: black for intrachromosomal SVs and red for interchromosomal SVs. The second ring next to SVs is CNA, shown in gray. The third ring is SNV, shown in green. The outside ring is the chromosome ideogram. The sample description can be found in Supplementary Table S1.

250 ng of DNA was digested with Nsp I or Sty I, ligated to an adaptor, and amplified by PCR. PCR products were processed for fragmentation and labeling; labeled DNA was hybridized onto the chips. The chip was scanned with the Affymetrix GeneChip Scanner 7G Plus using Affymetrix GeneChip Command Console software, and the data files were automatically generated. Genotype calls were generated by Genotyping Console v4.1 software (Affymetrix). SNP calls generated by the SNP array and WGS agreed very well, with a median of 99.65% of the time across all 12 genomes (Supplementary Table S3). We noted that agreement was slightly lower in tumors (median 98.85%); this reduced concordance was due to genomic regions containing SVs and CNAs. Concordances for CC0996, EC1475, and EC8413 were below 99%. GEO accession number for the SNP array is GSE43470.

PCR and Sanger sequencing

To validate somatic SNVs and SVs identified from the WGS data, we used PCR to amplify genomic DNA spanning SNVs or SV junctions. In addition, we also amplified SV junction sequences


from cDNAs. Target regions spanning SNVs and SVs were amplified by PCR using genomic DNA or cDNA as a template. The primers were designed using Primer3 (http://www.broadinstitute.org/genome_software/other/primer3.html), and they are summarized in Supplementary Tables S7 and S8. PCR products were purified using QiaQuick PCR Purification Kit (Qiagen), and sequenced using ABI BigDye Terminator BDT 3.1 (Applied Biosystems). Sequencing reactions were carried out at 96°C for 10 seconds, 50°C for 5 seconds, and 60°C for 2 minutes for 25 cycles, and the reaction product was analyzed in a 3730XL DNA Analyzer (Applied Biosystems). We carried out validation for a subset of somatic missense mutations in subjects CC0996 and EC0379 using Sanger sequencing. High-quality sequencing data were obtained for 62 mutations from CC0996; 51 (82%) validated and 11 did not. For EC0379, we examined 30 mutations, and 17 (63%) were validated. Mutations called by WGS but not validated by Sanger sequencing often had a low fraction of the mutant allele (below 15% of total sequence reads). The discrepancy between the two methods is partly attributable to the stringent criteria of mutation calling we used for the Sanger sequencing data, which usually require a minimum of 15% of the mutant allele. Validated mutations are summarized in Supplementary Table S4.

Comparison of somatic mutation call using blood control versus adjacent normal tissue control

In addition to blood DNA controls, we had WGS data for six adjacent normal tissues. We compared somatic variant calls using blood DNA as control versus adjacent normal tissue as control. Two tumors with very low somatic mutation rates were excluded from this analysis because they had higher false-positive calls. We found that the concordance between blood DNA controls and adjacent normal tissue controls ranged from 86% to 95%. The tumors with higher somatic mutation rates also had higher concordance. This is also consistent with the validation data from Sanger sequencing described in the previous section.

Statistical analysis

Whole-genome sequence analysis. We used cgatools (http://www.completegenomics.com/sequence-data/cgatools/) and custom-built Perl scripts to analyze WGS data. The calldiff and junctiondiff methods (cgatools) were used to identify somatic SNVs and SVs, respectively; the junctions2events method (cgatools) was used to identify genomic events such as deletions, duplications, inversions, and complex events that underlie observed SV junction sequences. The Circos plots were generated as described (http://circos.ca/).

Ingenuity variant analysis. For functional interpretation of somatic mutations, we used Ingenuity Variant Analysis (IVA). IVA uses a cascade of filters to identify potential cancer driver mutations; these filters consist of selecting variants based on common variants databases (1000 Genomes, CGI database, and NHLBI ESP exomes), predicted deleterious effects (literature, SIFT, phyloP, etc.), genetic analysis (gain or loss of function, number of alleles, absence of variant in the control), and cancer driver variants [literature, Catalogue of Somatic Mutations in Cancer, and The Cancer Genome Atlas (TCGA)]. (For details, see http://www.ingenuity.com/products/ingenuity_variant_analysis.html). We used the following cascade of filters to identify potential cancer driver mutations: (i) selection of variants based on common variants databases (1000 Genomes, CGI database, and NHLBI ESP exomes); (ii) selection of variants based on predicted deleterious effects (literature, SIFT, phyloP, etc.). Variants were selected on the basis of phenotypes (pathogenic or unknown significance), gain of function (literature, Ingenuity Knowledge Base, or BSIFT), or loss of function (non-synonymous, damaging by SIFT, splicing sites, or affecting microRNA); (iii) selection of variants based on genetic analysis (gain or loss of function, number of alleles, absence of variant in the control). We excluded variants found in any blood sample. We included homozygous, hemizygous, or compound heterozygous variants; (iv) selection of variants based on cancer driver variants.

Miscellaneous analyses. All statistical analyses and plots were generated using the R environment, and we used shell scripts and Perl scripts for general bioinformatics data processing.

Results

We analyzed 30 genomes from DNAs isolated from two tissues (tumor and blood) from 15 patients, seven with gastric cardia adenocarcinoma (GCA), four with gastric noncardia adenocarcinoma (GNCA), and four with ESCC, by WGS, generated by Complete Genomics Inc. (CGI). Patient clinical data are described in Supplementary Table S1. Total sequences generated for each genome ranged from 193.7 Gb to 363.4 Gb (median = 321.7 Gb). Details of our WGS data for each genome are described in Supplementary Table S2.

We focused on identification and characterization of somatic changes, that is, changes in tumors that were absent in matched germline (blood) DNA. Figure 1 shows the Circos plots of three types of somatic changes of interest (SNVs, SVs, and CNAs) for the 15 cancer genomes. Gastric cancer, in particular GNCA, had more structural alterations than ESCC.

Characterization of somatic single nucleotide substitutions

Figure 2A shows somatic mutation rates per million bases. Mutation rates ranged between 12.7 and 70.9 per million bases (median = 33.4) for intergenic regions, between 10.2 and 41.5 (median = 21.7) for introns, and between 9.4 and 27.8 (median = 17.3) for exons. These mutation rates were similar to those reported for colon cancer (22), pancreatic cancer (23), liver cancer (24), gastric cancer (12, 13), and esophageal cancer (14). Mutation rates were highest in intergenic regions, followed by introns and exons. This is consistent with the role of a transcription-coupled repair process in reducing intragenic mutations. The coding mutation rates ranged from 3.9 to 14.0 per million bases (median = 8.6) for missense mutations, and from 0.13 to 0.64 (median = 0.36) for nonsense mutations. The mutation rates of synonymous, missense, nonsense, and other changes are summarized in Fig. 2B.

We noted that BC5439, CC1649, CC1730, and EC1475 had much higher mutation rates, exceeding 40 mutations per Mb in intergenic regions. To investigate a potential mutagenic mechanism for the higher mutation rates, we calculated mutation rates for each type of single nucleotide substitution (Fig. 2C). Transition rates, A>G and C>T, were high as expected. However, we saw high A>C mutation rates in six of the 11 gastric


[Figure 2, panels A–D. A, mutations per Mb by genomic region (exon, intergenic, intron). B, mutations per Mb by type (missense, nonsense, other, synonymous). C, mutation spectrum: number of mutations for each substitution type (A>C, A>G, A>T, C>A, C>G, C>T). D, heatmap of log10(mutation) by five prime base and three prime base for each of the 15 tumors.]

Figure 2. Summary of somatic mutations and mutation spectrum in GNCA, GCA, and ESCC genomes and non-negative matrix factorization analysis of SNV substitution matrix. Somatic SNVs are summarized in A and B. The numbers of somatic mutations per million bases in intergenic, intronic, and exonic regions of GCA, GNCA, and ESCC genomes are illustrated with bar graphs. Here, somatic mutations include single nucleotide variations and small indels. A, graph shows mutations in intergenic, intronic, and exonic regions. B, graph shows mutations of various types of amino-acid changes. SNV substitution patterns are summarized in C and D. C, graph shows the numbers of somatic mutations for the six types of base substitutions, summarized on the left. We also considered the context of 5 prime base and 3 prime base; there are 96 possible combinations (4 × 6 × 4). D, the heatmap shows the results for the 15 tumors.

cancers. This increased A>C mutation was also observed in several recent studies involving EAC (14, 16) and gastric cancer (17). We also calculated the nucleotide substitution rates with their dependency on the flanking nucleotide sequences, and displayed the result in a heatmap (Fig. 2D). We found that the A>C mutation showed a preference of nucleotide sequence of A located 5 prime to the mutation (Fig. 2D).
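The flanking-base tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline (which used cgatools, Perl, and R): the function names, the input tuple layout, and the example mutations are hypothetical. It assumes each somatic SNV is already annotated with its 5 prime and 3 prime reference bases, and it folds substitutions whose reference base is G or T onto the complementary strand so that every mutation falls into one of the six types used in Fig. 2C (A>C, A>G, A>T, C>A, C>G, C>T), giving 4 × 6 × 4 = 96 classes.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The six substitution types, written with reference base A or C.
SUBSTITUTIONS = ("A>C", "A>G", "A>T", "C>A", "C>G", "C>T")


def all_contexts():
    """Return the 96 (5' base, substitution, 3' base) classes."""
    return [(five, sub, three)
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def normalize(five, ref, alt, three):
    """Map a substitution and its flanks onto the strand where ref is A or C."""
    if ref == alt:
        raise ValueError("reference and alternate bases must differ")
    if ref in ("A", "C"):
        return (five, f"{ref}>{alt}", three)
    # Reverse complement: the flanks swap sides and every base is complemented.
    return (COMPLEMENT[three],
            f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}",
            COMPLEMENT[five])


def count_contexts(mutations):
    """Count mutations per class; classes with no mutations are kept at zero."""
    counts = Counter({context: 0 for context in all_contexts()})
    for five, ref, alt, three in mutations:
        counts[normalize(five, ref, alt, three)] += 1
    return counts


# Hypothetical input: (5' base, reference base, alternate base, 3' base).
example = [("A", "A", "C", "G"), ("C", "T", "G", "T"), ("T", "C", "T", "G")]
counts = count_contexts(example)
print(len(counts))                  # number of classes
print(counts[("A", "A>C", "G")])    # A>C mutations with a 5 prime A and 3 prime G
```

In this toy input, the second mutation (T>G flanked by C and T) is the reverse complement of an A>C change with a 5 prime A, so it is counted in the same class as the first. A per-tumor table of such counts is the kind of matrix a heatmap like Fig. 2D would display.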


Characterization of cancer driver mutations

We characterized somatic mutations to identify potential cancer driver mutations by analyzing our WGS data using IVA. Our initial results from IVA analysis identified 200 genes. We further filtered the genes by requiring that they be mutated in at least three tumors. This resulted in a final list of 24 genes. Tumors with mutations in the 24 genes are illustrated in Fig. 3. Half of these genes are well-known cancer genes, including TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2. We also identified multiple potentially novel cancer genes, including KISS1R, AMH, MNX1, WNK2, and PRKRIR. The full list of the 200 genes is summarized in Supplementary Table S5. This list is enriched for genes that affect diseases of the stomach (Supplementary Table S6).

Characterization of DNA structural variations

Compared with exome sequencing, an advantage of WGS is its ability to identify SVs. We focused our analysis on tumor-specific SVs (somatic events), and they are summarized in Fig. 4. Figure 4 depicts the numbers of SVs per tumor summarized with respect to gene location (Fig. 4A), relation to repeats (Fig. 4B), interchromosomal versus intrachromosomal (Fig. 4C), genetic events (Fig. 4D), and recurrence in multiple tumors (Fig. 4E). Note that here the complex type also contained some interchromosomal SVs. We noted a wide range of SVs across the tumors. Such variation in frequency of SVs among different tumors is consistent with the presence of two types of cancers, with high and low genomic instability, an observation we previously reported in another series of ESCC tumors from this same high-risk region evaluated by LOH and CNV (25).

To validate SVs identified from the WGS data, we selected SVs with junction sequences derived from two different genes, but without containing DNA repeats. We successfully PCR-amplified 12 SV junction fragments and sequenced all of them by the Sanger method. All 12 SV junction sequences were confirmed, that is, they are identical to the WGS data (data not shown).

SVs that recur in multiple cases are of greatest interest. The genes across breakpoints are summarized in terms of frequency of tumors affected (Fig. 4E). Those affecting at least five tumors are also shown (Fig. 4E), and there are 14 of those genes with SVs occurring in at least five tumors. The details of SVs (only deletions that span exonic regions are shown here) are shown for MACROD2, FHIT, and PARK2 (Fig. 5). Four tumors deleted exon 6 of MACROD2 (NM_080676), which removed amino acids 140–180 and also caused a frame-shift (Fig. 5A). The change likely resulted in a loss-of-function mutation of MACROD2. Seven tumors deleted exon 5 of FHIT (NM_002012), which removed the first 35 amino acids and also shifted the reading frame (Fig. 5B). The deletion likely inactivated the protein. The locations and sizes of PARK2 deletions varied (Fig. 5C). Two removed exons 3 and 4 (NM_004562), two removed exon 2, one deleted exon 4, and one

Figure 3. Summary of potential cancer driver mutations. The potential driver mutations were identified using IVA. We identified 24 genes that were frequently mutated in ESCC and gastric cancer. Mutations and affected tumors are shown in the heatmap.
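The recurrence step that narrowed the IVA output to genes mutated in at least three tumors can be illustrated with a short Python sketch. It is a hedged example rather than the authors' code: the function name, the (tumor, gene) input layout, and the placeholder identifiers T1 to T3, GENE_A, and GENE_B are all hypothetical. Each tumor is counted once per gene, however many mutations it carries in that gene.

```python
from collections import defaultdict


def recurrent_genes(calls, min_tumors=3):
    """Return genes mutated in at least `min_tumors` distinct tumors.

    `calls` is an iterable of (tumor_id, gene) pairs, one per candidate
    mutation; repeated hits in the same tumor are counted only once.
    """
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in calls:
        tumors_by_gene[gene].add(tumor_id)
    return {gene: sorted(tumors)
            for gene, tumors in tumors_by_gene.items()
            if len(tumors) >= min_tumors}


# Hypothetical candidate mutations (placeholder tumor and gene names).
calls = [
    ("T1", "GENE_A"), ("T1", "GENE_A"),  # two mutations in the same tumor
    ("T2", "GENE_A"), ("T3", "GENE_A"),
    ("T1", "GENE_B"), ("T2", "GENE_B"),
]
result = recurrent_genes(calls)
print(result)
```

With these placeholder calls only GENE_A passes the filter, because GENE_B is mutated in just two tumors. The resulting gene-by-tumor mapping is the kind of input a heatmap such as Fig. 3 summarizes.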


[Figure 4, panels A–E. A, SV breakpoints contained within genes (gene class: both, none, one), count per tumor. B, SV breakpoints contained within repeats (repeat class: both, none, one), count per tumor. C, interchromosomal vs intrachromosomal SVs, count per tumor. D, event types of SVs (complex, deletion, duplication, inversion), count per tumor. E, recurrent SVs: the number of genes that contain SVs in multiple tumors, plotted against the number of tumors with SV for the affected gene.]

Figure 4. Characterization of somatic SVs in gastric cancer and ESCC genomes. A, bar graph of SVs with respect to whether breakpoints are located within genes or not. The count on the x-axis refers to the number of SVs. B, bar graph of SVs with respect to whether breakpoints are located within repeats or not. C, bar graph of interchromosomal versus intrachromosomal SVs. D, bar graph of SVs classified by the processes that generated these SVs. E, bar graph of recurrent SVs. Sample count refers to the number of tumors affected by the SVs, and gene count refers to the number of genes that showed recurrent SVs for a given sample count.
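The tallies behind panels C and E can be sketched as follows. This is a minimal illustration with assumptions stated up front, not the cgatools junctions2events workflow used in the study: the record layout (tumor, left chromosome, right chromosome, affected gene), the function names, the threshold of three tumors, and all identifiers in the example are hypothetical.

```python
from collections import Counter, defaultdict


def classify_junction(left_chrom, right_chrom):
    """Label an SV junction by whether both breakpoints share a chromosome."""
    if left_chrom == right_chrom:
        return "intrachromosomal"
    return "interchromosomal"


def recurrence_histogram(gene_hits, min_tumors=3):
    """Map 'number of tumors affected' to 'number of genes with that count'.

    `gene_hits` is an iterable of (tumor_id, gene) pairs; a gene is counted
    once per tumor, and genes below `min_tumors` are left out.
    """
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in gene_hits:
        tumors_by_gene[gene].add(tumor_id)
    sizes = (len(tumors) for tumors in tumors_by_gene.values())
    return Counter(size for size in sizes if size >= min_tumors)


# Hypothetical junction records: (tumor, left chrom, right chrom, gene).
junctions = [
    ("T1", "chr3", "chr3", "GENE_X"),
    ("T2", "chr3", "chr3", "GENE_X"),
    ("T3", "chr3", "chr3", "GENE_X"),
    ("T1", "chr6", "chr20", "GENE_Y"),
    ("T2", "chr6", "chr6", "GENE_Y"),
    ("T3", "chr6", "chr6", "GENE_Y"),
    ("T4", "chr6", "chr6", "GENE_Y"),
    ("T1", "chr1", "chr2", "GENE_Z"),
]

class_counts = Counter(classify_junction(left, right)
                       for _, left, right, _ in junctions)
histogram = recurrence_histogram((tumor, gene) for tumor, _, _, gene in junctions)
print(class_counts)
print(histogram)
```

Here GENE_X is hit in three tumors and GENE_Y in four, so the histogram has one gene at each of those counts, while GENE_Z (one tumor) is excluded. Raising `min_tumors` to 5 would correspond to the "at least five tumors" cut-off discussed in the text.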


Figure 5. IGV views of structural changes of recurrent SVs. IGV views for MACROD2, FHIT, and PARK2. Blue rectangles are exons. Red rectangles are deleted regions, and the number below the box refers to SV id. Tumors are labeled on the left. A, MACROD2; B, FHIT; C, PARK2.


deleted exons 2–4. The latter three deletions also caused a reading frame shift. These deletions likely also inactivated the protein.

To seek additional supporting evidence for these SVs, we analyzed 39 tumors from the TCGA gastric cancer dataset that had high coverage WGS data, which is a subset of the 441 STAD tumors with copy number data that will be discussed below. We used BreakDancer to identify SVs. All three genes, MACROD2, FHIT, and PARK2, contained deletions involving coding exons, with 16, 9, and 6 tumors having deletions in MACROD2, FHIT, and PARK2, respectively. Since these SVs are caused by deletions, we reviewed the deletion analysis reports from TCGA gastric cancer data performed by the Broad GDAC team, which analyzed 441 STAD tumor samples. FHIT, MACROD2, and PARK2 were in the 6th, 7th, and 12th most significantly deleted regions (http://gdac.broadinstitute.org/runs/analyses__2014_10_17/reports/cancer/STAD-TP/CopyNumber_Gistic2/nozzle.html).

Discussion

WGS data provide a unique opportunity to combine SNVs with large SVs and to perform an integrated analysis. There are three major findings from this study. First, A>C mutations were common in GCA and GNCA. In addition to the 5 prime A noted in previous studies of esophageal adenocarcinoma and gastric adenocarcinoma (14, 16, 17), we found enrichment of 5 prime T, which was weakly associated with ESCC. Second, we identified 24 driver mutations, including a subset that has not been previously reported. Third, we identified recurrent chromosome alterations that occurred in at least 30% of tumors in 14 genes, including CAMK1D, MACROD2, ANKRD30BL, FHIT, KCNB2, and PARK2.

A>C mutation rates are often low (range 2.6%–6.6% of all mutations; ref. 23). However, a recent exome sequencing study of esophageal cancer found higher A>C mutation rates in EAC than in ESCC (14). Another recent study of EAC also found a high A>C mutation rate, particularly with the 5 prime A (16). Our study extended this observation to gastric adenocarcinoma. We found high A>C mutation rates in both GCA and GNCA. In addition, we found enrichment of A>C mutations with the 5 prime T in ESCC. The presence of A>C mutations in GCA, GNCA, and ESCC suggests the oxidation of guanine as a potential mutagen for gastric cancer and ESCC. In cancer cells under high oxidative stress, 8-oxo-dGTP accumulating in the DNA can result in G>T, which is equivalent to C>A. It is conceivable that a similar mechanism may contribute to the observed high A>C mutation rate.

A major interest of this study was to identify recurrent SVs. We found 14 genes with SVs occurring in at least five tumors (Fig. 4). The most frequent SVs appeared as deletions. Deletions that removed coding sequences were of greatest interest. Three examples of these genes are shown in Fig. 5. These deletions were clustered in small genomic regions. In the case of MACROD2, exon 6 of MACROD2 (NM_080676), spanning amino acids 140–180, was deleted. The deletion also caused a frame-shift. Similarly, exon 5 of FHIT (NM_002012), containing the first 35 amino acids, was deleted in multiple tumors. The mutation also shifted the reading frame. In the case of PARK2, the deletions were more heterogeneous, and varied from deletions involving one to three of exons 2, 3, and 4 (NM_004562). These deletions were very frequent, affecting about 50% of gastric cancer samples. They were supported by deletion events observed in gastric tumors from the TCGA dataset. It will be interesting to develop deletion-specific assays to screen a large number of gastric cancers and to associate the deletions with clinical phenotypes.

Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation mechanisms for developing gastric cancer and ESCC. A limitation of the current study is its small sample size. Future studies should include validation of these findings in a larger panel of tumors that are selected from multiple tumor types and use combinations of exome sequencing, RNA-seq analysis, and target-resequencing for both SNVs and SVs.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: N. Hu, C.C. Abnet, N.D. Freedman, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee
Development of methodology: N. Hu, H. Wu, P.R. Taylor, M.P. Lee
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): N. Hu, M. Kadota, N.D. Freedman, S. Gere, A. Hutchinson, G. Song, T. Ding, Y.-L. Qiao, J. Koshiol, A.M. Goldstein, P.R. Taylor, M.P. Lee
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Liu, C.C. Abnet, H.H. Yang, C. Yan, P.R. Taylor, M.P. Lee
Writing, review, and/or revision of the manuscript: N. Hu, C.C. Abnet, H. Su, N.D. Freedman, L. Wang, J. Koshiol, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N. Hu, M. Kadota, H. Su, C. Wang, L. Wang, Y. Wang, T. Ding, Y.-L. Qiao, C.G.C. Giffen, P.R. Taylor, M.P. Lee
Study supervision: N. Hu, P.R. Taylor, M.P. Lee
Other (validation of the results): H. Su
Other (validation of the results): L. Wang

Grant Support

This work was supported by the Intramural Research Program of the NIH and the National Cancer Institute, Division of Cancer Epidemiology and Genetics (DCEG), and Center for Cancer Research (CCR).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received February 9, 2015; revised October 19, 2015; accepted December 8, 2015; published OnlineFirst February 8, 2016.
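As an illustrative aside to the Discussion's remark that G>T is equivalent to C>A, the strand equivalence of substitution labels can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis pipeline used in this study: the function names (`strand_equivalent`, `canonical`, `opposite_strand_context`) are hypothetical, and reporting each class with A or C as the reference base is an assumed convention chosen only because it matches the labels used in the text (A>C, C>A).

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_equivalent(sub: str) -> str:
    """Return the same substitution as read from the opposite DNA strand."""
    ref, alt = sub.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def canonical(sub: str) -> str:
    """Collapse a substitution to the label whose reference base is A or C.

    The choice of A/C as reference bases is an assumption made for this
    sketch so that the output matches labels such as A>C and C>A.
    """
    ref = sub.split(">")[0]
    return sub if ref in ("A", "C") else strand_equivalent(sub)


def opposite_strand_context(five: str, ref: str, three: str) -> tuple:
    """Flanking context of the same site on the opposite strand.

    Reading the opposite strand 5' to 3' reverses the order of the
    neighbours and complements every base.
    """
    return (COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five])


if __name__ == "__main__":
    bases = "ACGT"
    all_subs = [f"{r}>{a}" for r in bases for a in bases if r != a]
    classes = sorted({canonical(s) for s in all_subs})
    print(strand_equivalent("G>T"))  # the equivalence stated in the Discussion
    print(classes)                   # the collapsed substitution classes
```

Under this convention the twelve possible single-base substitutions collapse into six classes, and a 5 prime A flanking an A>C change corresponds to a 3 prime T flanking the T>G change when the same site is read from the opposite strand.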

References

1. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010;127:2893–917.
2. Ke L. Mortality and incidence trends from esophagus cancer in selected geographic areas of China circa 1970–90. Int J Cancer 2002;102:271–4.
3. Li JY. Epidemiology of esophageal cancer in China. Natl Cancer Inst Monogr 1982;62:113–20.
4. Liu SF, Shen Q, Dawsey SM, Wang GQ, Nieberg RK, Wang ZY, et al. Esophageal balloon cytology and subsequent risk of esophageal and gastric-cardia cancer in a high-risk Chinese population. Int J Cancer 1994;57:775–80.
5. Abnet CC, Freedman ND, Hu N, Wang Z, Yu K, Shu XO, et al. A shared susceptibility in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat Genet 2010;42:764–7.

6. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, et al. An integrated genomic analysis of human glioblastoma multiforme. Science 2008;321:1807–12.
7. Yan H, Parsons DW, Jin G, McLendon R, Rasheed BA, Yuan W, et al. IDH1 and IDH2 mutations in gliomas. N Engl J Med 2009;360:765–73.
8. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med 2009;361:1058–66.
9. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 2010;17:510–22.
10. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005;310:644–8.
11. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 2007;448:561–6.
12. Wang K, Kan J, Yuen ST, Shi ST, Chu KM, Law S, et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat Genet 2011;43:1219–23.
13. Zang ZJ, Cutcutache I, Poon SL, Zhang SL, McPherson JR, Tao J, et al. Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat Genet 2012;44:570–4.
14. Agrawal N, Jiao Y, Bettegowda C, Hutfless SM, Wang Y, David S, et al. Comparative genomic analysis of esophageal adenocarcinoma and squamous cell carcinoma. Cancer Discov 2012;2:899–905.
15. Nagarajan N, Bertrand D, Hillmer AM, Zang ZJ, Yao F, Jacques PE, et al. Whole-genome reconstruction and mutational signatures in gastric cancer. Genome Biol 2012;13:R115.
16. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet 2013;45:478–86.
17. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46:573–82.
18. Song Y, Li L, Ou Y, Gao Z, Li E, Li X, et al. Identification of genomic alterations in oesophageal squamous cell cancer. Nature 2014;509:91–5.
19. Lin DC, Hao JJ, Nagata Y, Xu L, Shang L, Meng X, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat Genet 2014;46:467–73.
20. Gao YB, Chen ZL, Li JG, Hu XD, Shi XJ, Sun ZM, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet 2014;46:1097–102.
21. Zhang L, Zhou Y, Cheng C, Cui H, Cheng L, Kong P, et al. Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma. Am J Hum Genet 2015;96:597–611.
22. Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 2011;43:964–8.
23. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008;321:1801–6.
24. Li M, Zhao H, Zhang X, Wood LD, Anders RA, Choti MA, et al. Inactivating mutations of the chromatin remodeling gene ARID2 in hepatocellular carcinoma. Nat Genet 2011;43:828–9.
25. Hu N, Wang C, Ng D, Clifford R, Yang HH, Tang ZZ, et al. Genomic characterization of esophageal squamous cell carcinoma from a high-risk population in China. Cancer Res 2009;69:5908–17.

Genomic Landscape of Somatic Alterations in Esophageal Squamous Cell Carcinoma and Gastric Cancer

Nan Hu, Mitsutaka Kadota, Huaitian Liu, et al.

Cancer Res 2016;76:1714-1723. Published OnlineFirst February 8, 2016.

Updated version: Access the most recent version of this article at doi:10.1158/0008-5472.CAN-15-0338

Supplementary Material: Access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2016/02/06/0008-5472.CAN-15-0338.DC1

Cited articles: This article cites 25 articles, 5 of which you can access for free at http://cancerres.aacrjournals.org/content/76/7/1714.full#ref-list-1
