Whole-Genome Reconstruction and Mutational Signatures in Gastric
Total Page:16
File Type:pdf, Size:1020Kb
Nagarajan et al. Genome Biology 2012, 13:R115 http://genomebiology.com/2012/13/12/R115 RESEARCH Open Access Whole-genome reconstruction and mutational signatures in gastric cancer Niranjan Nagarajan1*†, Denis Bertrand1†, Axel M Hillmer2†, Zhi Jiang Zang3,4†, Fei Yao2,5, Pierre-Étienne Jacques1, Audrey SM Teo2, Ioana Cutcutache6, Zhenshui Zhang2, Wah Heng Lee1, Yee Yen Sia2, Song Gao7, Pramila N Ariyaratne1, Andrea Ho2, Xing Yi Woo1, Lavanya Veeravali8, Choon Kiat Ong9, Niantao Deng10, Kartiki V Desai11, Chiea Chuen Khor4,12, Martin L Hibberd4,12, Atif Shahab8, Jaideepraj Rao13, Mengchu Wu14, Ming Teh15, Feng Zhu16, Sze Yung Chin15, Brendan Pang14,15, Jimmy BY So17, Guillaume Bourque18,19, Richie Soong14,15, Wing-Kin Sung1, Bin Tean Teh9, Steven Rozen6, Xiaoan Ruan2, Khay Guan Yeoh16, Patrick BO Tan10,12,14* and Yijun Ruan2,20* Abstract Background: Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively parallel short read and DNA paired-end tag sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability. Results: Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer - against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures by combining sequencing data from 40 complete gastric cancer exomes and targeted screening of an additional 94 independent gastric tumors uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer. Conclusions: These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data. Background salt diet, smoking, and infection by Helicobacter pylori Gastric cancer (GC) is the fourth most common cancer [1]. Understanding the mutational impact of these envir- and the second leading cause of cancer death worldwide. onmental exposures on the genomes of gastric epithelial Early stage GC is often asymptomatic or associated with cells is essential to shed light on specific genes and non-specific symptoms, resulting in most patients pre- pathways associated with gastric tumorigenesis. senting at advanced disease stages. Treatment options Previous studies in lung cancer [2,3], melanoma [4], for late-stage GC patients are limited, with surgery and and leukemia [5] have shown that environmental carci- chemotherapy regimens offering modest survival bene- nogens and drugs can elicit specific somatic mutational fits. Environmental risk factors for GC include a high profiles in cancer genomes, referred to as ‘mutational signatures’. While previous studies on GC have applied * Correspondence: [email protected]; [email protected]; exome-sequencing approaches to identify frequently [email protected] mutated genes [6,7], identifying mutational signatures is † Contributed equally best done using whole-genome data, due to its comple- 1Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore teness and ability to simultaneously uncover micro- and 2Genome Technology and Biology, Genome Institute of Singapore, macro-scale somatic alterations. In this study, we sought Singapore 138672, Singapore toprovideamorecomprehensiveunderstandingof Full list of author information is available at the end of the article © 2012 et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Nagarajan et al. Genome Biology 2012, 13:R115 Page 2 of 10 http://genomebiology.com/2012/13/12/R115 mutational processes in GC by analyzing whole-genome Table 1 Somatic variations in two GC tumors identified sequences of two GCs and their matched-normal con- by whole genome sequencing approaches trols, using both short-read (SR) next-generation Patient ID NGCII082 NGCII092 sequencing and a long insert (approximately 10 kbp) SNVs, all somatic 14,856 17,473 DNA paired-end tag (DNA-PET) protocol [8]. We also Coding regions 119 116 sought to explore the combination of these datasets for Non-synonymous 86 73 de novo assembly of cancer and normal genomes and to Promoter regions 101 161 comprehensively catalogue a range of (point mutations Indels, all somatic 11,738 2,486 to megabase-sized) somatic alterations in the tumor. Coding regions 12 2 Finally, we used this catalogue to characterize the CNVs, all somatic 836 21,776 impact of mutational processes on genes and used a Affecting genes 3 265 screening approach to validate recurrently mutated SVs, all somatic 12 146 genes in subtypes of GC defined by specific mutational Affecting genes 11 96 processes. Deletions 6 56 Tandem duplications 2 8 Results Unpaired inversions 0 26 Integrative short read/DNA-PET analysis and de novo Inversions 0 2 assembly Insertions (intra-chromosomal) 0 0 The matched tumor and normal samples analyzed were Insertions (inter-chromosomal) 0 0 from two Singaporean patients. One GC exhibited evi- Isolated translocations 0 3 dence of microsatellite instability (MSI) and active H. Balanced translocations 0 0 pylori infection (see Table S1 in Additional file 1 for other Complex events (intra- chromosomal) 4 49 clinical characteristics). Each tumor and matched normal Complex events (inter- chromosomal) 0 2 sample was sequenced to more than 30-fold average base pair coverage by Illumina SR sequencing (Materials and methods; Table S2 in Additional file 1), and to > 130-fold were unable to identify nearly half of the validated SVs physical coverage using large-insert (approximately 10 and fusions genes; Note 1 in Additional file 1). For exam- kbp) DNA-PET sequencing [9] on the SOLiD platform ple, NGCII092 exhibited a focal genomic amplification on (Materials and methods; Table S3 and Note 1 in Addi- chromosome 12p11-12 in a region containing the wild- tional file 1). Single nucleotide variants (SNVs) and short type KRAS gene, a genomic event frequently observed in insertions and deletions (indels) from tumor and normal GC [10]. The combined SR/DNA-PET data (Materials and genomes were combined to identify somatic variants methods) enabled a detailed putative reconstruction of the (Table 1 and Materials and methods) and reliability of evolutionary lineage of the amplified KRAS locus with somatic calls was confirmed using targeted sequencing concomitant deletion of a proposed tumor suppressor (validation rate of 90% for SNVs and 96% for indels; Mate- gene RASSF8 (as well as another focal amplicon at chro- rials and methods). SR and DNA-PET data were also used mosome 6p) as described in the supplementary text to identify somatic copy-number variations (CNVs) and (Figure 1; Figures S1 and S2 and Note 3 in Additional file structural variations (SVs) (validation rate = 81%; Materi- 1). Reconstruction of the tumor genomes also allowed the als and methods; Note 1 in Additional file 1). prediction of fusion genes and complex rearrangements We integrated the SR and DNA-PET sequence informa- that resemble patterns created by replication coupled tion to perform de novo assembly of the tumor and nor- mechanisms [11] and are further described in the supple- mal genomes. While complete de novo assembly of a mentary text (Note 4 and Figures S3 and S4 in Additional tumor genome still poses significant technical challenges file 1 and Table S6 in Additional file 2). and has not been attempted before, we were able to use Second, a combined SR/DNA-PET analysis allowed us the SR/DNA-PET data to construct highly contiguous to assemble sequences present in the tumor genome but draft assemblies of median scaffold lengths (N50) in the not in the reference human genome. For example, in range of 41 to 148 kb, with DNA-PET data assisting in tri- patient NGCII082 exhibiting active H. pylori infection, pling sequence contiguity of the assemblies (Materials and we detected approximately 2,000 short-sequence reads methods; Note 2 and Table S5 in Additional file 1). Impor- and > 600 DNA-PET tags corresponding to the H. pylori tantly, performing de novo SR/DNA-PET assembly genome (the first such report for a bacterial pathogen revealed several findings not observed using conventional from tumor sequencing), in addition to a tumor-asso- analyses of the SR data. First, the de novo approach ciated microbiome (these were not seen in NGCII092; allowed for characterization of large-scale somatic struc- see Figure S5 and Note 5 in Additional file 1 for details). tural variations at single base-pair resolution (SR libraries Note that, despite being fewer in number, the DNA-PET Nagarajan et al. Genome Biology 2012, 13:R115 Page 3 of 10 http://genomebiology.com/2012/13/12/R115 (a) NGCII082 20 1 2 3 4 5 6 7 8 9 10 11 12 1314151617 19 21 X 18 20 22 15 10 Copy number 5 2 NGCII092 20 15 10 Copy number 5 2 (b) chr12 0M 10M 20M 30M 40M 50M 60M 70M 80M 90M 100M 110M 120M 130M 20 10 Copy number 0 chr12:17-34Mb 17M 18M 19M 20M 21M 22M 23M 24M 25M 26M 27M 28M 29M 30M 31M 32M 33M 34M KRAS 20 RASSF8 10 Copy number 0 Cluster size 0 45 57 50 61 76 100 129 150 127 122 200 250 282 300 (c) SOX5 OVCH1 OVCH1 SOX5 129_+24063846..24075698 129_+29495931..29508168 Figure 1 Copy number of two gastric cancer genomes, mechanism of 12p amplification and creation of a fusion gene.