Published OnlineFirst May 2, 2018; DOI: 10.1158/2326-6066.CIR-17-0453
Research Article
Cancer Immunology Research

Whole Exome and Transcriptome Analyses Integrated with Microenvironmental Immune Signatures of Lung Squamous Cell Carcinoma

Jeong-Sun Seo1,2,3,4, Ji Won Lee2,3, Ahreum Kim2,3, Jong-Yeon Shin2,4, Yoo Jin Jung5, Sae Bom Lee5, Yoon Ho Kim5, Samina Park6, Hyun Joo Lee6, In-Kyu Park6, Chang-Hyun Kang6, Ji-Young Yun2,4, Jihye Kim2,4, and Young Tae Kim2,5,6
Abstract
The immune microenvironment in lung squamous cell carcinoma (LUSC) is not well understood, with interactions between the host immune system and the tumor, as well as the molecular pathogenesis of LUSC, awaiting better characterization. To date, no molecularly targeted agents have been developed for LUSC treatment. Identification of predictive and prognostic biomarkers for LUSC could help optimize therapy decisions. We sequenced whole exomes and RNA from 101 tumors and matched noncancer control Korean samples. We used the information to predict subtype-specific interactions within the LUSC microenvironment and to connect genomic alterations with immune signatures. Hierarchical clustering based on gene expression and mutational profiling revealed subtypes that were either immune defective or immune competent. We analyzed infiltrating stromal and immune cells to further characterize the tumor microenvironment. Elevated expression of macrophage 2 signature genes in the immune competent subtype confirmed that tumor-associated macrophages (TAM) linked inflammation and mutation-driven cancer. A negative correlation was evident between the immune score and the amount of somatic copy-number variation (SCNV) of immune genes (r = −0.58). The SCNVs showed a potential detrimental effect on immunity in the immune-deficient subtype. Knowledge of the genomic alterations in the tumor microenvironment could be used to guide design of immunotherapy options that are appropriate for patients with certain cancer subtypes. Cancer Immunol Res; 1–12. ©2018 AACR.
Introduction

Lung cancer is the second leading cause of death in Korea. The most common type of primary lung cancer, lung adenocarcinoma, has been characterized at the molecular level (1, 2). Lung squamous cell carcinoma, which accounts for 30% of all lung cancers (3), is not well characterized due to poor understanding of the cancer's genomic evolution (4) and the antitumor activity of immune cells (5, 6). Genomic alterations in the tumor characterize various stages of cancer progression. Immune defenses, on the other hand, are governed by tumor stroma, including basement membrane, extracellular matrix, vasculature, and cells of the immune system (7–9). Most cells in tumor stroma have some capacity to suppress a tumor, although this capacity changes as the cancer progresses; invasion and metastasis can follow (10–13).

Immune and stromal characteristics have emerged as prognostic and predictive factors that could be used to guide a personalized approach in cancer immunotherapy (14, 15). Analyses of genomic alterations, especially somatic mutations, have been used to predict response to immunotherapy (16, 17). Here, we used genomic and transcriptomic analysis to define molecular subtypes of tumors with immune responses. We show that genomic alterations affect the tumor microenvironment and tumor development in a subtype-specific manner. The data show how genomic alterations and tumor microenvironment impact cancer proliferation and invasion, and how predicted roles of immune cells and their interactions with cancer cells in lung squamous cell carcinoma (LUSC) might affect cancer therapy and patient survival.

1Precision Medicine Center, Seoul National University Bundang Hospital, Seongnamsi, Korea.
2Genomic Medicine Institute (GMI), Medical Research Center, Seoul National University, Seoul, Republic of Korea.
3Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea.
4Macrogen Inc., Seoul, Republic of Korea.
5Seoul National University Cancer Research Institute, Seoul, Republic of Korea.
6Department of Thoracic and Cardiovascular Surgery, Seoul National University Hospital, Seoul, Republic of Korea.

Note: Supplementary data for this article are available at Cancer Immunology Research Online (http://cancerimmunolres.aacrjournals.org/).

J.-S. Seo, J.W. Lee, A. Kim, and J.-Y. Shin contributed equally to this article.

Corresponding Authors: Jeong-Sun Seo, Precision Medicine Center, Seoul National University Bundang Hospital, Seongnamsi 13605, Korea. Phone: 82-31-600-3001; E-mail: [email protected]; and Young Tae Kim, Department of Thoracic and Cardiovascular Surgery, Seoul National University Hospital, Seoul 03080, Republic of Korea. Phone: 82-22-072-3161; E-mail: [email protected]

doi: 10.1158/2326-6066.CIR-17-0453

©2018 American Association for Cancer Research.

Materials and Methods

RNA and whole-exome sequencing

All protocols of this study were approved by the Institutional Review Board of Seoul National University Hospital (IRB No: 1312-117-545).

We included 101 lung squamous cell carcinoma samples collected between 2011 and 2013. Two of the 101 patients were excluded from subsequent survival analysis: one who had received one preoperative cycle of a weekly docetaxel (65 mg) and cisplatin (48 mg) regimen, and one who died of massive pulmonary embolism 16 days after surgery. All the tumor and matched adjacent noncancer
Downloaded from cancerimmunolres.aacrjournals.org on September 25, 2021. © 2018 American Association for Cancer Research.
Seo et al.
control tissue specimens were grossly dissected immediately after surgery and preserved in liquid nitrogen. Data on clinical features such as smoking history, pathologic TNM stage, tumor size, and degree of differentiation were collected (Table 1; Supplementary Table S1). For RNA-seq, we extracted RNA from tissue using RNAiso Plus (Takara Bio Inc.), followed by purification using RNeasy MinElute (Qiagen). RNA was assessed for quality and was quantified using an RNA 6000 Nano LabChip on a 2100 Bioanalyzer (Agilent Technologies). The RNA-seq libraries were prepared as previously described (18).

Table 1. Clinical data summary, Korean cohort (n = 101)

Patient characteristics          Number of patients
Age at diagnosis, years
  Median                         70
  Range                          35–83
Sex
  Male                           95
  Female                         6
Smoking status
  Never-smoker                   12
  Former smoker                  61
  Current smoker                 28
Median follow-up, months         45
Tumor stage
  I                              51
  II                             29
  III                            21
  IV                             0
T stage
  T1                             27
  T2                             58
  T3                             15
  T4                             1
N stage
  N0                             62
  N1                             24
  N2                             15
Recurrence                       31
Total LN
  Median                         30
  Range                          5–66

For whole-exome sequencing, genomic DNA was extracted and 3 µg from each sample was sheared and used for the construction of a paired-end sequencing library as described in the protocol provided by Illumina. Enrichment of exonic sequences was then performed for each library using the SureSelect Human All Exon 50Mb Kit (Agilent Technologies) following the manufacturer's instructions.

Libraries for RNA and whole-exome sequencing were sequenced with Illumina TruSeq SBS Kit v3 on a HiSeq 2000 sequencer (Illumina Inc.) to obtain 100-bp paired-end reads. The image analysis and base calling were performed using the Illumina pipeline (v1.8) with default settings.

RNA-seq analysis

To characterize the LUSC transcriptome profile in cancer and noncancer control cells, we performed RNA-seq for 101 LUSC and matched noncancer control samples. Total RNA extracted from lung specimens and depleted of ribosomal RNA was sequenced at the desired depth (100×) on RNA-Seq (Illumina HiSeq). The reads were aligned to the human genome (version GRCh37) with the Spliced Transcripts Alignment to a Reference (STAR) alignment software. The preprocessing pipeline on the GATK website was followed (19). The raw read counts were generated using HTSeq-count for each annotated gene.

Unsupervised subtype clustering

With the Ensembl gene set, the number of raw reads aligned to each gene was computed by HTSeq-count and was normalized by the variance-stabilizing transformation (yielding variance-stabilized data, VSD) with use of the R package DESeq2. The variance for each gene was calculated, and the top 1,000 genes by variance were selected for PCA (20). PCA using the 1,000 most variable genes was conducted with all tumor and noncancer control samples. Samples were clustered based on principal components into three groups (subtype A, subtype B, and noncancer control) with 95% confidence interval by hierarchical clustering methods as implemented in the R package rgl (21). When analyzing RNA sequencing data, batch effects should be considered if experimental conditions and library preparation varied. All of our samples were processed in the same batches; thus, additional batch-effect corrections were not necessary (22).

Differentially expressed gene analysis

Genes differentially expressed in each tumor subtype relative to noncancer control cells were determined by the significance criteria (adjusted P < 0.05, |log2(fold change)| ≥ 1, and base mean ≥ 100) as implemented in the R packages DESeq2 and edgeR. The adjusted P value for multiple testing was calculated from the computed P value by using the Benjamini–Hochberg correction (23). The centered VSD values of the differentially expressed gene list were applied to the array hierarchical clustering algorithm (Cluster 3.0) with uncentered correlation and average linkage (24). The gene expression pattern was visualized with use of Java TreeView. The hierarchical tree by arrays was generated by the clustering process, and two types of gene sets among the differentially expressed genes (subtype A-UP and B-DOWN, subtype A-DOWN and B-UP) were selected and tested for enrichment of Gene Ontology (GO) gene sets by Gene Set Enrichment Analysis (GSEA) in order to determine genes enriched in ranked gene lists.

Fragments per kilobase million (FPKM) calculation and normalization

Raw read counts (HTSeq-count) were normalized using FPKM as implemented in the R package edgeR, and the FPKM values were transformed to log2 values and adjusted to the median-centered gene expression values by subtracting the row-wise median from the expression values in each row (Cluster 3.0). The centered and log2-transformed VSD and FPKM expression values were used to illustrate gene expression patterns in a heat map.

GSEA and network analysis

GO analysis of the gene expression data was performed using GSEA (v2.24) desktop tools (permutation = 1,000) and visualized by the Enrichment Map tool in Cytoscape [P value cutoff 0.05, false discovery rate (FDR) q value cutoff 0.1, and similarity cutoff 0.5; ref. 25].

Tumor microenvironment analysis

The fractions of stromal and immune cells in tumor samples were estimated by Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE) scores, with predictions of tumor purity based on the ABSOLUTE method previously reported (26). The abundance of six infiltrating immune cell types (B cells, CD4+ T cells, CD8+ T cells,
Genomic Landscape and Microenvironmental Immune Signature
neutrophils, macrophages, and dendritic cells) in the two subtypes of LUSC was estimated with the Tumor IMmune Estimation Resource (TIMER) algorithm (27).

Immune score, stromal score, tumor purity, and cell-cycle score

Immune score, stromal score, and tumor purity were calculated using ESTIMATE. The cell-cycle score was calculated with the cell-cycle gene set from Davoli and colleagues (28).

Statistical analyses

Quantitative data are presented as mean ± standard deviation. We used R-3.2.3 to perform the statistical analyses. The normality of the variables was tested by the Shapiro–Wilk normality test (29). For two groups, significance (P value) for normally distributed variables was determined by an unpaired Student t test, and nonnormally distributed variables were analyzed by the Mann–Whitney U test. For more than two groups, Kruskal–Wallis and one-way ANOVA tests were used for the nonparametric and parametric methods, respectively (30). Differences were considered statistically significant at P < 0.05. The R value was computed by Pearson and distance correlation. Survival curves were plotted by the Kaplan–Meier method, the subtypes were compared with the log-rank test, and a Cox proportional hazards model was also used for survival analysis (SPSS for Windows, version 22; SPSS Inc.).

Somatic mutation detection using whole-exome sequencing

To find somatic mutations, we performed whole-exome sequencing (100×). Reads were aligned to the NCBI human reference genome (hg19) using BWA (31). Picard was applied to mark duplicates, and we used the Genome Analysis Toolkit (GATK Indel Realigner) to improve alignment accuracy. Somatic single-nucleotide variants (SNV) from 101 LUSC samples with matched noncancer control samples were called using MuTect (32). GATK's HaplotypeCaller was also used for indel detection. All variants were annotated with information from several databases using ANNOVAR (33).

Somatic copy-number variant detection

To detect somatic copy-number variants from whole-exome sequencing data, EXCAVATOR analysis was applied (34). GISTIC analysis was used to identify recurrent amplification and deletion peaks (35).

Calculation of SCNV levels

We obtained amplified and deleted copy-number variants using EXCAVATOR and GISTIC analyses. Arm and focal SCNV levels of each patient were respectively calculated by summing the copy-number changes at each copy-number event. Arm, chromosome, and focal CNV levels were normalized to the mean and standard deviation among the samples (28).

\[ \text{Total SCNV} = \sum \text{Copy-number change at total copy-number region} \]

\[ \text{Arm-level SCNV} = \sum \text{Copy-number change at arm copy-number region} \]

\[ \text{Focal-level SCNV} = \sum \text{Copy-number change at focal copy-number region} \]

\[ \text{Cell-cycle-related SCNV} = \sum \text{Copy-number change at copy-number region of genes used in cell-cycle score} \]

\[ \text{Immune-related SCNV} = \sum \text{Copy-number change at copy-number region of genes used in immune score} \]

Prediction of neoantigens

We predicted neoantigens using the pVAC-Seq pipeline (36), with nonsynonymous mutations as input. Amino acid changes and transcript sequences were annotated by Variant Effect Predictor. Epitopes were predicted with HLAminer and filtered by RNA expression (FPKM > 1) and coverage (tumor coverage > 10× and noncancer control coverage > 5×).

Availability of data and material

Whole-exome and RNA sequencing data are available under the NCBI Sequence Read Archive (SRA) study accession no. SRP114315.

Results

Identification of LUSC subtypes

In this study, 101 LUSC and matched noncancer control samples were used to discover significant differential gene expression, and principal component analysis (PCA) on all samples was performed to identify distinctive clusters based on gene variability between LUSC and noncancer control samples. PCA with the top 1,000 most variable genes and unsupervised hierarchical clustering with the k-means algorithm were applied to RNA-seq data and distinguished noncancer control from tumor samples. In contrast, 19 tumor samples overlapped and were more closely associated with the noncancer control group with 95% confidence interval ellipsoids (Fig. 1A; Supplementary Fig. S1). Additional PCA with 101 tumor samples distinguished the 19 tumor samples (subtype B) from 82 other major tumor samples (subtype A) at the 95% confidence interval. Thus, PCA revealed two molecular subtypes, A and B. These two subtypes were also distinguishable in The Cancer Genome Atlas (TCGA) LUSC cohort (n = 431) by PCA and unsupervised hierarchical clustering at 95% confidence interval (Supplementary Fig. S2).

To identify the pattern of genomic alterations in LUSC, we sequenced the exome of independent noncancer control samples (n = 101) and tumor samples (n = 101). The number of total mutations in subtype B was less than that in subtype A (P < 0.001 by Mann–Whitney U test; Fig. 1B; Supplementary Fig. S3A). Mutations in genes encoding TP53, NAV3, CDH10, KMT2D, NFE2L2, CTNNA3, KEAP1, NTRK3, RB1, NOTCH1, PTEN, FGFR2, and EGFR were also identified in the current study (Fig. 1C). Mutational signature combinations in subtype A were significantly different from those in subtype B (P < 0.01 by Mann–Whitney U test), showing more irregular patterns [Fig. 1D; Supplementary Fig. S3D, (32)]. We compared the burden of somatic copy-number variation (SCNV; ref. 28). Arm-level SCNVs (length > 98% of a chromosome arm), focal-level SCNVs (length < 98% of a chromosome arm), and total SCNVs (all of a chromosome), which comprise the burden of SCNV, were mostly found in subtype A (Fig. 1E and F). To elucidate the effect of genomic alteration on the tumor microenvironment, we also investigated
Figure 1. Identification of the subtypes of LUSC. A, Principal component analysis (PCA) of the top 1,000 most variable genes among subtype A, subtype B, and noncancer control was performed across all samples. 3D PCA score plots of the top 1,000 most variable genes were drawn as meshes containing cancer and noncancer control points (left) and subtype A and B points (right), based on k-means clustering (k = 2) on the first three PCs with 95% confidence interval ellipsoids. B, Ratio of somatic synonymous and nonsynonymous mutations, in mutations per megabase, with subtype classification. C, Names of significantly mutated genes (left), distribution of mutations across 101 LUSC samples, and frequency of significantly mutated genes (right). D, The mutational signature revealed by somatic mutations in whole-exome sequencing. E, Arm-level CNV, focal-level CNV, and total-level CNV of individual samples, displayed in each column. F, Significant focally amplified (red) and deleted (blue) regions plotted across chromosomal locations.
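Panel A combines three steps: selecting the most variable genes, projecting samples onto the first three principal components, and running k-means with k = 2. A minimal Python sketch of that sequence on synthetic data is given here for orientation only. It is not the authors' implementation, which used R (DESeq2 and rgl). The matrix sizes, the planted 30/10 group split, the function names, and the farthest-point initialisation are all assumptions introduced for illustration, and the input is assumed to be already normalized.

```python
import numpy as np


def top_variable_genes(expr, n_top):
    """Indices of the n_top rows (genes) with the largest variance across samples."""
    variances = expr.var(axis=1)
    return np.argsort(variances)[::-1][:n_top]


def pca_scores(expr, n_components=3):
    """Sample coordinates on the leading principal components.

    expr is a genes x samples matrix of already-normalized expression values.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    # SVD of the samples x genes matrix; U * S holds the sample scores.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans_two_groups(points, n_iter=100):
    """Plain Lloyd's k-means with k = 2 and a deterministic farthest-point start
    (the initialisation is an assumption of this sketch, not from the paper)."""
    first = points[0]
    second = points[np.linalg.norm(points - first, axis=1).argmax()]
    centers = np.stack([first, second])
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster ends up empty.
        new_centers = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for a normalized expression matrix (hypothetical sizes).
rng = np.random.default_rng(0)
n_genes, n_major, n_minor = 2000, 30, 10
expr = rng.normal(size=(n_genes, n_major + n_minor))
expr[:200, n_major:] += 6.0  # planted shift: 200 genes differ in the minor group

selected = top_variable_genes(expr, 1000)
scores = pca_scores(expr[selected], n_components=3)
labels = kmeans_two_groups(scores)
print(np.bincount(labels))  # number of samples assigned to each cluster
```

On real data the input would be the variance-stabilized expression matrix rather than simulated noise, and a library k-means with several random restarts would be a more robust choice than the single deterministic start used here.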