Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Cancer Genome and Epigenome Research

The Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

Zhoufeng Wang1, Kailing Tu2, Lin Xia2, Kai Luo2, Wenxin Luo1, Jie Tang2, Keying Lu2, Xinlei Hu2, Yijing He2, Wenliang Qiao3, Yongzhao Zhou1, Jun Zhang2, Feng Cao2, Shuiping Dai1, Panwen Tian1, Ye Wang1, Lunxu Liu4, Guowei Che4, Qinghua Zhou3, Dan Xie2, and Weimin Li1

Abstract

Non–small cell lung carcinoma (NSCLC) is a major cancer type whose epigenetic alteration remains unclear. We analyzed open chromatin with matched whole-genome sequencing and RNA-seq data of 50 primary NSCLC cases. We observed high interpatient heterogeneity of open chromatin profiles and the degree of heterogeneity correlated to several clinical parameters. Lung adenocarcinoma and lung squamous cell carcinoma (LUSC) exhibited distinct open chromatin patterns. Beyond this, we uncovered that the broadest open chromatin peaks indicated key NSCLC genes and led to less stable expression. Furthermore, we found that the open chromatin peaks were gained or lost together with somatic copy number alterations and affected the expression of important NSCLC genes. In addition, we identified 21 joint-quantitative trait loci (joint-QTL) that correlated to both assay for transposase accessible chromatin sequencing peak intensity and gene expression levels. Finally, we identified 87 regulatory risk loci associated with lung cancer–related phenotypes by intersecting the QTLs with genome-wide association study significant loci. In summary, this compendium of multiomics data provides valuable insights and a resource to understand the landscape of open chromatin features and regulatory networks in NSCLC.

Significance: This study utilizes state of the art genomic methods to differentiate lung cancer subtypes.

See related commentary by Bowcock, p. 4808

1Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 2National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 3Lung Cancer Center, West China Hospital Sichuan University, Chengdu, Sichuan, China. 4Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Z. Wang, K. Tu, L. Xia, K. Luo, and W. Luo contributed equally to this article.

Corresponding Authors: Dan Xie, West China Hospital of Sichuan University, Chengdu, Sichuan 610000, China. Phone: 136-9346-2346; Fax: 028-85164165; E-mail: [email protected]; and Weimin Li, Phone: 189-8060-1009; E-mail: [email protected]

Cancer Res 2019;79:4840–54
doi: 10.1158/0008-5472.CAN-18-3663
© 2019 American Association for Cancer Research.

Introduction

Lung cancer is one of the leading causes of cancer-related death worldwide (1). Non–small cell lung carcinoma (NSCLC) accounts for approximately 85% of lung cancer cases, with lung adenocarcinoma and lung squamous cell carcinoma (LUSC) being the two major histologic types (2). Recent genome sequencing efforts have identified millions of somatic mutations in NSCLC (3), which further led to the discovery of "driver mutations" in key oncogenes. While genetic factors account for only part of the interpersonal variability in NSCLC risk, the epigenetic contributions to this disease are becoming increasingly paramount (4). The epigenetic profiles, such as DNA methylation (5), histone modifications (6), and noncoding RNA (7), have been characterized in NSCLC. Until now, however, the open chromatin landscape of NSCLC remains undetermined.

Recently, the highly efficient assay for transposase accessible chromatin sequencing (ATAC-seq) approach (8) has successfully mapped genome-wide open chromatin patterns in multiple human cell types and provided valuable insights into the underlying regulatory mechanisms (9). Several works have profiled the open chromatin state in lymphocytic leukemia (10, 11). To date, few studies have characterized open chromatin state in primary NSCLC samples. A recent work published by Corces and colleagues cataloged the open chromatin states of 23 cancer types (12), including NSCLC. However, none has associated the open chromatin variations with genomic alterations among patients with NSCLC. Profiling open chromatin in primary NSCLC samples is known to be particularly difficult, partly because cancer tissues are confounded by cell-type heterogeneity (13). Recent work has shown that using negative cell isolation to deplete immune cells and fibroblasts could significantly increase the purity of cancer cells from primary tumor samples (14), enabling meaningful epigenomic analysis.

The integrative analysis combining whole-genome sequencing (WGS) and RNA-seq data with open chromatin state from the same patients could potentially delineate the effects of genomic alterations on the gene regulatory network in NSCLC. Notably, we can explore the effects of genomic mutations and structural variations on open regulatory elements and understand how they are associated with the NSCLC transcriptome. In this study, we generated matched ATAC-seq, WGS, and RNA-seq data from 50 primary

4840 Cancer Res; 79(19) October 1, 2019

Downloaded from cancerres.aacrjournals.org on September 26, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

NSCLC cases. The comprehensive open chromatin landscape of NSCLC provided important resources that allowed us to identify key regulatory elements in this disease. The integration of multiple-omics datasets revealed novel insights into the gene regulatory mechanism of NSCLC.

Materials and Methods

Patients and clinical information
Patients with NSCLC were staged according to the American Joint Committee on Cancer version 6 and initially diagnosed with lung cancer at West China Hospital of Sichuan University (Chengdu, China) from September 2016 to December 2018. Information, including patients' age, gender, ethnicity, pathology, and tumor stage was collected for these 51 patients (Supplementary Table S1). All of them received surgical treatment, and none of them underwent neoadjuvant therapy before surgery. Tumors and matched distal normal lung tissues were obtained during surgery. All samples were evaluated by two pathologists to determine the pathologic diagnosis and tumor cellularity. Only tumor tissues containing at least 80% of tumor cells were included. This study was approved by the Institutional Review Board of West China Hospital of Sichuan University (Chengdu, China; project identification code: 2017.114) and all patients provided written informed consent.

ATAC-seq library preparation and sequencing
To profile open chromatin, ATAC-seq was performed as described previously (8). A total of 1 × 10^5 cell pellets were washed once with PBS and cells were pelleted by centrifugation using the previous settings. Cell pellets were resuspended in 50 μL of lysis buffer, and nuclei were pelleted by centrifugation for 10 minutes at 500 × g, 4°C. The supernatant was discarded, and nuclei were resuspended in 50 μL reaction buffer containing 2.5 μL of Tn5 transposase and 22.5 μL of TD buffer (Nextera Illumina). The reaction was incubated at 37°C for 30 minutes. Tagmented DNA was isolated by MinElute PCR Purification Kit (Qiagen). Libraries were amplified for 10 cycles and purified using 2× SPRI Cleanup (Agencourt). Library sizes were determined using LabChip GXII Touch HT (PerkinElmer), and 2 × 75 paired-end sequencing was performed on NextSeq500 (Illumina) to yield on average 50 M reads/sample.

DNA-seq library preparation and sequencing
WGS was performed with DNA extracted from fresh-frozen tumor and normal material. DNA was extracted from fresh-frozen tissues, using the Gentra Puregene DNA Extraction Kit (Qiagen) following the protocol of the manufacturer. DNA isolates were hydrated in TE buffer and confirmed to be of high molecular weight (10 kb) by agarose gel electrophoresis. The concentration was measured using Qubit 2.0 Fluorometer (Life Technologies). Short insert DNA libraries were prepared with the TruSeq Nano DNA HT Sample Prep Kit (Illumina), the DNA libraries were sequenced on Illumina HiSeq X platform, and 150 bp paired-end reads were generated. Human DNA libraries were sequenced to obtain coverage of minimum 30× or 50× for both tumor and matched normal.

RNA-seq library preparation and sequencing
For RNA extractions, tissue sections were first lysed and homogenized with the TissueLyser (Qiagen). Subsequent RNA extractions were performed with the Qiagen RNeasy Mini Kit according to the instructions of the manufacturer. The RNA quality was assessed at the Bioanalyzer 2100 DNA Chip 7500 (Agilent Technologies), and samples with an RNA integrity number of over 7 were further analyzed by RNA-seq. RNA sequencing libraries were generated using the rRNA-depleted RNA by NEBNext Ultra Directional RNA Library Prep Kit for Illumina following the manufacturer's recommendations. The products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The libraries were sequenced on an Illumina HiSeq 4000 platform, and 150 bp paired-end reads were generated.

WGS data processing
Raw paired-end WGS reads were subjected to adapter and low-quality sequence trimming by using Trimmomatic (version 0.36; ref. 15) with default parameters. We mapped trimmed paired-end reads to human reference build hg19 by using BWA mem (version 0.7.13-r1126; ref. 16). BAMs were sorted and indexed using SAMtools (version 1.3; ref. 17), and duplicates were marked using Picard (version 2.2.1; http://broadinstitute.github.io/picard). The Genome Analysis Toolkit (GATK, version 3.6; ref. 18) was used for local realignment and base quality recalibration, processing tumor/normal pairs independently.

Germline mutation detection
Using default parameters, GATK HaplotypeCaller (18) was used to detect germline single-nucleotide variant (SNV) and indels. The known sites' files used for germline SNVs and indels calling were downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/.

Somatic mutation detection
Somatic SNVs and indels were predicted using MuTect2 (19) and VarScan2 (20) with default parameters. Somatic SNVs and indels identified by VarScan2 (20) were retained if all of the following criteria were met: (i) P value of the reported somatic SNV 0.05; (ii) the natural of the reported somatic SNV 5%; (iii) tumor frequency of the reported somatic SNV 10%; and (iv) the count of reads that supported the reported somatic SNV 2. Only somatic SNVs and indels identified by both MuTect2 (19) and VarScan2 (20) were retained for further analysis. All somatic SNV and indels were functionally annotated by ANNOVAR (version 20160201; ref. 21).

Somatic copy-number variation detection
Somatic copy-number variations (CNV) were called for 42 tumor samples with paired adjacent tissue samples using Control-FreeC (version 9.5; parameters: ploidy = 2, breakPointThreshold = 0.8, breakPointType = 0.8, coefficientOfVariation = 0.05, contamination = 0, contaminationAdjustment = FALSE, minMappabilityPerWindow = 0.85, window = 5000, numberOfProcesses = 10, mateOrientation = FR; ref. 22). The GISTIC2 (version 2.0.22; ref. 23) was used to identify regions of the genome that are significantly amplified or deleted across 37 samples (5 samples that had abnormal somatic mutation count were removed; parameters: -genegistic 1 -smallmem 1 -broad 1 -brlen 0.98 -conf 0.90 -savegene 1).

ATAC-seq data processing
Paired-end ATAC-seq fragments of human samples were aligned to human reference build hg19 using BWA mem command with -M parameter. PCR duplicates were removed by


Wang et al.

Picard (version 1.119). Then, reads mapped to mitochondria were discarded. Only uniquely mapped paired-end reads with fragment length less than 2,000 bp and mapping quality > 30 were kept for further analysis. ATAC-seq peak regions of each sample were called using MACS2 (version 2.1.0; ref. 24) with parameters: --nomodel --nolambda --call-summits -q 0.001. Blacklisted regions (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/) were excluded from called peaks. Open chromatin regions from The Cancer Genome Atlas (TCGA) were converted to hg19 genome using liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). To generate a consensus set of unique peaks, we scanned the genome using 200 bp sliding windows and 100 bp step size. Adjacent open windows that appeared in at least one sample were combined as a consensus open region list. Jaccard index, which captures the proportion of open chromatin regions that are shared by two samples, was used to measure the similarity between samples. Using all ATAC-seq data from NSCLC samples, we constructed a symmetric similarity matrix based on the presence of the open region. This matrix was clustered using hierarchical clustering algorithm from the "pheatmap" R package. Cluster-specific accessible peaks were identified with Dunn (1964) Kruskal–Wallis test. All ATAC-seq peaks that were no more than 2.5 kb away from annotated gene TSS (GENCODE v19) were selected as promoter ATAC-seq peaks. GREAT functional enrichment analysis (version 3.0.0; ref. 25) was used to catalog the function of peak sets with default parameters.

RNA-seq processing
Raw RNA-seq reads were aligned to the hg19 genome assembly using Hisat2, and FPKM of gene expression were quantified by StringTie. Genes with mean FPKM greater than 1 in either lung adenocarcinoma group or LUSC group were retained. In addition, only genes annotated as protein_coding, lincRNA, snRNA, miRNA, and snoRNA were kept and used for xQTL association analysis, leaving 12,776 genes for expression quantitative trait locus (eQTL) analysis.

Data access
Processed data of NSCLC samples in this study, including ATAC-seq peaks, gene expression matrix, and genomic alterations, are available via the data center of precision medicine in West China Hospital (https://pms.cd120.com/download.html; Sichuan, China).

Results

Generation of matched ATAC-seq, genome, and transcriptome data in NSCLC samples
To reveal the landscape of open chromatin in NSCLC and elucidate the association between the open chromatin, genomic variations, and gene expression profiles in NSCLC, we collected primary tumors from 51 patients, including 34 patients with lung adenocarcinoma, 13 patients with LUSC, and 4 patients with benign solitary pulmonary nodules (BSPN; Supplementary Table S1; see Materials and Methods). We also collected paired adjacent tissues from 40 patients with NSCLC and 2 patients with nonmalignant nodules for somatic mutation calling (see Materials and Methods). For each sample, we performed ATAC-seq, WGS, and RNA-seq (Fig. 1A; see Materials and Methods). Each of the ATAC-seq data generated 61.72–243.83 M (median = 103.47 M) reads; the WGS depth exceeded 30×; and the RNA-seq data generated 80.6–165.6 M (= 95.63 M) reads (Supplementary Table S2).

We characterized the genomic variations, including somatic SNV, indel, and CNV for each tumor sample (see Materials and Methods). The majority of the somatic SNVs and indels were located in intergenic regions and introns, whereas only a small proportion of somatic SNVs were located in exons (Fig. 1B). In accordance with a previous report (3), we identified similar somatic SNVs in driver genes of NSCLC, including TP53 in 26.92% of patients with lung adenocarcinoma and 33.33% of patients with LUSC, EGFR in 19.23% of patients with lung adenocarcinoma, CSMD3 in 33.3% of patients with LUSC, and NFE2L2 in 33.3% of patients with LUSC (Fig. 1C). The transition to transversion ratio of somatic SNV ranged from 0.22 to 2.1 in lung adenocarcinoma and from 0.04 to 0.69 in LUSC (Supplementary Fig. S1A). The proportion of C>A transversions, which was reported to be associated with smoking and lung cancer, was significantly higher in NSCLC samples than benign samples (P = 0.04878; Mann–Whitney U test; Supplementary Fig. S1B). We further performed mutation signature analysis on the somatic SNV (26) and identified four enriched mutational signatures in all NSCLC samples (S1–S4, Supplementary Fig. S1C; Supplementary Materials and Methods). Three of the four enriched signatures (S2–S4) had high cosine similarity to cancer-related signatures cataloged in the COSMIC database (Supplementary Materials and Methods; ref. 27). The signature S2 was analogous to COSMIC signature 9 (cosine similarity: 0.57). COSMIC signature 9 has been found in chronic lymphocytic leukemia, as well as malignant B-cell lymphomas, and is characterized by a pattern of mutation caused by polymerase η. The signature S3 resembled COSMIC signature 5 (cosine similarity: 0.65) that existed in all kinds of cancers and most cancer samples. The signature S4 matched to COSMIC signature 3 (cosine similarity: 0.73). This COSMIC signature was observed in breast, ovarian, and pancreatic cancers and associated with failure of DNA double-strand break repair by homologous recombination. Interestingly, all LUSC samples were strongly associated with S3, whereas most of the lung adenocarcinoma samples were associated with S2 (Fig. 1D). We clustered all the samples based on the contributions of these four signatures. The three pathologic types of samples (lung adenocarcinoma, LUSC, and BSPN) could be well classified, with only one lung adenocarcinoma sample being misclassified to the LUSC cluster (Fig. 1E).

As is supported by a previous TCGA study (28), many known NSCLC driver genes were found to have significant amplification such as EGFR and NKX2-1 (FDR < 10e-3), or deletion such as CDKN2A (FDR < 10e-5) in lung adenocarcinoma samples. Besides, we also observed the amplification of other known lung cancer–related genes (Supplementary Table S3), including DDR2, TPM3, TPR, and LRIG3 (FDR < 0.1). Furthermore, significant deletions of other known cancer-related genes were also identified (see Materials and Methods), like PIK3R1, IL6ST, and MAP3K1 (FDR < 0.1; Fig. 1F; Supplementary Table S3). In terms of somatic CNV in LUSC, we found significant amplification of SOX2, BCL11A, REL, WHSC1L1, and FGFR genes, and significant deletion of CDKN2A. These results were consistent with the TCGA study (29). Also, we found significant deletions of other known genes that are frequently mutated in lung cancer, such as MYCL, ROS1, and EZR (FDR < 10e-2; Fig. 1F; Supplementary Table S3).


[Figure 1 image, panels A–F. A, study design: ATAC-seq (open chromatin peaks), WGS (mutations: SNV, CNV), and RNA-seq (gene expression) from tumor and adjacent tissue, combined into a cancer regulatory network. B, somatic SNV counts per sample by genomic feature. C, counts of mutated cases per gene in LUSC and LUAD. D, signature contribution (S1–S4) per sample. E, clustering dendrogram of samples. F, Q-values of copy number gains and losses by chromosome in LUAD and LUSC.]

Figure 1. Overview of study and genetic landscape of patients with NSCLC. A, Overview of study design. B, Barplot shows the number of somatic SNV located on different genomic features in each sample. C, Barplot shows the counts of samples harboring different somatic SNVs and indels in known genes related to cancer. D, Barplot shows the signature contribution in each sample. E, Dendrogram represents the results of hierarchical clustering of samples according to four de novo mutational signatures. F, Line chart shows significant copy number gains (orange) and losses (blue; FDR < 0.1). LUAD, lung adenocarcinoma.
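The comparison of de novo mutational signatures with COSMIC signatures relies on cosine similarity. The following is a minimal sketch under stated assumptions: each signature is represented as a vector of non-negative weights over the same ordered mutation channels, and the three-channel vectors and the names `cosine_similarity`, `best_match`, `ref_a`, and `ref_b` are hypothetical placeholders rather than real COSMIC data or the authors' code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_match(signature, references):
    """Return (name, similarity) of the most similar reference signature."""
    scored = ((name, cosine_similarity(signature, vec))
              for name, vec in references.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical toy vectors; real signatures use many more channels.
toy_references = {"ref_a": [1.0, 0.0, 0.0], "ref_b": [0.0, 1.0, 0.0]}
toy_signature = [0.9, 0.1, 0.0]
match_name, match_score = best_match(toy_signature, toy_references)
```

A value near 1 indicates nearly proportional profiles, and a value of 0 indicates no shared channels, which is why the similarities of 0.57 to 0.73 reported in the text are read as partial resemblance.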


Open chromatin signature of NSCLC
To identify open chromatin regions, we applied ATAC-seq on each tumor sample (see Materials and Methods). The numbers of open chromatin peaks in each sample ranged from 3,989 to 211,042, showing high variability (Supplementary Fig. S2A). We then filtered out six samples whose ATAC-seq peak numbers were less than 20,000, leaving 40 ATAC-seq data for the following analysis. Because tumor samples were known to be heterogeneous (13), we evaluated the intersample heterogeneity at open chromatin level using the Jaccard index score (30). Compared with ATAC-seq peaks in stem cells and sorted blood lineage cells (9), which were considered to be homogeneous, the ATAC-seq peaks from NSCLC samples had significantly smaller Jaccard index scores (Fig. 2A), indicating that NSCLC samples contained more heterogeneous open chromatin regions across samples. Unsurprisingly, the open chromatin peaks that were shared across NSCLC samples were mainly associated with housekeeping functions, such as gene regulation, translation, and DNA damage checkpoint (Supplementary Table S4). To explore whether sample-specific peaks could reveal function specific to the corresponding context, we then sorted the samples based on the percentage of minority peaks (peaks that appeared in fewer than 20% of samples). Interestingly, the percentage of minority peaks was significantly correlated with pathologic types (P = 0.0003693, Mann–Whitney U test), where LUSC samples tended to contain more minority peaks than lung adenocarcinoma samples. In addition, the percentage of minority peaks was also significantly correlated with higher tumor stage (P = 0.01338, Spearman correlation test), history of smoking (P = 0.006324, Mann–Whitney U test), and female sex (P = 0.0108, Mann–Whitney U test), but not significantly correlated with other factors such as peak number, presence of metastasis, and age (Fig. 2B).

To examine the potential of global open chromatin distribution as markers to classify NSCLC samples, we clustered the samples based on ATAC-seq signals (see Materials and Methods). The NSCLC samples aggregated into three clusters (Fig. 2C). Cluster I consisted of four LUSC and four lung adenocarcinoma samples. Samples in cluster I contained a significantly lower number of open chromatin peaks [q-value < 2.930761e-04; Dunn (1964) Kruskal–Wallis test] and had significantly higher intersample heterogeneity [Fig. 2D; q-value < 3.517155e-10; Dunn (1964) Kruskal–Wallis test] than the other two clusters, exhibiting patient-specific features rather than cancer type features. Cluster II mainly consisted of LUSC samples (five LUSC and one lung adenocarcinoma samples). We identified 310 genes within 2.5 kb up and downstream of gene TSS of cluster II–specific open chromatin and with upregulated gene expression (Supplementary Fig. S2B; Supplementary Table S5; see Materials and Methods). These genes were enriched for the keratinization process (Fig. 2E; Supplementary Fig. S2C), which was a phenomenon specific to LUSC and not in lung adenocarcinoma (31). Cluster III mainly consisted of lung adenocarcinoma samples (25 lung adenocarcinoma and one LUSC samples), which could be further classified into three subclusters (Fig. 2C). The samples from the three subclusters contained distinct open chromatin patterns and different stages of tumor development (Fig. 2C; Supplementary Table S6). In particular, one of the subclusters, which contained fewer peaks than the other two subclusters, was significantly associated with early tumor stage [q-value < 0.02, Dunn (1964) Kruskal–Wallis test; Fig. 2C; Supplementary Table S6]. We identified 279 open chromatin peaks that were specific to the "early stage" related subcluster (see Materials and Methods). The genes associated with these peaks were enriched in the integrin signaling pathway, which played important roles in the attachment of cells to the extracellular matrix (Supplementary Table S7). For instance, among the "early stage"–specific open chromatin genes were ITGAV and ITGA6, which interacted with vitronectin and laminin, respectively. These two genes were reported to be prognostic indicators in multiple cancers, including NSCLC, whose high expression correlated to longer survival time (32). On the other hand, we identified 6,664 chromatin regions that were specifically closed in the "early stage" cluster. Their related genes were enriched in "spongiotrophoblast layer development" and "glomerular visceral epithelial cell differentiation" pathways (Supplementary Table S7). Overexpression of genes in these pathways led to elevated cell mobility, cancer cell invasion, and tumor malignant progression, which were considered later stage cancer phenotypes (33).

To further confirm that open chromatin distribution is a potential marker to classify lung cancer samples, we incorporated 76 QC-passed ATAC-seq data of NSCLC samples from 38 patients in TCGA (12) and 5 ATAC-seq data of small-cell lung cancer (SCLC) cell lines (34) with our data. Consistent with the previous analysis, most of the NSCLC samples could be classified into three groups (Supplementary Fig. S2D). Interestingly, the major group of LUSC (cluster II, including 27 samples from 16 patients) and lung adenocarcinoma (cluster III, including 50 samples from 33 patients) were classified more clearly (Supplementary Fig. S2D; Supplementary Table S6). Cluster I consisted of nine LUSC samples from 5 patients and 20 lung adenocarcinoma samples from 15 patients. These samples had higher intersample heterogeneity and more variable peak numbers compared with cluster II and cluster III. Patients in this group had a distinct open chromatin pattern that is different from typical LUSC or lung adenocarcinoma. A small number of NSCLC samples, six LUSC and four lung adenocarcinoma, were clustered together with SCLC samples in cluster IV. These samples were more likely to have lower data quality, considering that they contained the lowest open chromatin peak numbers and high intersample heterogeneity. Therefore, we excluded these samples in further analysis. Interestingly, compared with SCLC, we identified 6,965 NSCLC-specific open peaks enriched around genes that were associated with cell adhesion functions, such as focal adhesion and cell-substrate adherens junction (Supplementary Fig. S2E).

To better understand the intratumor heterogeneity, we performed single-cell ATAC-seq on 1 patient with LUSC (Supplementary Table S1). In total, we detected 50,486 peaks in this patient (Supplementary Materials and Methods). After removing low-quality nuclei, we subjected 1,651 cells to t-distributed stochastic neighbor embedding (Supplementary Fig. S3A; Supplementary Table S2) and identified seven major clusters of cells (Supplementary Materials and Methods). We categorized each cluster (Supplementary Fig. S3B and S3C), and found that cells in cluster III were likely to come from tumor cells (Supplementary Materials and Methods). As expected, the open chromatin profile of cluster III showed high concordance with bulk NSCLC samples, with 97% of peaks in cluster III appearing in bulk samples (Fig. 3; Supplementary Fig.


[Figure 2 image, panels A–E. A, density of Jaccard index scores for blood lineage cells (CD4T, CD8T, CMP, GMP, HSC, LMPP, MPP) and NSCLC (LUAD, LUSC). B, percentage of peaks in each sample frequency bin (0%–20% to 80%–100%), with annotation bars for peak number, tumor type, metastasis, stage, smoking, age, and sex. C, clustering heatmap of Jaccard scores (clusters I–III; subclusters S1–S3) with the same annotation bars. D, Jaccard score by cluster. E, ATAC-seq tracks for cluster II and cluster III samples at LCE1E, LCE3D, and SPRR3.]

Figure 2. Integration of chromatin landscapes in NSCLC. A, Density plots show Jaccard index distribution between blood lineage cells and NSCLC samples. CD4T (n = 5); CD8T (n = 5); CMP, common myeloid progenitor (n = 8); GMP, granulocyte–macrophage progenitor cells (n = 7); HSC, hematopoietic stem cells (n = 7); LMPP, lymphoid-primed multipotent progenitor cells (n = 3); MPP, multipotent progenitor cells (n = 6); LUAD, lung adenocarcinoma (n = 30); and LUSC (n = 10). B, Barplot shows the percentage of peaks with different sample frequency in NSCLC samples. Bottom bars indicate peak number; tumor type (purple, lung adenocarcinoma; light blue, LUSC); having remote metastasis (orange) or not (blue); tumor staging; smokers (black) or nonsmokers (green); patient age; and female (orange) or male (blue). **, P < 0.01. C, Heatmap shows hierarchical clustering of open chromatin region of NSCLC samples. D, Boxplot shows sample similarity (Jaccard index) of open chromatin region within three NSCLC clusters. ns, not significant; ***, P < 0.005 [Dunn (1964) Kruskal–Wallis test]. E, Browser plot shows normalized ATAC-seq profiles at keratinization process–related genes (orange, cluster II NSCLC samples; blue, cluster III NSCLC samples).
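The Jaccard index used to compare samples can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: peaks are given as (chromosome, start, end) tuples with half-open coordinates, the genome is tiled with 200 bp windows at a 100 bp step as described in Materials and Methods, and similarity is computed directly on the sets of open windows rather than on merged consensus regions. The function names `open_windows` and `jaccard` and the two toy samples are hypothetical.

```python
def open_windows(peaks, window=200, step=100):
    """Return the set of (chrom, window_start) windows overlapped by any peak.

    Windows start at multiples of `step` and span [start, start + window).
    Peaks are half-open intervals (chrom, start, end).
    """
    windows = set()
    for chrom, start, end in peaks:
        if end <= start:
            continue  # skip empty or malformed intervals
        # Smallest k with k*step + window > start, clipped at the chromosome start.
        first = max(0, (start - window) // step + 1)
        # Largest k with k*step < end.
        last = (end - 1) // step
        for k in range(first, last + 1):
            windows.add((chrom, k * step))
    return windows


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy samples with one peak each on the same chromosome.
sample_1 = open_windows([("chr1", 100, 300)])   # windows at 0, 100, 200
sample_2 = open_windows([("chr1", 250, 400)])   # windows at 100, 200, 300
toy_similarity = jaccard(sample_1, sample_2)    # 2 shared of 4 total
```

Computing this value for every pair of samples yields a symmetric similarity matrix of the kind that the text describes as the input to hierarchical clustering.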

S3D). At the single-cell level, we observed a high degree of heterogeneity at some of the open chromatin regions (Fig. 3). Interestingly, there appeared to be a high correlation between intersample and intratumor heterogeneity of open chromatin profile (r = 0.74, P < 2.2e-16, Spearman correlation test; Supplementary Fig. S3E), suggesting highly heterogeneous open chromatin regions were potentially important to the evolution of tumor cells.
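One of the heterogeneity measures used in this section, the percentage of minority peaks, can be sketched as below. The sketch assumes that each sample has already been reduced to a set of consensus peak identifiers and treats a peak as a minority peak when it is present in fewer than 20% of samples, as stated in the text. The function name `minority_peak_percentages`, the sample labels, and the peak identifiers are hypothetical.

```python
from collections import Counter


def minority_peak_percentages(sample_peaks, max_fraction=0.2):
    """Per-sample percentage of peaks found in fewer than max_fraction of samples.

    sample_peaks maps a sample name to an iterable of peak identifiers.
    """
    n_samples = len(sample_peaks)
    if n_samples == 0:
        return {}
    # Count in how many samples each peak occurs (once per sample).
    occurrences = Counter(
        peak for peaks in sample_peaks.values() for peak in set(peaks)
    )
    minority = {
        peak for peak, count in occurrences.items()
        if count / n_samples < max_fraction
    }
    percentages = {}
    for sample, peaks in sample_peaks.items():
        peaks = set(peaks)
        if peaks:
            percentages[sample] = 100.0 * len(peaks & minority) / len(peaks)
        else:
            percentages[sample] = 0.0
    return percentages


# Hypothetical toy cohort: "p1" is shared by all six samples, "p2" is unique to A.
toy_cohort = {name: {"p1"} for name in ["A", "B", "C", "D", "E", "F"]}
toy_cohort["A"] = {"p1", "p2"}
toy_percentages = minority_peak_percentages(toy_cohort)
```

The resulting per-sample percentages are the kind of quantity that could then be compared between groups, for example with the Mann–Whitney U test mentioned in the text.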

www.aacrjournals.org Cancer Res; 79(19) October 1, 2019 4845

Downloaded from cancerres.aacrjournals.org on September 26, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Wang et al.


Figure 3. Overview of open chromatin signal of tumor single cells. Browser plot displays ATAC-seq signal around the CDH3 and CDH1 gene loci from the aggregate of all 10 LUSC samples and 283 tumor single cells that were classified into cluster III.

Broad open chromatin peaks associated with NSCLC key genes

A previous study showed that the width of epigenetic marker peaks is associated with cell type–specific key genes (30). We next focused on broad open chromatin peaks and associated genes by incorporating published leukemia ATAC-seq datasets (Supplementary Fig. S4A and S4B; Supplementary Materials and Methods; ref. 9). Among the broad open chromatin peak–associated genes, a total of 310 genes were NSCLC specific; 368 genes were leukemia specific; and 337 genes were shared between the two cancer types (Fig. 4A). About 10% of the genes in each of the three categories were COSMIC-annotated genes related to at least one type of cancer, which was not surprising because both broad open chromatin peak–associated genes and COSMIC genes tend to be involved in critical pathways. Interestingly, however, the proportion of NSCLC-related COSMIC genes was significantly higher (P < 3.44e-30, χ² test; Supplementary Fig. S4C) in NSCLC-specific broad open chromatin peak–associated genes than in the other two categories, which suggested that the broad open chromatin peaks might regulate genes essential to NSCLC functions. We further noticed that the NSCLC-specific broad open chromatin peak–associated genes were enriched in pathways that were previously reported in NSCLC, such as ECM receptor interaction and proteoglycans in cancer, whereas the leukemia-specific broad open chromatin peak–associated genes were enriched in pathways such as platelet activation (Supplementary Fig. S4D; see Materials and Methods).

Meanwhile, many known NSCLC driver genes were covered by broad open chromatin peaks. Strikingly, in three of the samples, the open chromatin regions were so broad that the whole EGFR gene body was covered (Fig. 4B). Besides, there were other NSCLC driver genes covered by broad open chromatin peaks, including JUN, ERBB3, WNT9A, and FAT1. We ranked genes according to the width of broad open peaks in descending order and the carrier frequency of associated broad open regions in ascending order. Genes that were frequently affected by broad open chromatin peaks included several key driver genes of NSCLC, like EGFR, jun proto-oncogene (JUN), Erb-B2 receptor tyrosine kinase 3 (ERBB3), Wnt family member 9A (WNT9A), and FAT atypical cadherin 1 (FAT1). As an extension of these studies, we noted that the other NSCLC-specific genes were significantly enriched for biological functions in the cancer-associated pathway, which has been reported in the majority of solid malignant tumors (Supplementary Table S8; refs. 35, 36).

To study the effect of broad open chromatin peaks on gene expression of NSCLC, we compared the expression levels of 647 genes between samples in which the gene overlapped with broad open chromatin peaks and samples in which it did not. For genes that overlapped with broad open chromatin peaks, their average expression levels were higher in samples with broad peaks than in those without peaks (Fig. 4C). However, only 38 of the 647 genes showed significant differential expression (FDR < 0.05, Mann–Whitney U test; Fig. 4C). We then compared the variance of expression levels between broad open chromatin peak overlapped and nonoverlapped samples. On average, the variance of broad open chromatin peak overlapped genes was higher than that of nonoverlapped genes (Fig. 4D). Compared with random genes, the fold change of variance between broad open chromatin peak overlapped and nonoverlapped genes was significantly higher regardless of gene expression levels (Mann–Whitney U test; P < 0.05; Fig. 4D), which suggested that broad open chromatin peaks were related to misregulation of genes in NSCLC. Besides, we also found that wide open regions were more likely to overlap with regions of copy-number gain/amplification, compared with random genomic regions (P < 0.05 in 31/33 samples, permutation test; Supplementary Fig. S4E). For example, in the three cases mentioned above (Fig. 4B), both wide open chromatin regions and the gain of CNV covered the whole gene body of EGFR. This result suggested that there was an association between somatic CNVs, wide open chromatin regions, and gene expression.
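The per-gene comparison described above (a Mann–Whitney U test for each gene, followed by FDR control) can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: it assumes a two-sided normal approximation without tie or continuity correction, and it assumes Benjamini–Hochberg adjustment, which the text does not name explicitly. Gene names and expression values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    Returns (U statistic of x, two-sided p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of this tie group
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: expression in samples with / without a broad peak.
expression = {
    "GENE_A": ([9.1, 8.7, 9.5, 10.2], [3.2, 2.9, 4.1, 3.8]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.05, 4.95, 5.15, 5.0]),
}
pvalues = [mann_whitney_u(w, wo)[1] for w, wo in expression.values()]
fdr = benjamini_hochberg(pvalues)
```

With group sizes as small as those in the toy data, an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch short.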


Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

Figure 4. The broad open chromatin region in NSCLC. A, Heatmap (left) shows broad peak–associated genes in NSCLC and leukemia samples. Broad open region association is color coded as shown at the bottom. Pie charts (right) show the percentage of COSMIC-annotated genes associated with the broad open region. Different types of tumors are color coded on the right. B, Browser plot shows ATAC-seq signal around EGFR in 33 NSCLC samples. EGFR promoter with and without broad open region association is zoomed out at the bottom. Sample types are color coded at the bottom. C, Volcano plot shows mean gene expression of broad open samples and nonbroad opened samples. Differentially expressed genes are coded in orange (FDR < 0.05, Mann–Whitney U test). D, Boxplots show log2-fold change of variable coefficient in broad open genes (broad open gene against nonbroad open gene; orange), and nonbroad open genes (random gene expression comparison; blue).

Open chromatin on somatic CNV regions associated with gene expression

Somatic CNV of gene regulatory regions could potentially affect target gene expression. To study the relationship between somatic CNV and open chromatin peaks in NSCLC, we intersected somatic CNV regions with open chromatin peaks in each sample. Interestingly, the intensity of ATAC-seq signal positively correlated with the copy number of somatic CNV regions (Fig. 5A), which suggested that the regulatory elements were gained or lost together with the genomic regions they were located on. Somatic CNV fragments that carried open chromatin peaks (sCOP) were more likely to be located on gene promoters and gene bodies, but less likely to be located on intergenic regions (Supplementary Fig. S5A). For each sample, within 2.5 kb up- and downstream of the gene body, an average of 79.26% of expressed genes in our dataset (13,257 genes) contained open chromatin peaks, an average of 11.80% of the genes contained sCOP, and an average of 7.96% of genes contained neither open chromatin peak nor somatic CNV region (Supplementary Fig. S5B).

A previous study has shown the association between somatic CNV and gene expression (37). To examine the relationship between somatic CNV and gene expression in our data, we identified 6,981 genes that overlapped with somatic CNV gain fragment and 759 genes that overlapped with somatic CNV loss


Figure 5. The association of somatic CNV, open chromatin peaks, and gene expression. A, Boxplot represents the correlation of the copy number of somatic CNV regions and the intensity of ATAC-seq signal of open chromatin peaks that overlapped corresponding somatic CNV regions. B, Volcano plots illustrate DEGs between samples with and without sCOPs (orange, DEGs; blue, non-DEGs). C, Aligned tracks show the somatic CNV gain frequency, somatic CNV gain, open frequency, and DEG-related gain sCOP within 1,000 bp from EGFR (bright red, active promoter; light red, weak promoter; purple, inactive/poised promoter; orange, strong enhancer; yellow, weak/poised enhancer; blue, insulator; dark green, transcriptional transition; light green, weak transcribed; gray, polycomb repressed; light gray, heterochromatin, repetitive/copy number variation). D, Point plot shows the top 35 most significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of gain of sCOP-associated DEG, sorted by P value in reverse order. The color represents the fold of enrichment, and the size of the point represents the number of DEGs. E, The KEGG pathway view represents the NSCLC pathway. DEGs are manually highlighted (orange boxes, gain sCOP– associated DEGs; blue boxes, loss sCOP–associated DEGs).
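Panel A relates copy number to the signal of peaks that overlap somatic CNV regions. A minimal sketch of such an intersection is given below, assuming half-open genomic intervals and a simple all-against-all scan per chromosome. The tuple layouts, the function names, and the toy coordinates are hypothetical; the study's own intersection procedure is described in its Supplementary Materials and Methods and may differ.

```python
from collections import defaultdict
from statistics import mean


def overlaps(a_start, a_end, b_start, b_end):
    """True when half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def peaks_on_cnv(peaks, cnv_segments):
    """Pair each peak with the copy number of every CNV segment it overlaps.

    peaks: list of (chrom, start, end, signal)
    cnv_segments: list of (chrom, start, end, copy_number)
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, copy_number in cnv_segments:
        by_chrom[chrom].append((start, end, copy_number))
    pairs = []
    for peak in peaks:
        chrom, start, end, _signal = peak
        for seg_start, seg_end, copy_number in by_chrom[chrom]:
            if overlaps(start, end, seg_start, seg_end):
                pairs.append((peak, copy_number))
    return pairs


def mean_signal_by_copy_number(pairs):
    """Average peak signal for each copy-number value."""
    grouped = defaultdict(list)
    for (_chrom, _start, _end, signal), copy_number in pairs:
        grouped[copy_number].append(signal)
    return {cn: mean(values) for cn, values in grouped.items()}


# Hypothetical toy data for one sample.
peaks = [
    ("chr7", 100, 200, 2.0),
    ("chr7", 500, 600, 4.0),
    ("chr7", 550, 650, 6.0),
    ("chr1", 100, 200, 1.0),  # no CNV on chr1 in this toy example
]
cnv_segments = [("chr7", 0, 300, 3), ("chr7", 450, 700, 5)]
pairs = peaks_on_cnv(peaks, cnv_segments)
summary = mean_signal_by_copy_number(pairs)
```

The quadratic scan is adequate for a toy example; at genome scale, sorted sweeps or an interval index would be the usual choice.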


fragment within 2.5 kb up- and downstream of the gene body in at least three samples. For each gene, we then compared the expression levels between samples in which the gene was associated with a somatic CNV fragment and samples in which it was not. Compared with genes that were not associated with a somatic CNV fragment, the majority of the genes associated with somatic CNV gain showed higher expression levels, whereas the majority of the genes associated with somatic CNV loss showed lower expression levels (Supplementary Fig. S5C). In particular, 21.7% of genes associated with somatic CNV gain were differentially expressed genes (DEG), whereas 14.5% of genes associated with somatic CNV loss were DEGs (P < 0.05, two-tailed Mann–Whitney U test; Supplementary Fig. S5C).

We next focused on the effect that sCOP exerted on gene expression. In total, we identified 229,554 sCOPs with somatic CNV gain and 5,443 sCOPs with somatic CNV loss in at least three samples. We then compared the expression levels of sCOP-associated genes between samples with and without sCOPs (Supplementary Materials and Methods). The majority of genes associated with gain of sCOP exhibited significantly higher expression levels, whereas the majority of genes associated with loss of sCOP exhibited lower expression levels, compared with the same genes associated with peaks that did not overlap with somatic CNV (Fig. 5B; Supplementary Table S10). Furthermore, 34.34% of genes associated with gain of sCOP were DEGs, whereas 16.89% of genes associated with loss of sCOP were DEGs (P < 0.05, Wilcoxon Rank Sum test; Fig. 4B; Supplementary Table S9). DEG-related sCOPs were enriched in promoters, enhancers, and insulators, whereas distal DEG-related sCOPs (distance between DEG and sCOP ≥1,000 bp) were more significantly enriched in enhancers and insulators than local DEG-related sCOPs (distance between DEG and sCOP <1,000 bp; Supplementary Fig. S5D). It has been shown that CNV would affect gene expression through changing gene dosage (37) or the copy of regulatory elements (38). We analyzed all 33,303 sCOPs (29,433 local sCOPs and 4,395 distant sCOPs) and their related DEG pairs. We found that 99.7% of the sCOP-DEG pairs were carried by the same somatic CNV fragment, which indicated that in most of the cases the CNV affects the copy number of genes together with their regulatory elements.

Many known important NSCLC-related genes were affected by sCOP in our data (Supplementary Table S10). For example, we identified 40 gain of sCOPs around the EGFR gene. The EGFR gene was expressed at significantly higher levels in samples with a gain of sCOPs than in samples without sCOP (P < 0.05, Wilcoxon Rank Sum test; Fig. 5C). As another example, we identified NFKBIA, which harbored the largest number of gain of sCOP around the gene (Supplementary Fig. S5E). The sCOP-associated DEGs were enriched in multiple cancer pathways and signaling pathways (Fig. 5D; Supplementary Materials and Methods). Among all the enriched cancer pathways, the NSCLC pathway ranked among the top. We annotated the sCOP-associated DEGs in the NSCLC pathway (Fig. 5E) and discovered that they were likely to be positioned on the EGFR-SOS2-BRAF-MAPK1 cascade. sCOP-associated DEGs were also significantly enriched in the ErbB signaling pathway, which played essential roles in cancer development and progression (Supplementary Fig. S5F). In the ErbB signaling pathway, multiple sCOP-associated DEGs fell on the PI3K/AKT cascade, which was known to be dysregulated in cancer (39).

Germline SNV on open chromatin associated with gene expression of NSCLC

SNV could potentially affect epigenetic states on regulatory elements and alter the expression of target genes (40). To study how germline SNV regulates gene expression through open chromatin regions in NSCLC, we first called germline SNVs from our samples, and then identified cis eQTLs (within 1,000 kb of gene, only considered SNVs within ATAC-seq peaks) and cis ATAC-QTLs

Figure 6. Landscape of ATAC-QTLs and eQTLs in NSCLC. A, Graphical summary of ATAC-QTLs, eQTLs, and association between ATAC-seq open region and gene. B, Manhattan plot of eQTLs. Orange dots, eQTLs; orange triangles, joint-QTLs. C, Density plot shows the distance between eQTLs and TSS of associated gene (orange), and distance between all pairs of SNV and TSS of all genes used for the eQTLs test are drawn as background (blue). D, Barplot shows the genomic feature distribution of the human genome, all tested germline SNVs and eQTLs.


(within 1,000 kb of ATAC-seq peaks, only considered SNVs within ATAC-seq peaks) from our data (Fig. 6A; Supplementary Materials and Methods). Under the threshold of FDR < 0.1, we identified 76 eQTLs associated with 32 genes (Fig. 6B). These eQTLs did not overlap with reported eQTLs in normal tissues (41), suggesting they were likely NSCLC-specific eQTLs. The majority (62.5%) of the genes were associated with only one eQTL. Conversely, 12 genes (37.5%) were associated with multiple eQTLs (median = 3.5; Supplementary Fig. S6A). In agreement with previous reports (42), the eQTLs tended to fall close to the related genes (Fig. 6C). About 69.23% of eQTLs were located within 100 kb of their related genes. The eQTLs were highly enriched in the promoter, exon, and intron regions (Fig. 6D). For example, we identified 17 eQTLs associated with the HORMAD1 gene, which played a key role in cell-cycle regulation and has been reported to be a potential marker in multiple cancers (43, 44). These 17 eQTLs were highly correlated, dividing the samples into two haplotypes, and significantly higher HORMAD1 expression was found in samples with the mutant genotype (Fig. 7A).

Apart from eQTLs, we identified 11,265 ATAC-QTLs associated with 11,000 open chromatin regions (Fig. 6A; Supplementary Materials and Methods). As expected, the ATAC-QTLs tended to be located close to their related open chromatin regions (Supplementary Fig. S6B). ATAC-QTLs were enriched in the 3′UTR, 5′UTR, promoter, and exon regions (Supplementary Fig. S6C). A total of 41 ATAC-QTLs were located within their associated ATAC-seq peaks, which we termed local-ATAC-QTLs. The remaining 11,244 ATAC-QTLs were correlated with distal ATAC-seq peaks, which we termed distal-ATAC-QTLs (Fig. 6A). Interestingly, 20 of the 41 local-ATAC-QTLs also correlated with distal ATAC-seq peaks, which suggested likely long-range regulation (Fig. 6A). To establish the relationship between eQTLs and ATAC-QTLs in NSCLC, we intersected the two types of QTL and identified 21 joint quantitative trait loci (joint-QTL), accounting for 27% of all eQTLs (Supplementary Fig. S6D). eQTLs were enriched in ATAC-QTLs (P = 2.509e-13; fold enrichment = 10.12, Fisher exact test), suggesting the possible mediating role of open chromatin regions in gene regulation. The joint-QTL might potentially affect gene expression through their associated open chromatin regions. The 21 joint-QTLs were associated with 362 ATAC-seq peaks and 16 genes (Fig. 7B; Supplementary Table S11). The joint-QTLs were likely to be functionally relevant loci that were ignored by previous studies. For example, GSTM1, one of the genes associated with joint-QTL, was involved in the detoxification of many carcinogens (45), and multiple studies had reported the correlation between the malfunction of GSTM1 and increased risk of lung cancer (46). The SNP, rs10857795, located in the intron of GSTM1, was previously annotated as no effect (47). However, we found this SNP to be a joint-QTL that correlated with both the expression level of the GSTM1 gene and the intensity of the ATAC-seq peak that spans the promoter of this gene in NSCLC (Fig. 7C). This joint-QTL could potentially disrupt the Meis1 (Homeobox) motif, which strongly suggested that SNP rs10857795 might play a regulatory role for the GSTM1 gene and consequently affect the risk of NSCLC.

We further intersected the identified QTLs with lung cancer–related genome-wide association study (GWAS) significant loci (Supplementary Materials and Methods). A total of seven eQTLs, 83 ATAC-QTLs, and three joint-QTLs were located within the linkage disequilibrium (LD) blocks of lung cancer–related GWAS significant loci (Supplementary Table S12). Compared with SNP-containing open regions, ATAC-QTL–containing open regions overlapped with a higher percentage of LD blocks associated with squamous cell lung carcinoma (P = 0.024, permutation test), urinary 1,3-butadiene metabolite levels in smokers (an important carcinogen in tobacco smoke; P = 0.001, permutation test), response to taxane treatment (docetaxel; P = 0.002, permutation test), and multiple cancers (lung cancer, noncardiac gastric cancer, and esophageal squamous cell carcinoma; P = 0.021, permutation test; Supplementary Fig. S6E; Supplementary Materials and Methods). Similarly, eQTL-containing open regions overlapped with a higher percentage of LD blocks associated with lung cancer (P = 0.009, permutation test), lung carcinoma (P = 0.001, permutation test), cancer (pleiotropy; P = 0.001, permutation test), and urinary 1,3-butadiene metabolite levels in smokers (P = 0.001, permutation test; Supplementary Fig. S6F; Supplementary Materials and Methods). These GWAS-associated QTLs could potentially affect gene regulatory networks in NSCLC through genomic alterations, which could only be unveiled through the combination of GWAS and multiomics sequencing. A case in point is the joint-QTL, rs140313, which was located within the LD block of the significant loci associated with 1,3-butadiene metabolism and detoxification phenotype (48). This joint-QTL correlated with the intensity of four ATAC-seq peaks upstream of the GSTT1 gene and the expression levels of the GSTT1 gene (Fig. 7D).

Figure 7. Regulation network in patients with lung cancer. A, Browser plot shows overlaid ATAC-seq signals of mutation-harboring samples (Alt, blue) and mutation-free samples (Ref; orange). Red boxes, eQTLs associated with HORMAD1. Heatmap shows the genotype of two groups. Boxplot shows gene expression of two groups. B, Circos plot shows QTLs and associated genes. The innermost circle (track 1) shows linkage disequilibrium blocks of lung cancer–related traits. The middle two circles (track 2 and track 3) are circular Manhattan plots of eQTLs and ATAC-QTLs. Y-axis denotes –log10(P) of QTLs. All three tracks are color coded as shown on the left. Orange shaded regions highlight joint-QTLs that are associated with genes marked in the center of the circos. The outermost circle shows an ideogram of the human chromosomes. C, Browser plot shows overlaid ATAC-seq signals of the two groups (GG, reference genotype; AA, mutant genotype). Red, open chromatin region containing joint-QTL; green, joint-QTL associated open chromatin regions. Position weight matrix for Meis1 (Homeobox) is shown in the middle; the 8th position corresponds to the location of the motif-altering SNV (rs10857795, joint-QTL). The proportion of samples with the two genotypes is also shown, and gene expression of the two genotypes is shown in the boxplot. D, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr22 with a binning size of 700 kb. The regulatory network of this joint-QTL is shown at the top right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). The distances between the joint-QTL and its associated gene or open chromatin regions are marked near ledges. Individuals were separated according to two genotypes (reference genotype CC and mutant genotype TT). Boxplot in the middle shows the gene expression of the two groups. Browser plot in bottom shows overlaid ATAC-seq signals of the two groups. E, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr6 with a binning size of 220 kb. The regulatory network of NSCLC-derived joint-QTL is shown at the bottom right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). Distances between joint-QTLs and their associated genes or open chromatin regions are marked near ledges. Individuals were separated according to two genotypes (reference genotype AA and mutant genotype AG). Boxplot in the middle shows the gene expression of the two groups. Browser plot in the top shows overlaid ATAC-seq signals of the two groups.
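The enrichment of eQTLs among ATAC-QTLs reported in the text (21 joint-QTLs among 76 eQTLs) rests on a 2×2 contingency test. The sketch below shows one way such a test and the accompanying fold enrichment could be computed with the standard library. It assumes a one-sided test on a table whose rows are "eQTL / not eQTL" and whose columns are "ATAC-QTL / not ATAC-QTL"; the sidedness and the background SNV set used in the study are not stated in this section, so the counts in the toy table are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / total


def fold_enrichment(a, b, c, d):
    """Hit rate in row 1 divided by hit rate in row 2."""
    return (a / (a + b)) / (c / (c + d))


# Share of eQTLs that are also ATAC-QTLs, from the counts given in the text.
joint_fraction = 21 / 76

# Hypothetical toy table: rows = eQTL yes/no, columns = ATAC-QTL yes/no.
toy_p = fisher_exact_greater(3, 1, 1, 3)
toy_fold = fold_enrichment(3, 1, 1, 3)
```

Exact integer arithmetic through `math.comb` avoids floating-point underflow for small tail probabilities, at the cost of speed on very large tables.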


While the GSTT1 gene was reported to be associated with lung cancer risk (48), and previous studies had reported the correlation between GSTT1 deletion and carcinogenesis detoxification and antineoplastic drug therapy response (45, 47), our analysis suggested a regulatory risk locus linked to this gene. To validate the regulatory interaction of the predicted joint-QTL-to-gene links, we applied the CRISPR/Cas9 cassette method (Supplementary Materials and Methods; ref. 49) to replace the sequence of the QTL with the alternative genotype (Supplementary Fig. S7A). Consistent with the analysis, CRISPR/Cas9 cassette treatment of joint-QTL, rs10857795, in which cells gained a G>A mutation (Supplementary Fig. S7B), led to significant upregulation of GSTM1 gene expression in the H1299 NSCLC cell line (Supplementary Fig. S7C). Similarly, CRISPR/Cas9 cassette treatment of joint-QTL, rs140313, where cells gained a C>T mutation (Supplementary Fig. S7D), led to significant upregulation of GSTT1 gene expression in the H1299 NSCLC cell line (Supplementary Fig. S7E). Besides, in another example, we identified two joint-QTLs (rs17079286 and rs73766221), which were located within the LD blocks of GWAS significant loci associated with lung cancer (rs9387478; ref. 50) and lung adenocarcinoma (rs9387479; ref. 51), and which correlated with the intensity of an ATAC-seq peak upstream of the ROS1 gene and the expression levels of the ROS1 gene (Fig. 7E). The ROS1 gene was a known NSCLC driver gene, whose mutation, rearrangement, and fusion were reported to be well associated with NSCLC (50). Our analysis suggested that the genomic mutation on the regulatory elements, which affect the expression of the ROS1 gene, could also be an important factor in NSCLC network malfunction.

Discussion

Previous studies on NSCLC have extensively characterized genomic features, such as sequence mutation, gene expression, and epigenetic markers. However, the chromatin accessibility of NSCLC has not been characterized as clearly so far. In this study, we analyzed ATAC-seq profiles, together with WGS and transcriptome sequencing in the primary tumor tissue of 50 patients. We found high heterogeneity of chromatin accessibility among patients, which reflected the high interpatient heterogeneity of NSCLC genome sequence. Given the heterogeneity of chromatin accessibility, however, the ATAC-seq profiles could, to some extent, serve as molecular markers to classify NSCLC samples. On the basis of global open chromatin peaks, we identified lung adenocarcinoma- and LUSC-specific clusters and further stratified lung adenocarcinoma samples into three subclusters. Although we found a correlation between the clusters and clinical parameters such as tumor stage and smoking, it was challenging to draw conclusions due to the small sample size and potential effects. Beyond that, we performed single-cell ATAC-seq in 1 patient, which identified different cell types in the tumor microenvironment. We discovered a positive correlation between intra- and intertumor heterogeneity, suggesting that it is possible to explore tumor heterogeneity within and between patients using open chromatin signatures.

Other than the position of open chromatin peaks on the genome, the width of the open chromatin peak could also indicate key genes in NSCLC. A previous study has proven that broad open chromatin peaks of active regulatory histone modification regions are associated with cell type–specific active genes. It is rational to hypothesize that broad open chromatin peaks in NSCLC could be associated with critical genes, as they indicate a more active gene regulatory relationship. We then identified genes associated with broad ATAC-seq peaks at high frequency in our samples. These genes were indeed enriched in known NSCLC-related genes, which suggested that broad ATAC-seq peaks could potentially serve as markers to identify key genes in NSCLC. On the other hand, we observed some extremely wide open chromatin regions, such as open chromatin regions that extended over the whole gene body of EGFR. These regions could be the consequence of large genomic rearrangement (i.e., CNV) and cause severe misregulation of target genes in NSCLC. These extremely abnormal genomic open chromatin signals could also indicate malfunctioning components of the gene regulatory network in NSCLC.

Importantly, through the integration of genome sequence, ATAC-seq, and transcriptome data, we could directly study how the alteration of genome sequence on regulatory elements affected the gene regulatory network of NSCLC. To this end, we analyzed two types of genomic variations, that is, somatic CNV and germline SNV. Both ATAC-seq peak intensity and gene expression levels positively correlated with the copy number of the CNV region they were located on, indicating that active regulatory elements might be gained or lost together with CNV. While previous studies proposed that CNV might alter gene expression by affecting gene dosage or gene regulatory elements, we found that in 99.7% of the cases, CNV fragments carried the gene body together with its regulatory elements. In terms of germline SNV, we identified 76 eQTLs and about 30% of them were also ATAC-QTLs. The expression and ATAC joint-QTLs were more likely to be causal SNVs that affected gene expression through regulatory elements. We further intersected the QTLs with known lung cancer–related GWAS significant loci and identified three likely causal SNVs that were previously challenging to pinpoint.

To sum up, we generated multiomics data from the tumor tissue of an NSCLC population and showed the potential of this type of data in elucidating gene regulatory networks of NSCLC. The sample size still limited the power of our analysis. In the future, the study would be improved with a larger population or by incorporating more types of omics data.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: W. Luo, S. Dai, Q. Zhou, D. Xie, W. Li
Development of methodology: S. Dai, D. Xie
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Z. Wang, W. Luo, J. Tang, Y. Zhou, J. Zhang, F. Cao, S. Dai, P. Tian, Y. Wang, L. Liu, G. Che, Q. Zhou
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K. Tu, L. Xia, K. Luo, K. Lu, X. Hu, Y. He, W. Qiao, S. Dai, D. Xie
Writing, review, and/or revision of the manuscript: Z. Wang, K. Tu, L. Xia, K. Luo, S. Dai, Y. Wang, Q. Zhou, D. Xie
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W. Luo, S. Dai, P. Tian, L. Liu, G. Che
Study supervision: S. Dai, Q. Zhou, D. Xie, W. Li

Acknowledgments

We thank Dr. Xin He from the University of Chicago for the valuable discussion. This work was supported by grant nos. 91631111, 31571327,

and 31771426 from Chinese National Natural Science Foundation to D. Xie; the National Natural Science Foundation of China (81871890) and Transformation Projects of Sci-Tech Achievements of Sichuan Province (2016CZYD0001) to W. Li; China Postdoctoral Science Foundation (2017M623043) to Z. Wang; and Sci-Tech Support Program of Science and Technology Department of Sichuan Province (2016SZ0073) to P. Tian.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received December 1, 2018; revised April 12, 2019; accepted June 10, 2019; published first June 17, 2019.



The Open Chromatin Landscape of Non−Small Cell Lung Carcinoma

Zhoufeng Wang, Kailing Tu, Lin Xia, et al.

Cancer Res 2019;79:4840-4854. Published OnlineFirst June 17, 2019.
