Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663 Cancer Genome and Epigenome Research

The Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

Zhoufeng Wang1, Kailing Tu2, Lin Xia2, Kai Luo2, Wenxin Luo1, Jie Tang2, Keying Lu2, Xinlei Hu2, Yijing He2, Wenliang Qiao3, Yongzhao Zhou1, Jun Zhang2, Feng Cao2, Shuiping Dai1, Panwen Tian1, Ye Wang1, Lunxu Liu4, Guowei Che4, Qinghua Zhou3, Dan Xie2, and Weimin Li1

Abstract

Non–small cell lung carcinoma (NSCLC) is a major cancer type whose epigenetic alteration remains unclear. We analyzed open chromatin data with matched whole-genome sequencing and RNA-seq data of 50 primary NSCLC cases. We observed high interpatient heterogeneity of open chromatin profiles and the degree of heterogeneity correlated to several clinical parameters. Lung adenocarcinoma and lung squamous cell carcinoma (LUSC) exhibited distinct open chromatin patterns. Beyond this, we uncovered that the broadest open chromatin peaks indicated key NSCLC genes and led to less stable expression. Furthermore, we found that the open chromatin peaks were gained or lost together with somatic copy number alterations and affected the expression of important NSCLC genes. In addition, we identified 21 joint-quantitative trait loci (joint-QTL) that correlated to both assay for transposase accessible chromatin sequencing peak intensity and gene expression levels. Finally, we identified 87 regulatory risk loci associated with lung cancer–related phenotypes by intersecting the QTLs with genome-wide association study significant loci. In summary, this compendium of multiomics data provides valuable insights and a resource to understand the landscape of open chromatin features and regulatory networks in NSCLC.

Significance: This study utilizes state of the art genomic methods to differentiate lung cancer subtypes.

1 Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 2 National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 3 Lung Cancer Center, West China Hospital Sichuan University, Chengdu, Sichuan, China. 4 Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Z. Wang, K. Tu, L. Xia, K. Luo, and W. Luo contributed equally to this article.

Corresponding Authors: Dan Xie, West China Hospital of Sichuan University, Chengdu, Sichuan 610000, China. Phone: 136-9346-2346; Fax: 028-85164165; E-mail: [email protected]; and Weimin Li, Phone: 189-8060-1009; E-mail: [email protected]

Cancer Res 2019;XX:XX–XX
doi: 10.1158/0008-5472.CAN-18-3663
2019 American Association for Cancer Research.

Introduction

Lung cancer is one of the leading causes of cancer-related death worldwide (1). Non–small cell lung carcinoma (NSCLC) accounts for approximately 85% of lung cancer cases, with lung adenocarcinoma and lung squamous cell carcinoma (LUSC) being the two major histologic types (2). Recent genome sequencing efforts have identified millions of somatic mutations in NSCLC (3), which further led to the discovery of "driver mutations" in key oncogenes. Because genetic factors account for only part of the interpersonal variability in NSCLC risk, the epigenetic contributions to this disease are becoming increasingly important (4). Epigenetic profiles, such as DNA methylation (5), histone modifications (6), and noncoding RNA (7), have been characterized in NSCLC. Until now, however, the open chromatin landscape of NSCLC has remained undetermined.

Recently, the highly efficient assay for transposase accessible chromatin sequencing (ATAC-seq) approach (8) has successfully mapped genome-wide open chromatin patterns in multiple human cell types and provided valuable insights into the underlying regulatory mechanisms (9). Several works have profiled the open chromatin state in lymphocytic leukemia (10, 11). To date, only a few studies have characterized the open chromatin state in primary NSCLC samples. A recent work published by Corces and colleagues cataloged the open chromatin states of 23 cancer types (12), including NSCLC. However, none has associated the open chromatin variations with genomic alterations among patients with NSCLC. Profiling open chromatin in primary NSCLC samples is known to be particularly difficult, partly because cancer tissues are confounded by cell-type heterogeneity (13). Recent work has shown that using negative cell isolation to deplete immune cells and fibroblasts could significantly increase the purity of cancer cells from primary tumor samples (14), enabling meaningful epigenomic analysis.

The integrative analysis combining whole-genome sequencing (WGS) and RNA-seq data with open chromatin state from the same patients could potentially delineate the effects of genomic alterations on the gene regulatory network in NSCLC. Notably, we can explore the effects of genomic mutations and structural variations on open regulatory elements and understand how they are associated with the NSCLC transcriptome. In this study, we generated matched ATAC-seq, WGS, and RNA-seq data from 50 primary NSCLC cases. The comprehensive open chromatin landscape of NSCLC provided important resources that allowed us to identify

www.aacrjournals.org OF1

Downloaded from cancerres.aacrjournals.org on September 30, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Wang et al.

key regulatory elements in this disease. The integration of multiple-omics datasets revealed novel insights into the gene regulatory mechanism of NSCLC.

Materials and Methods

Patients and clinical information
Patients with NSCLC were staged according to the American Joint Committee on Cancer version 6 and initially diagnosed with lung cancer at West China Hospital of Sichuan University (Chengdu, China) from September 2016 to December 2018. Information, including patients' age, gender, ethnicity, pathology, and tumor stage, was collected for these 51 patients (Supplementary Table S1). All of them received surgical treatment, and none of them underwent neoadjuvant therapy before surgery. Tumors and matched distal normal lung tissues were obtained during surgery. All samples were evaluated by two pathologists to determine the pathologic diagnosis and tumor cellularity. Only tumor tissues containing at least 80% of tumor cells were included. This study was approved by the Institutional Review Board of West China Hospital of Sichuan University (Chengdu, China; project identification code: 2017.114) and all patients provided written informed consent.

ATAC-seq library preparation and sequencing
To profile open chromatin, ATAC-seq was performed as described previously (8). A total of 1 × 10^5 cells were washed once with PBS and pelleted by centrifugation using the previous settings. Cell pellets were resuspended in 50 µL of lysis buffer, and nuclei were pelleted by centrifugation for 10 minutes at 500 × g, 4°C. The supernatant was discarded, and nuclei were resuspended in 50 µL reaction buffer containing 2.5 µL of Tn5 transposase and 22.5 µL of TD buffer (Nextera Illumina). The reaction was incubated at 37°C for 30 minutes. Tagmented DNA was isolated by MinElute PCR Purification Kit (Qiagen). Libraries were amplified for 10 cycles and purified using 2× SPRI Cleanup (Agencourt). Library sizes were determined using LabChip GXII Touch HT (PerkinElmer), and 2 × 75 bp paired-end sequencing was performed on NextSeq500 (Illumina) to yield on average 50 M reads/sample.

DNA-seq library preparation and sequencing
WGS was performed with DNA extracted from fresh-frozen tumor and normal material. DNA was extracted from fresh-frozen tissues using the Gentra Puregene DNA Extraction Kit (Qiagen) following the protocol of the manufacturer. DNA isolates were hydrated in TE buffer and confirmed to be of high molecular weight (10 kb) by agarose gel electrophoresis. The concentration was measured using Qubit 2.0 Fluorometer (Life Technologies). Short insert DNA libraries were prepared with the TruSeq Nano DNA HT Sample Prep Kit (Illumina). The DNA libraries were sequenced on the Illumina HiSeq X platform, and 150 bp paired-end reads were generated. Human DNA libraries were sequenced to obtain a minimum coverage of 30× or 50× for both tumor and matched normal.

RNA-seq library preparation and sequencing
For RNA extractions, tissue sections were first lysed and homogenized with the TissueLyser (Qiagen). Subsequent RNA extractions were performed with the Qiagen RNeasy Mini Kit according to the instructions of the manufacturer. The RNA quality was assessed at the Bioanalyzer 2100 DNA Chip 7500 (Agilent Technologies), and samples with an RNA integrity number of over 7 were further analyzed by RNA-seq. RNA sequencing libraries were generated from rRNA-depleted RNA using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina following the manufacturer's recommendations. The products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The libraries were sequenced on an Illumina HiSeq 4000 platform, and 150 bp paired-end reads were generated.

WGS data processing
Raw paired-end WGS reads were subjected to adapter and low-quality sequence trimming using Trimmomatic (version 0.36; ref. 15) with default parameters. We mapped trimmed paired-end reads to human reference build hg19 using BWA mem (version 0.7.13-r1126; ref. 16). BAMs were sorted and indexed using SAMtools (version 1.3; ref. 17), and duplicates were marked using Picard (version 2.2.1; http://broadinstitute.github.io/picard). The Genome Analysis Toolkit (GATK, version 3.6; ref. 18) was used for local realignment and base quality recalibration, processing tumor/normal pairs independently.

Germline mutation detection
Using default parameters, GATK HaplotypeCaller (18) was used to detect germline single-nucleotide variants (SNV) and indels. The known sites' files used for germline SNV and indel calling were downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/.

Somatic mutation detection
Somatic SNVs and indels were predicted using MuTect2 (19) and VarScan2 (20) with default parameters. Somatic SNVs and indels identified by VarScan2 (20) were retained if they passed all of the following criteria, with the threshold value given in parentheses: (i) P value of the reported somatic SNV (0.05); (ii) the natural frequency of the reported somatic SNV (5%); (iii) tumor frequency of the reported somatic SNV (10%); and (iv) the count of reads that supported the reported somatic SNV (2). Only somatic SNVs and indels identified by both MuTect2 (19) and VarScan2 (20) were retained for further analysis. All somatic SNVs and indels were functionally annotated by ANNOVAR (version 20160201; ref. 21).

Somatic copy-number variation detection
Somatic copy-number variations (CNV) were called for 42 tumor samples with paired adjacent tissue samples using Control-FreeC (version 9.5; parameters: ploidy = 2, breakPointThreshold = 0.8, breakPointType = 0.8, coefficientOfVariation = 0.05, contamination = 0, contaminationAdjustment = FALSE, minMappabilityPerWindow = 0.85, window = 5000, numberOfProcesses = 10, mateOrientation = FR; ref. 22). GISTIC2 (version 2.0.22; ref. 23) was used to identify regions of the genome that are significantly amplified or deleted across 37 samples (5 samples that had abnormal somatic mutation count were removed; parameters: -genegistic 1 -smallmem 1 -broad 1 -brlen 0.98 -conf 0.90 -savegene 1).

ATAC-seq data processing
Paired-end ATAC-seq fragments of human samples were aligned to human reference build hg19 using the BWA mem command with the -M parameter. PCR duplicates were removed by

OF2 Cancer Res; 2019 Cancer Research


Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

Picard (version 1.119). Then, reads mapped to mitochondria were discarded. Only uniquely mapped paired-end reads with fragment length less than 2,000 bp and mapping quality > 30 were kept for further analysis. ATAC-seq peak regions of each sample were called using MACS2 (version 2.1.0; ref. 24) with parameters: --nomodel --nolambda --call-summits -q 0.001. Blacklisted regions (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/) were excluded from called peaks. Open chromatin regions from The Cancer Genome Atlas (TCGA) were converted to the hg19 genome using liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). To generate a consensus set of unique peaks, we scanned the genome using 200 bp sliding windows and a 100 bp step size. Adjacent open windows that appeared in at least one sample were combined as a consensus open region list. The Jaccard index, which captures the proportion of open chromatin regions that are shared by two samples, was used to measure the similarity between samples. Using all ATAC-seq data from NSCLC samples, we constructed a symmetric similarity matrix based on the presence of the open region. This matrix was clustered using the hierarchical clustering algorithm from the "pheatmap" R package. Cluster-specific accessible peaks were identified with the Dunn (1964) Kruskal–Wallis test. All ATAC-seq peaks that were no more than 2.5 kb away from an annotated gene TSS (GENCODE v19) were selected as promoter ATAC-seq peaks. GREAT functional enrichment analysis (version 3.0.0; ref. 25) was used to catalog the function of peak sets with default parameters.

RNA-seq processing
Raw RNA-seq reads were aligned to the hg19 genome assembly using Hisat2, and FPKM values of gene expression were quantified by StringTie. Genes with mean FPKM greater than 1 in either the lung adenocarcinoma group or the LUSC group were retained. In addition, only genes annotated as protein_coding, lincRNA, snRNA, miRNA, and snoRNA were kept and used for xQTL association analysis, leaving 12,776 genes for eQTL analysis.

Data access
Processed data of NSCLC samples in this study, including ATAC-seq peaks, gene expression matrix, and genomic alterations, are available via the data center of precision medicine in West China Hospital (https://pms.cd120.com/download.html; Sichuan, China).

Results

Generation of matched ATAC-seq, genome, and transcriptome data in NSCLC samples
To reveal the landscape of open chromatin in NSCLC and elucidate the association between the open chromatin, genomic variations, and gene expression profiles in NSCLC, we collected primary tumors from 51 patients, including 34 patients with lung adenocarcinoma, 13 patients with LUSC, and 4 patients with benign solitary pulmonary nodules (BSPN; Supplementary Table S1; see Materials and Methods). We also collected paired adjacent tissues from 40 patients with NSCLC and 2 patients with nonmalignant nodules for somatic mutation calling (see Materials and Methods). For each sample, we performed ATAC-seq, WGS, and RNA-seq (Fig. 1A; see Materials and Methods). Each of the ATAC-seq datasets generated 61.72–243.83 M (median = 103.47 M) reads; the WGS depth exceeded 30×; and the RNA-seq data generated 80.6–165.6 M (median = 95.63 M) reads (Supplementary Table S2).

We characterized the genomic variations, including somatic SNV, indel, and CNV, for each tumor sample (see Materials and Methods). The majority of the somatic SNVs and indels were located in intergenic regions and introns, whereas only a small proportion of somatic SNVs were located in exons (Fig. 1B). In accordance with a previous report (3), we identified similar somatic SNVs in driver genes of NSCLC, including TP53 in 26.92% of patients with lung adenocarcinoma and 33.33% of patients with LUSC, EGFR in 19.23% of patients with lung adenocarcinoma, CSMD3 in 33.3% of patients with LUSC, and NFE2L2 in 33.3% of patients with LUSC (Fig. 1C). The transition to transversion ratio of somatic SNVs ranged from 0.22 to 2.1 in lung adenocarcinoma and from 0.04 to 0.69 in LUSC (Supplementary Fig. S1A). The proportion of C>A transversions, which was reported to be associated with smoking and lung cancer, was significantly higher in NSCLC samples than in benign samples (P = 0.04878; Mann–Whitney U test; Supplementary Fig. S1B). We further performed mutation signature analysis on the somatic SNVs (26) and identified four enriched mutational signatures in all NSCLC samples (S1–S4, Supplementary Fig. S1C; Supplementary Materials and Methods). Three of the four enriched signatures (S2–S4) had high cosine similarity to cancer-related signatures cataloged in the COSMIC database (Supplementary Materials and Methods; ref. 27). Signature S2 was analogous to COSMIC signature 9 (cosine similarity: 0.57). COSMIC signature 9 has been found in chronic lymphocytic leukemia, as well as malignant B-cell lymphomas, and is characterized by a pattern of mutation caused by polymerase η. Signature S3 resembled COSMIC signature 5 (cosine similarity: 0.65), which exists in all kinds of cancers and most cancer samples. Signature S4 matched COSMIC signature 3 (cosine similarity: 0.73). This COSMIC signature was observed in breast, ovarian, and pancreatic cancers and is associated with failure of DNA double-strand break repair by homologous recombination. Interestingly, all LUSC samples were strongly associated with S3, whereas most of the lung adenocarcinoma samples were associated with S2 (Fig. 1D). We clustered all the samples based on the contributions of these four signatures. The three pathologic types of samples (lung adenocarcinoma, LUSC, and BSPN) could be well classified, with only one lung adenocarcinoma sample being misclassified to the LUSC cluster (Fig. 1E).

As supported by a previous TCGA study (28), many known NSCLC driver genes were found to have significant amplification, such as EGFR and NKX2-1 (FDR < 10e-3), or deletion, such as CDKN2A (FDR < 10e-5), in lung adenocarcinoma samples. We also observed the amplification of other known lung cancer–related genes (Supplementary Table S3), including DDR2, TPM3, TPR, and LRIG3 (FDR < 0.1). Furthermore, significant deletions of other known cancer-related genes were also identified (see Materials and Methods), such as PIK3R1, IL6ST, and MAP3K1 (FDR < 0.1; Fig. 1F; Supplementary Table S3). In terms of somatic CNV in LUSC, we found significant amplification of SOX2, BCL11A, REL, WHSC1L1, and FGFR1, and significant deletion of CDKN2A. These results were consistent with the TCGA study (29). We also found significant deletions of other known genes that are frequently mutated in lung cancer, such as MYCL, ROS1, and EZR (FDR < 10e-2; Fig. 1F; Supplementary Table S3).
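The transition-to-transversion ratio reported in this section can be illustrated with a minimal Python sketch. It assumes SNVs are available as (reference base, alternate base) pairs; the function names and the toy input are hypothetical and are not taken from the study's pipeline. A transition exchanges two purines (A, G) or two pyrimidines (C, T), so C>A counts as a transversion.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True when ref>alt swaps purine for purine or pyrimidine for pyrimidine."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def titv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs."""
    counts = Counter(
        "ti" if is_transition(ref, alt) else "tv"
        for ref, alt in snvs
        if ref.upper() != alt.upper()  # ignore non-variant records
    )
    if counts["tv"] == 0:
        # Undefined without transversions; report inf or nan rather than raising.
        return float("inf") if counts["ti"] else float("nan")
    return counts["ti"] / counts["tv"]


# Hypothetical toy input: two transitions (C>T, G>A) and three transversions.
toy_snvs = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T"), ("A", "C")]
toy_ratio = titv_ratio(toy_snvs)
```

Low ratios, such as those reported for LUSC, indicate that transversions dominate the somatic SNV spectrum.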


[Figure 1 image. Panels: A, study design (tumor and adjacent tissue; ATAC-seq open chromatin peaks, WGS mutations (SNV, CNV), and RNA-seq gene expression feeding a cancer regulatory network). B, count of somatic SNVs per sample by genomic feature. C, counts of mutated cases per gene for LUSC and LUAD. D, signature contribution (S1–S4) per sample. E, sample dendrogram. F, copy number gains and losses by chromosome for LUAD and LUSC, with Q-value axes.]

Figure 1. Overview of study and genetic landscape of patients with NSCLC. A, Overview of study design. B, Barplot shows the number of somatic SNV located on different genomic features in each sample. C, Barplot shows the counts of samples harboring different somatic SNVs and indels in known genes related to cancer. D, Barplot shows the signature contribution in each sample. E, Dendrogram represents the results of hierarchical clustering of samples according to four de novo mutational signatures. F, Line chart shows significant copy number gains (orange) and losses (blue; FDR < 0.1). LUAD, lung adenocarcinoma.
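The comparison of de novo mutational signatures with COSMIC signatures relies on cosine similarity. The following Python sketch shows that measure under simplifying assumptions: signatures are reduced to hypothetical six-channel vectors with invented values, whereas real signature catalogs use many more channels. None of the names or numbers below come from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def best_match(query, catalog):
    """Return (name, similarity) of the catalog entry most similar to query."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in catalog.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical spectra over six substitution classes (illustrative values only).
toy_catalog = {
    "ref_A": [0.50, 0.10, 0.20, 0.05, 0.10, 0.05],
    "ref_B": [0.05, 0.05, 0.60, 0.05, 0.20, 0.05],
}
toy_query = [0.45, 0.10, 0.25, 0.05, 0.10, 0.05]
match_name, match_score = best_match(toy_query, toy_catalog)
```

A value of 1 means the two spectra are proportional, and a value near 0 means they share almost no mutational weight in the same channels.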


Open chromatin signature of NSCLC
To identify open chromatin regions, we applied ATAC-seq on each tumor sample (see Materials and Methods). The numbers of open chromatin peaks in each sample ranged from 3,989 to 211,042, showing high variance (Supplementary Fig. S2A). We then filtered out six samples whose ATAC-seq peak numbers were less than 20,000, leaving 40 ATAC-seq datasets for the following analysis. Because tumor samples were known to be heterogeneous (13), we evaluated the intersample heterogeneity at the open chromatin level using the Jaccard index score (30). Compared with ATAC-seq peaks in stem cells and sorted blood lineage cells (9), which were considered to be homogeneous, the ATAC-seq peaks from NSCLC samples had significantly smaller Jaccard index scores (Fig. 2A), indicating that NSCLC samples contained more heterogeneous open chromatin regions across samples. Unsurprisingly, the open chromatin peaks that were shared across NSCLC samples were mainly associated with housekeeping functions, such as gene regulation, translation, and DNA damage checkpoint (Supplementary Table S4). To explore whether sample-specific peaks could reveal functions specific to the corresponding context, we then sorted the samples based on the percentage of minority peaks (peaks that appeared in less than 20% of samples). Interestingly, the percentage of minority peaks was significantly correlated with pathologic type (P = 0.0003693, Mann–Whitney U test), where LUSC samples tended to contain more minority peaks than lung adenocarcinoma samples. In addition, the percentage of minority peaks was also significantly correlated with higher tumor stage (P = 0.01338, Spearman correlation test), history of smoking (P = 0.006324, Mann–Whitney U test), and female sex (P = 0.0108, Mann–Whitney U test), but not significantly correlated with other factors such as peak number, presence of metastasis, and age (Fig. 2B).

To examine the potential of global open chromatin distribution as a marker to classify NSCLC samples, we clustered the samples based on ATAC-seq signals (see Materials and Methods). The NSCLC samples aggregated into three clusters (Fig. 2C). Cluster I consisted of four LUSC and four lung adenocarcinoma samples. Samples in cluster I contained a significantly lower number of open chromatin peaks [q-value < 2.930761e-04; Dunn (1964) Kruskal–Wallis test] and had significantly higher intersample heterogeneity [Fig. 2D; q-value < 3.517155e-10; Dunn (1964) Kruskal–Wallis test] than the other two clusters, exhibiting patient-specific features rather than cancer type features. Cluster II mainly consisted of LUSC samples (five LUSC and one lung adenocarcinoma samples). We identified 310 genes that had cluster II–specific open chromatin within 2.5 kb upstream or downstream of the gene TSS and that showed upregulated gene expression (Supplementary Fig. S2B; Supplementary Table S5; see Materials and Methods). These genes were enriched for the keratinization process (Fig. 2E; Supplementary Fig. S2C), which was a phenomenon specific to LUSC and not seen in lung adenocarcinoma (31). Cluster III mainly consisted of lung adenocarcinoma samples (25 lung adenocarcinoma and one LUSC samples), which could be further classified into three subclusters (Fig. 2C). The samples from the three subclusters contained distinct open chromatin patterns and different stages of tumor development (Fig. 2C; Supplementary Table S6). In particular, one of the subclusters, which contained fewer peaks than the other two subclusters, was significantly associated with early tumor stage [q-value < 0.02, Dunn (1964) Kruskal–Wallis test; Fig. 2C; Supplementary Table S6]. We identified 279 open chromatin peaks that were specific to the "early stage" related subcluster (see Materials and Methods). The genes associated with these peaks were enriched in the integrin signaling pathway, which plays important roles in the attachment of cells to the extracellular matrix (Supplementary Table S7). For instance, among the "early stage"–specific open chromatin genes were ITGAV and ITGA6, which interact with vitronectin and laminin, respectively. These two genes were reported to be prognostic indicators in multiple cancers, including NSCLC, whose high expression correlated to longer survival time (32). On the other hand, we identified 6,664 chromatin regions that were specifically closed in the "early stage" cluster. Their related genes were enriched in "spongiotrophoblast layer development" and "glomerular visceral epithelial cell differentiation" pathways (Supplementary Table S7). Overexpression of genes in these pathways led to elevated cell mobility, cancer cell invasion, and tumor malignant progression, which were considered later stage cancer phenotypes (33).

To further confirm that open chromatin distribution is a potential marker to classify lung cancer samples, we incorporated 76 QC-passed ATAC-seq datasets of NSCLC samples from 38 patients in TCGA (12) and 5 ATAC-seq datasets of small-cell lung cancer (SCLC) cell lines (34) with our data. Consistent with the previous analysis, most of the NSCLC samples could be classified into three groups (Supplementary Fig. S2D). Interestingly, the major groups of LUSC (cluster II, including 27 samples from 16 patients) and lung adenocarcinoma (cluster III, including 50 samples from 33 patients) were classified more clearly (Supplementary Fig. S2D; Supplementary Table S6). Cluster I consisted of nine LUSC samples from 5 patients and 20 lung adenocarcinoma samples from 15 patients. These samples had higher intersample heterogeneity and more variable peak numbers compared with cluster II and cluster III. Patients in this group had a distinct open chromatin pattern that is different from typical LUSC or lung adenocarcinoma. A small number of NSCLC samples, six LUSC and four lung adenocarcinoma, were clustered together with SCLC samples in cluster IV. These samples were more likely to have lower data quality, considering that they contained the lowest open chromatin peak numbers and high intersample heterogeneity. Therefore, we excluded these samples from further analysis. Interestingly, compared with SCLC, we identified 6,965 NSCLC-specific open peaks enriched around genes that are associated with cell adhesion functions, such as focal adhesion and cell-substrate adherens junction (Supplementary Fig. S2E).

To better understand the intratumor heterogeneity, we performed single-cell ATAC-seq on 1 patient with LUSC (Supplementary Table S1). In total, we detected 50,486 peaks in this patient (Supplementary Materials and Methods). After removing low-quality nuclei, we subjected 1,651 cells to t-distributed stochastic neighbor embedding (Supplementary Fig. S3A; Supplementary Table S2) and identified seven major clusters of cells (Supplementary Materials and Methods). We categorized each cluster (Supplementary Fig. S3B and S3C), and found that cells in cluster III were likely to come from tumor cells (Supplementary Materials and Methods). As expected, the open chromatin profile of cluster III showed high concordance with bulk NSCLC samples: 97% of peaks in cluster III appeared in bulk samples (Fig. 3; Supplementary Fig.


[Figure 2 image. Panels: A, density of Jaccard index scores for blood lineage cells (CD4T, CD8T, CMP, GMP, HSC, LMPP, MPP) and NSCLC (LUAD, LUSC). B, percentage of peaks by sample frequency bin (0%–20% to 80%–100%), with annotation bars for peak number, tumor type, metastasis, staging (IA to IIIB), smoking, age, and sex. C, clustering heatmap (clusters I–III; subclusters S1–S3 of cluster III; Jaccard score scale). D, Jaccard score by cluster. E, ATAC-seq tracks at LCE1E, LCE3D, and SPRR3 for cluster II and cluster III.]

Figure 2. Integration of chromatin landscapes in NSCLC. A, Density plots show Jaccard index distribution between blood lineage cells and NSCLC samples. CD4T (n = 5); CD8T (n = 5); CMP, common myeloid progenitor (n = 8); GMP, granulocyte–macrophage progenitor cells (n = 7); HSC, hematopoietic stem cells (n = 7); LMPP, lymphoid-primed multipotent progenitor cells (n = 3); MPP, multipotent progenitor cells (n = 6); LUAD, lung adenocarcinoma (n = 30); and LUSC (n = 10). B, Barplot shows the percentage of peaks with different sample frequency in NSCLC samples. Bottom bars indicate peak number; tumor type (purple, lung adenocarcinoma; blue, LUSC); having remote metastasis (orange) or not (blue); tumor staging; smokers (black) or nonsmokers (green); patient age; and female (orange) or male (blue). **, P < 0.01. C, Heatmap shows hierarchical clustering of open chromatin region of NSCLC samples. D, Boxplot shows sample similarity (Jaccard index) of open chromatin region within three NSCLC clusters. ns, not significant; ***, P < 0.005 [Dunn (1964) Kruskal–Wallis test]. E, Browser plot shows normalized ATAC-seq profiles at keratinization process–related genes (orange, cluster II NSCLC samples; blue, cluster III NSCLC samples).
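The Jaccard index shown in panels A and D is the size of the intersection of two samples' open regions divided by the size of their union. Below is a minimal Python sketch, assuming each sample is represented as a set of consensus region identifiers; the sample names and region IDs are hypothetical, and this is not the study's own code.

```python
def jaccard_index(regions_a, regions_b):
    """Shared open regions divided by all open regions seen in either sample."""
    a, b = set(regions_a), set(regions_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty samples
    return len(a & b) / len(union)


def jaccard_matrix(sample_regions):
    """Symmetric pairwise similarity matrix; rows and columns follow sorted names."""
    names = sorted(sample_regions)
    matrix = [
        [jaccard_index(sample_regions[row], sample_regions[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Hypothetical samples described by consensus region IDs.
toy_samples = {
    "LUAD_a": {1, 2, 3, 4},
    "LUAD_b": {2, 3, 4, 5},
    "LUSC_a": {7, 8},
}
toy_names, toy_matrix = jaccard_matrix(toy_samples)
```

A matrix of this kind can be passed to any hierarchical clustering routine; lower off-diagonal values within a group correspond to higher intersample heterogeneity.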

S3D). At the single cell level, we observed a high degree of heterogeneity at some of the open chromatin regions (Fig. 3). Interestingly, there appeared to be a high correlation between intersample and intratumor heterogeneity of open chromatin profile (r = 0.74, P < 2.2e-16, Spearman correlation test; Supplementary Fig. S3E), suggesting highly heterogeneous open chromatin regions were potentially important to the evolution of tumor cells.
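The minority-peak percentage used earlier in this section to rank samples can be sketched as follows. The sketch assumes one plausible reading of the definition: a peak is a minority peak when it is present in fewer than 20% of samples, and each sample's percentage is the share of its own peaks that are minority peaks. The function name, threshold argument, and toy data are hypothetical.

```python
from collections import Counter


def minority_peak_percentages(sample_peaks, max_fraction=0.2):
    """Percentage of each sample's peaks that occur in < max_fraction of samples."""
    n_samples = len(sample_peaks)
    if n_samples == 0:
        return {}
    # Count in how many samples each peak is present.
    presence = Counter(
        peak for peaks in sample_peaks.values() for peak in set(peaks)
    )
    minority = {
        peak for peak, count in presence.items()
        if count / n_samples < max_fraction
    }
    percentages = {}
    for name, peaks in sample_peaks.items():
        peaks = set(peaks)
        share = len(peaks & minority) / len(peaks) if peaks else 0.0
        percentages[name] = 100.0 * share
    return percentages


# Hypothetical cohort: ten samples share one peak; sample s0 carries one rare peak.
toy_cohort = {f"s{i}": {"shared"} for i in range(10)}
toy_cohort["s0"] = {"shared", "rare"}
toy_percentages = minority_peak_percentages(toy_cohort)
```

The per-sample values could then be compared between groups with a rank-based test, as the text does for pathologic type, stage, smoking history, and sex.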

OF6 Cancer Res; 2019 Cancer Research

Downloaded from cancerres.aacrjournals.org on September 30, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663


[Figure 3 tracks: LUSC bulk ATAC-seq (n = 10); LWQC aggregate scATAC-seq (cluster III, 283 single cells); cluster III, 283 single cells. Genes shown: CDH3 and CDH1. Scale bar, 25 kb.]

Figure 3. Overview of open chromatin signal of tumor single cells. Browser plot displays ATAC-seq signal around the CDH3 and CDH1 gene loci from the aggregate of all 10 LUSC samples and from 283 tumor single cells that were classified into cluster III.

Broad open chromatin peaks associated with NSCLC key genes

A previous study showed that the width of epigenetic marker peaks is associated with cell type–specific key genes (30). We next focused on broad open chromatin peaks and associated genes by incorporating published leukemia ATAC-seq datasets (Supplementary Fig. S4A and S4B; Supplementary Materials and Methods; ref. 9). Among the broad open chromatin peak–associated genes, a total of 310 genes were NSCLC specific; 368 genes were leukemia specific; and 337 genes were shared between the two cancer types (Fig. 4A). About 10% of the genes in each of the three categories were COSMIC-annotated genes related to at least one type of cancer, which was not surprising because both broad open chromatin peak–associated genes and COSMIC genes tend to be involved in critical pathways. Interestingly, however, the proportion of NSCLC-related COSMIC genes was significantly higher (P < 3.44e-30, χ² test; Supplementary Fig. S4C) in NSCLC-specific broad open chromatin peak–associated genes than in the other two categories, which suggested that the broad open chromatin peaks might regulate genes essential to NSCLC functions. We further noticed that the NSCLC-specific broad open chromatin peak–associated genes were enriched in pathways that were previously reported in NSCLC, such as ECM receptor interaction and proteoglycans in cancer, whereas the leukemia-specific broad open chromatin peak–associated genes were enriched in pathways such as platelet activation (Supplementary Fig. S4D; see Materials and Methods).

Meanwhile, many known NSCLC driver genes were covered by broad open chromatin peaks. Strikingly, in three of the samples, the open chromatin regions were so broad that the whole EGFR gene body was covered (Fig. 4B). Besides, there were other NSCLC driver genes covered by broad open chromatin peaks, including JUN, ERBB3, WNT9A, and FAT1. We ranked genes according to the width of broad open peaks in descending order and the carrier frequency of associated broad open regions in ascending order. Genes that were frequently affected by broad open chromatin peaks included several key driver genes of NSCLC, like EGFR, jun proto-oncogene (JUN), Erb-B2 receptor tyrosine kinase 3 (ERBB3), Wnt family member 9A (WNT9A), and FAT atypical cadherin 1 (FAT1). As an extension of these studies, we noted that the other NSCLC-specific genes were significantly enriched for biological functions in the cancer-associated pathway, which has been reported in the majority of solid malignant tumors (Supplementary Table S8; refs. 35, 36).

To study the effect of broad open chromatin peaks on gene expression of NSCLC, we compared the expression levels of the 647 genes between samples in which the gene overlapped with broad open chromatin peaks and samples in which it did not. For genes that overlapped with broad open chromatin peaks, their average expression levels were higher in samples with broad peaks than in those without peaks (Fig. 4C). However, only 38 of the 647 genes showed significant differential expression (FDR < 0.05, Mann–Whitney U test; Fig. 4C). We then compared the variance of expression levels between broad open chromatin peak overlapped and nonoverlapped samples. On average, the variance of broad open chromatin peak overlapped genes was higher than that of nonoverlapped genes (Fig. 4D). Compared with random genes, the fold change of variance between broad open chromatin peak overlapped and nonoverlapped genes was significantly higher regardless of gene expression levels (Mann–Whitney U test; P < 0.05; Fig. 4D), which suggested that broad open chromatin peaks were related to misregulation of genes in NSCLC. Besides, we also found that wide open regions were more likely to overlap with regions of copy-number gain/amplification than random genomic regions (P < 0.05 in 31/33 samples, permutation test; Supplementary Fig. S4E). For example, in the three cases we mentioned above (Fig. 4B), both wide open chromatin regions and the gain of CNV covered the whole gene body of EGFR. This result suggested that there was an association between somatic CNVs, wide open chromatin regions, and gene expression.
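The per-gene comparison described above (expression in samples with a broad peak versus samples without, followed by an FDR cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions: the Mann–Whitney U test uses a normal approximation with tie and continuity corrections, the FDR is assumed to be Benjamini–Hochberg (the text does not name the procedure), and the gene names and expression values are hypothetical.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U test p-value (normal approximation)."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean rank of this tie group
        t = j - i + 1
        tie_term += t ** 3 - t              # tie correction term
        rank_sum_a += avg_rank * sum(1 for k in range(i, j + 1)
                                     if combined[k][1] == 0)
        i = j + 1
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    z = max((abs(u - mu) - 0.5) / sigma, 0.0)  # continuity correction
    return math.erfc(z / math.sqrt(2))         # equals 2 * (1 - Phi(z))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical expression values: (samples with broad peak, samples without).
expression = {
    "GENE_A": ([8.1, 9.4, 7.9, 10.2, 9.0, 8.8], [3.2, 4.1, 2.9, 3.8, 4.4, 3.5]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1, 5.3, 4.8], [5.1, 4.9, 5.2, 5.0, 4.7, 5.4]),
}
pvals = {g: mann_whitney_p(w, wo) for g, (w, wo) in expression.items()}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
significant = [g for g, q in fdr.items() if q < 0.05]
```

A rank-based test is a natural choice here because expression values across tumors are rarely normally distributed; with the small groups typical of per-gene comparisons, an exact test may be preferable to the normal approximation used in this sketch.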


[Figure 4, panels A–D. Recoverable labels: A, gene groups NSCLC specific, leukemia specific, and NSCLC & leukemia (common), with pie-chart categories NSCLC, leukemia, and others; B, ATAC-seq tracks around EGFR (scale bar, 50 kb), samples marked as affected or not affected; C, x-axis log fold change, y-axis −log10(FDR), legend nonsignificant and significant, EGFR labeled; D, y-axis log fold change of variance, x-axis expression bins (total, FPKM < 1, 1 ≤ FPKM < 3, 3 ≤ FPKM < 5, 5 ≤ FPKM < 10, 10 ≤ FPKM < 20, FPKM ≥ 20), legend background and with broad peak.]

Figure 4. The broad open chromatin region in NSCLC. A, Heatmap (left) shows broad peak–associated genes in NSCLC and leukemia samples. Broad open region association is color coded as shown at the bottom. Pie charts (right) show the percentage of COSMIC-annotated genes associated with the broad open region. Different types of tumors are color coded on the right. B, Browser plot shows ATAC-seq signal around EGFR in 33 NSCLC samples. EGFR promoter with and without broad open region association is zoomed out at the bottom. Sample types are color coded at the bottom. C, Volcano plot shows mean gene expression of broad open samples and nonbroad opened samples. Differentially expressed genes are coded in orange (FDR < 0.05, Mann–Whitney U test). D, Boxplots show log2-fold change of variable coefficient in broad open genes (broad open gene against nonbroad open gene; orange), and nonbroad open genes (random gene expression comparison; blue).

Open chromatin on somatic CNV regions associated with gene expression

Somatic CNV of gene regulatory regions could potentially affect target gene expression. To study the relationship between somatic CNV and open chromatin peaks in NSCLC, we intersected somatic CNV regions with open chromatin peaks in each sample. Interestingly, the intensity of ATAC-seq signal positively correlated with the copy number of somatic CNV regions (Fig. 5A), which suggested that the regulatory elements were gained or lost together with the genomic regions they were located on. The somatic CNV fragments that carried open chromatin peaks (sCOP) were more likely to be located on gene promoters and gene bodies, but less likely to be located on intergenic regions (Supplementary Fig. S5A). For each sample, within 2.5 kb up- and downstream of gene body, an average of 79.26% of expressed genes in our dataset (13,257 genes) contained open chromatin peaks, an average of 11.80% of the genes contained sCOP, and an average of 7.96% of genes contained neither open chromatin peak nor somatic CNV region (Supplementary Fig. S5B).

A previous study has shown the association between somatic CNV and gene expression (37). To examine the relationship between somatic CNV and gene expression in our data, we identified 6,981 genes that overlapped with somatic CNV gain fragment and 759 genes that overlapped with somatic CNV loss


[Figure 5, panels A–E. Recoverable labels: A, x-axis copy number (1 to >10), y-axis intensity; B, volcano plots for gain and loss, x-axis log2(fold change), y-axis −log10(P); C, tracks on chr7 around EGFR and EGFR-AS1 (somatic CNV gain frequency, somatic CNV gain, open frequency, DEG-related gain sCOP, and ChromHMM tracks for GM12878, H1-hESC, K562, HepG2, HUVEC, HMEC, HSMM, NHEK, and NHLF); D, KEGG pathway enrichment, x-axis −log10(P), point color fold enrichment, point size gene number; E, KEGG NSCLC pathway with gain and loss sCOP–associated DEGs highlighted.]

Figure 5. The association of somatic CNV, open chromatin peaks, and gene expression. A, Boxplot represents the correlation of the copy number of somatic CNV regions and the intensity of ATAC-seq signal of open chromatin peaks that overlapped corresponding somatic CNV regions. B, Volcano plots illustrate DEGs between samples with and without sCOPs (orange, DEGs; blue, non-DEGs). C, Aligned tracks show the somatic CNV gain frequency, somatic CNV gain, open frequency, and DEG-related gain sCOP within 1,000 bp from EGFR (bright red, active promoter; light red, weak promoter; purple, inactive/poised promoter; orange, strong enhancer; yellow, weak/poised enhancer; blue, insulator; dark green, transcriptional transition; light green, weak transcribed; gray, polycomb repressed; light gray, heterochromatin, repetitive/copy number variation). D, Point plot shows the top 35 most significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of gain of sCOP-associated DEG, sorted by P value in reverse order. The color represents the fold of enrichment, and the size of the point represents the number of DEGs. E, The KEGG pathway view represents the NSCLC pathway. DEGs are manually highlighted (orange boxes, gain sCOP–associated DEGs; blue boxes, loss sCOP–associated DEGs).
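The sCOP definition used in this section (a somatic CNV fragment that carries at least one open chromatin peak) amounts to an interval intersection per sample. The sketch below is only an illustration: it assumes half-open coordinates, a one-base overlap rule, and a diploid baseline of two copies for calling gain or loss, none of which is specified in the text, and all coordinates are made up.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def find_scops(cnv_segments, peaks):
    """Return CNV segments that carry at least one open chromatin peak.

    cnv_segments: list of (chrom, start, end, copy_number)
    peaks:        list of (chrom, start, end)
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    scops = []
    for chrom, start, end, copy_number in cnv_segments:
        carried = [p for p in peaks_by_chrom.get(chrom, [])
                   if overlaps(start, end, p[0], p[1])]
        if not carried:
            continue  # CNV segment without any open chromatin peak
        # Assumed diploid baseline: more than 2 copies is a gain, fewer a loss.
        if copy_number > 2:
            kind = "gain"
        elif copy_number < 2:
            kind = "loss"
        else:
            kind = "neutral"
        scops.append({"chrom": chrom, "start": start, "end": end,
                      "type": kind, "peaks": carried})
    return scops

# Hypothetical single-sample input (coordinates are illustrative only).
cnv = [("chr7", 55_000_000, 55_500_000, 5),
       ("chr9", 21_000_000, 22_000_000, 1),
       ("chr1", 1_000, 2_000, 4)]
atac_peaks = [("chr7", 55_200_000, 55_201_000),
              ("chr7", 55_300_000, 55_302_000),
              ("chr9", 30_000_000, 30_000_500)]
scops = find_scops(cnv, atac_peaks)
```

For genome-scale inputs, a sorted sweep or an interval tree would replace the nested scan used here; the nested version is kept because it makes the overlap rule easy to inspect.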


fragment within 2.5 kb up- and downstream of gene body in at least three samples. For each gene, we then compared the expression levels between samples in which the gene was associated with a somatic CNV fragment and samples in which it was not. Compared with genes that were not associated with a somatic CNV fragment, the majority of the genes associated with somatic CNV gain showed higher expression levels, whereas the majority of the genes associated with somatic CNV loss showed lower expression levels (Supplementary Fig. S5C). In particular, 21.7% of genes associated with somatic CNV gain were differentially expressed genes (DEG), whereas 14.5% of genes associated with somatic CNV loss were DEGs (P < 0.05, two-tailed Mann–Whitney U test; Supplementary Fig. S5C).

We next focused on the effect that sCOP exerted on gene expression. In total, we identified 229,554 sCOPs with somatic CNV gain and 5,443 sCOPs with somatic CNV loss in at least three samples. We then compared the expression levels of sCOP-associated genes between samples with and without sCOPs (Supplementary Materials and Methods). The majority of genes associated with gain of sCOP exhibited significantly higher expression levels, whereas the majority of genes associated with loss of sCOP exhibited lower expression levels, compared with the same genes associated with peaks that did not overlap with somatic CNV (Fig. 5B; Supplementary Table S10). Furthermore, 34.34% of genes associated with gain of sCOP were DEGs, whereas 16.89% of genes associated with loss of sCOP were DEGs (P < 0.05, Wilcoxon Rank Sum test; Fig. 4B; Supplementary Table S9). DEG-related sCOPs were enriched in promoters, enhancers, and insulators, whereas distal DEG-related sCOPs (distance between DEG and sCOP ≥1,000 bp) were more significantly enriched in enhancers and insulators than local DEG-related sCOPs (distance between DEG and sCOP <1,000 bp; Supplementary Fig. S5D). It has been shown that CNV would affect gene expression through changing gene dosage (37) or the copy of regulatory elements (38). We analyzed all 33,303 sCOPs (29,433 local sCOPs and 4,395 distant sCOPs) and their related DEG pairs. We found that 99.7% of the sCOP-DEG pairs were carried by the same somatic CNV fragment, which indicated that in most of the cases the CNV affects the copy number of genes together with their regulatory elements.

Many known important NSCLC-related genes were affected by sCOP in our data (Supplementary Table S10). For example, we identified 40 gain sCOPs around the EGFR gene. EGFR was expressed at significantly higher levels in samples with a gain of sCOPs than in samples without sCOP (P < 0.05, Wilcoxon Rank Sum test; Fig. 5C). As another example, we identified NFKBIA, which harbored the largest number of gain sCOPs around the gene (Supplementary Fig. S5E). The sCOP-associated DEGs were enriched in multiple cancer pathways and signaling pathways (Fig. 5D; Supplementary Materials and Methods). Among all the enriched cancer pathways, the NSCLC pathway ranked among the top. We annotated the sCOP-associated DEGs in the NSCLC pathway (Fig. 5E) and discovered that they were likely to be positioned on the EGFR-SOS2-BRAF-MAPK1 cascade. sCOP-associated DEGs were also significantly enriched in the ErbB signaling pathway, which plays essential roles in cancer development and progression (Supplementary Fig. S5F). In the ErbB signaling pathway, multiple sCOP-associated DEGs fell on the PI3K/AKT cascade, which is known to be dysregulated in cancer (39).

Germline SNV on open chromatin associated with gene expression of NSCLC

SNV could potentially affect epigenetic states on regulatory elements and alter the expression of target genes (40). To study how germline SNV regulates gene expression through open chromatin regions in NSCLC, we first called germline SNVs from our samples, and then identified cis eQTLs (within 1,000 kb of gene, only considered SNVs within ATAC-seq peaks) and cis ATAC-QTLs

[Figure 6, panels A–D. Recoverable labels: A, eQTLs (SNP: 76; gene: 32) and ATAC-QTL groups, local and distal, with SNP/open-region counts 21/21, 20/127, and 11,244/10,982; B, x-axis chromosome, y-axis −log10(P), labeled genes MON2, SHARPIN, CHN2, HES4, HORMAD1, RHPN1, ZC3H3, GSTT1, FBXL6, NTS, CYP4F3, GLI4, ALOX15B, APOL2, GSTM1, ROS1, KIAA1875, XRCC6BP1, PPP1R37, SCRIB, ARL17A, SCARNA2, C4orf48, PRKAR1B, DYSF, ZNF707, RHCG, CTU1, RPTN, MAPK15, CYP4F11, and RP11-273G15.2; C, x-axis distance (kb, −1,000 to 1,000), y-axis density (1e-06), legend background and eQTL; D, x-axis feature (TSS, TES, 3′UTR, 5′UTR, promoter, exon, intergenic, intron), y-axis proportion (%), legend genome, all tested germline SNPs, and eQTLs.]

Figure 6. Landscape of ATAC-QTLs and eQTLs in NSCLC. A, Graphical summary of ATAC-QTLs, eQTLs, and association between ATAC-seq open region and gene. B, Manhattan plot of eQTLs. Orange dots, eQTLs; orange triangles, joint-QTLs. C, Density plot shows the distance between eQTLs and TSS of eQTLs associated gene (orange), and distance between all pairs of SNV and TSS of all genes used for the eQTLs test are drawn as background (blue). D, Barplot shows the genomic feature distribution of the human genome, all tested germline SNVs and eQTLs.


(within 1,000 kb of ATAC-seq peaks, only considered SNVs within ATAC-seq peaks) from our data (Fig. 6A; Supplementary Materials and Methods). Under the threshold of FDR < 0.1, we identified 76 eQTLs associated with 32 genes (Fig. 6B). These eQTLs did not overlap with reported eQTLs in normal tissues (41), suggesting they were likely NSCLC-specific eQTLs. The majority (62.5%) of the genes were associated with only one eQTL. Conversely, 12 genes (37.5%) were associated with multiple eQTLs (median = 3.5; Supplementary Fig. S6A). In agreement with previous reports (42), the eQTLs tended to fall close to the related genes (Fig. 6C). About 69.23% of eQTLs were located within 100 kb of their related genes. The eQTLs were highly enriched in the promoter, exon, and intron regions (Fig. 6D). For example, we identified 17 eQTLs associated with the HORMAD1 gene, which plays a key role in cell-cycle regulation and has been reported to be a potential marker in multiple cancers (43, 44). These 17 eQTLs were highly correlated, dividing the samples into two haplotypes, and significantly higher HORMAD1 expression was found in samples with the mutant genotype (Fig. 7A).

Apart from eQTLs, we identified 11,265 ATAC-QTLs associated with 11,000 open chromatin regions (Fig. 6A; Supplementary Materials and Methods). As expected, the ATAC-QTLs tended to be located close to their related open chromatin regions (Supplementary Fig. S6B). ATAC-QTLs were enriched in the 3′UTR, 5′UTR, promoter, and exon regions (Supplementary Fig. S6C). A total of 41 ATAC-QTLs were located within their associated ATAC-seq peaks, which we termed local-ATAC-QTLs. The remaining 11,244 ATAC-QTLs were correlated with distal ATAC-seq peaks, which we termed distal-ATAC-QTLs (Fig. 6A). Interestingly, 20 of the 41 local-ATAC-QTLs also correlated with distal ATAC-seq peaks, which suggested likely long-range regulatory interaction (Fig. 6A). To establish the relationship between eQTLs and ATAC-QTLs in NSCLC, we intersected the two types of QTL and identified 21 joint quantitative trait loci (joint-QTL), accounting for 27% of all eQTLs (Supplementary Fig. S6D). eQTLs were enriched in ATAC-QTLs (P = 2.509e-13; fold enrichment = 10.12, Fisher exact test), suggesting the possible mediating role of open chromatin regions in gene regulation. The joint-QTLs might potentially affect gene expression through their associated open chromatin regions. The 21 joint-QTLs were associated with 362 ATAC-seq peaks and 16 genes (Fig. 7B; Supplementary Table S11). The joint-QTLs were likely to be functionally relevant loci that were ignored by previous studies. For example, GSTM1, one of the genes associated with a joint-QTL, is involved in the detoxification of many carcinogens (45), and multiple studies have reported the correlation between the malfunction of GSTM1 and increased risk of lung cancer (46). The SNP rs10857795, located in the intron of GSTM1, was previously annotated as having no effect (47). However, we found this SNP to be a joint-QTL that correlated with both the expression level of GSTM1 and the intensity of the ATAC-seq peak that spans the promoter of this gene in NSCLC (Fig. 7C). This joint-QTL could potentially disrupt the Meis1 (Homeobox) motif, which strongly suggested that SNP rs10857795 might play a regulatory role for GSTM1 and consequently affect the risk of NSCLC.

We further intersected the identified QTLs with lung cancer–related genome-wide association study (GWAS) significant loci (Supplementary Materials and Methods). A total of seven eQTLs, 83 ATAC-QTLs, and three joint-QTLs were located within the linkage disequilibrium (LD) blocks of lung cancer–related GWAS significant loci (Supplementary Table S12). Compared with SNP-containing open regions, ATAC-QTL–containing open regions overlapped with a higher percentage of LD blocks associated with squamous cell lung carcinoma (P = 0.024, permutation test), urinary 1,3-butadiene metabolite levels in smokers (an important carcinogen in tobacco smoke; P = 0.001, permutation test), response to taxane treatment (docetaxel; P = 0.002, permutation test), and multiple cancers (lung cancer, noncardiac gastric cancer, and esophageal squamous cell carcinoma; P = 0.021, permutation test; Supplementary Fig. S6E; Supplementary Materials and Methods). Similarly, eQTL-containing open regions overlapped with a higher percentage of LD blocks associated with lung cancer (P = 0.009, permutation test), lung carcinoma (P = 0.001, permutation test), cancer (pleiotropy; P = 0.001, permutation test), and urinary 1,3-butadiene metabolite levels in smokers (P = 0.001, permutation test; Supplementary Fig. S6F; Supplementary Materials and Methods). These GWAS-associated QTLs could potentially affect gene regulatory networks in NSCLC through genomic alterations, which could only be unveiled through the combination of GWAS and multiomics sequencing. A case in point is the joint-QTL rs140313, which was located within the LD block of the significant loci associated with 1,3-butadiene metabolism and detoxification phenotype (48). This joint-QTL correlated with the intensity of four ATAC-seq peaks upstream of the GSTT1 gene and the expression levels of the GSTT1 gene (Fig. 7D).
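The joint-QTL definition and the Fisher exact enrichment reported above can be illustrated with a short sketch. It assumes that joint-QTLs are the SNVs present in both QTL sets and that enrichment is tested against the pool of all tested SNVs with a one-sided test. The identifiers and counts below are hypothetical and do not reproduce the study's contingency table.

```python
from math import comb

def joint_qtls(eqtl_snps, atac_qtl_snps):
    """SNVs that are both eQTLs and ATAC-QTLs."""
    return set(eqtl_snps) & set(atac_qtl_snps)

def fisher_enrichment(n_joint, n_eqtl, n_atac_qtl, n_tested):
    """One-sided Fisher exact test (hypergeometric upper tail).

    Returns (p_value, fold_enrichment) for observing at least n_joint
    ATAC-QTLs among n_eqtl eQTLs, given n_atac_qtl ATAC-QTLs among
    n_tested SNVs in total.
    """
    upper = min(n_eqtl, n_atac_qtl)
    total = comb(n_tested, n_eqtl)
    tail = sum(comb(n_atac_qtl, k) * comb(n_tested - n_atac_qtl, n_eqtl - k)
               for k in range(n_joint, upper + 1))
    expected = n_eqtl * n_atac_qtl / n_tested  # expected overlap by chance
    return tail / total, n_joint / expected

# Hypothetical SNV identifiers (placeholders, not real rsIDs).
eqtls = {"snp1", "snp2", "snp3", "snp4"}
atac_qtls = {"snp3", "snp4", "snp5"}
shared = joint_qtls(eqtls, atac_qtls)

# Hypothetical counts: 1,000 tested SNVs, 20 eQTLs, 100 ATAC-QTLs, 8 shared.
p_value, fold = fisher_enrichment(n_joint=8, n_eqtl=20,
                                  n_atac_qtl=100, n_tested=1000)
```

Exact integer arithmetic through `math.comb` avoids the rounding problems that factorial-based formulas run into with large SNV counts, at the cost of speed for very large tables.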

Figure 7. Regulation network in patients with lung cancer. A, Browser plot shows overlaid ATAC-seq signals of mutation-harboring samples (Alt, blue) and mutation-free samples (Ref; orange). Red boxes, eQTLs associated with HORMAD1. Heatmap shows the genotype of two groups. Boxplot shows gene expression of two groups. B, Circos plot shows QTLs and associated genes. The innermost circle (track 1) shows linkage disequilibrium blocks of lung cancer–related traits. The middle two circles (track 2 and track 3) are circular Manhattan plots of eQTLs and ATAC-QTLs. Y-axis denotes –log10(P) of QTLs. All three tracks are color coded as shown on the left. Orange shaded regions highlight joint-QTLs that are associated with genes marked in the center of the circos. The outermost circle shows an ideogram of the human chromosomes. C, Browser plot shows overlaid ATAC-seq signals of the two groups (GG, reference genotype; AA, mutant genotype). Red, open chromatin region containing joint-QTL; green, joint-QTL associated open chromatin regions. Position weight matrix for Meis1 (Homeobox) is shown in the middle; the 8th position corresponded to the location of the motif altering SNV (rs10857795, joint-QTL). The proportion of samples with two genotypes is shown in the pie chart, and gene expression of two genotypes is shown in the boxplot. D, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr22 with a binning size of 700 kb. The regulatory network of this joint-QTL is shown at the top right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). The distance between joint-QTL and its associated gene or open chromatin regions are marked near ledges. Individuals were separated according to two genotypes (reference genotype CC and mutant genotype TT). Boxplot in the middle shows the gene expression of the two groups. Browser plot in bottom shows overlaid ATAC-seq signals of the two groups. E, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr6 with a binning size of 220 kb. The regulatory network of NSCLC-derived joint-QTL is shown at the bottom right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). Distance between joint-QTLs and their associated genes or open chromatin regions are marked near ledges. Individuals were separated according to two genotypes (reference genotype AA and mutant genotype AG). Boxplot in the middle shows the gene expression of the two groups. Browser plot in the top shows overlaid ATAC-seq signals of the two groups.


While the GSTT1 gene was reported to be associated with lung cancer risk (48), and previous studies had reported the correlation between GSTT1 deletion and carcinogenesis detoxification and antineoplastic drug therapy response (45, 47), our analysis suggested a regulatory risk locus linked to this gene. To validate the regulatory interaction of the predicted joint-QTL-to-gene links, we applied the CRISPR/Cas9 cassette method (Supplementary Materials and Methods; ref. 49) to replace the sequence of the QTL with the alternative genotype (Supplementary Fig. S7A). Consistent with the analysis, CRISPR/Cas9 cassette treatment of the joint-QTL rs10857795, in which cells gained a G>A mutation (Supplementary Fig. S7B), led to significant upregulation of GSTM1 gene expression in the H1299 NSCLC cell line (Supplementary Fig. S7C). Similarly, CRISPR/Cas9 cassette treatment of the joint-QTL rs140313, in which cells gained a C>T mutation (Supplementary Fig. S7D), led to significant upregulation of GSTT1 gene expression in the H1299 NSCLC cell line (Supplementary Fig. S7E). Besides, in another example, we identified two joint-QTLs (rs17079286 and rs73766221), which were located within the LD blocks of GWAS significant loci associated with lung cancer (rs9387478; ref. 50) and lung adenocarcinoma (rs9387479; ref. 51), and which correlated with the intensity of an ATAC-seq peak upstream of the ROS1 gene and the expression levels of the ROS1 gene (Fig. 7E). The ROS1 gene is a known NSCLC driver gene, whose mutation, rearrangement, and fusion were reported to be well associated with NSCLC (50). Our analysis suggested that genomic mutations on the regulatory elements that affect the expression of the ROS1 gene could also be an important factor in NSCLC network malfunction.

Discussion

Previous studies on NSCLC have extensively characterized genomic features, such as sequence mutation, gene expression, and epigenetic markers. However, the chromatin accessibility of NSCLC has not been characterized as clearly so far. In this study, we analyzed ATAC-seq profiles, together with WGS and transcriptome sequencing, in the primary tumor tissue of 50 patients. We found high heterogeneity of chromatin accessibility among patients, which reflected the high interpatient heterogeneity of NSCLC genome sequence. Given the heterogeneity of chromatin accessibility, however, the ATAC-seq profiles could, to some extent, serve as molecular markers to classify NSCLC samples. On the basis of global open chromatin peaks, we identified lung adenocarcinoma- and LUSC-specific clusters and further stratified lung adenocarcinoma samples into three subclusters. Although we found a correlation between the clusters and clinical parameters such as tumor stage and smoking, it was challenging to draw conclusions because of the small sample size and potential confounding effects. Beyond that, we performed single-cell ATAC-seq in one patient, which identified different cell types in the tumor microenvironment. We discovered a positive correlation between intra- and intertumor heterogeneity, suggesting that it is possible to explore intra- and interpatient tumor heterogeneity using open chromatin signatures.

Other than the position of open chromatin peaks on the genome, the width of the open chromatin peak could also indicate key genes in NSCLC. A previous study has shown that broad open chromatin peaks of active regulatory histone modification regions are associated with cell type–specific active genes. It is rational to hypothesize that broad open chromatin peaks in NSCLC could be associated with critical genes, as they indicate more active gene regulatory relationships. We then identified genes associated with broad ATAC-seq peaks at high frequency in our samples. These genes were indeed enriched in known NSCLC-related genes, which suggested that broad ATAC-seq peaks could potentially serve as markers to identify key genes in NSCLC. On the other hand, we observed some extremely wide open chromatin regions, such as open chromatin regions that extended across the whole gene body of EGFR. These regions could be the consequence of large genomic rearrangements (i.e., CNV) and cause severe misregulation of target genes in NSCLC. These extremely abnormal open chromatin signals could also indicate malfunctioning components of the gene regulatory network in NSCLC.

Importantly, through the integration of genome sequence, ATAC-seq, and transcriptome data, we could directly study how the alteration of genome sequence on regulatory elements affected the gene regulatory network of NSCLC. To this end, we analyzed two types of genomic variations, that is, somatic CNV and germline SNV. Both ATAC-seq peak intensity and gene expression levels positively correlated with the copy number of the CNV region they were located on, indicating that active regulatory elements might be gained or lost together with CNV. While previous studies proposed that CNV might alter gene expression by affecting gene dosage or gene regulatory elements, we found that in 99.7% of the cases, CNV fragments carried the gene body together with its regulatory elements. In terms of germline SNV, we identified 76 eQTLs and about 30% of them were also ATAC-QTLs. The expression and ATAC joint-QTLs were more likely to be causal SNVs that affected gene expression through regulatory elements. We further intersected the QTLs with known lung cancer–related GWAS significant loci and identified three likely causal SNVs that were previously challenging to pinpoint.

To sum up, we generated multiomics data from the tumor tissue of an NSCLC population and showed the potential of this type of data in elucidating gene regulatory networks of NSCLC. The sample size still limited the power of our analysis. In the future, the study would be improved with a larger population or by incorporating more types of omics data.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: W. Luo, S. Dai, Q. Zhou, D. Xie, W. Li
Development of methodology: S. Dai, D. Xie
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Z. Wang, W. Luo, J. Tang, Y. Zhou, J. Zhang, F. Cao, S. Dai, P. Tian, Y. Wang, L. Liu, G. Che, Q. Zhou
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K. Tu, L. Xia, K. Luo, K. Lu, X. Hu, Y. He, W. Qiao, S. Dai, D. Xie
Writing, review, and/or revision of the manuscript: Z. Wang, K. Tu, L. Xia, K. Luo, S. Dai, Y. Wang, Q. Zhou, D. Xie
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W. Luo, S. Dai, P. Tian, L. Liu, G. Che
Study supervision: S. Dai, Q. Zhou, D. Xie, W. Li

Acknowledgments

We thank Dr. Xin He from the University of Chicago for the valuable discussion. This work was supported by grant nos. 91631111, 31571327, and 31771426 from Chinese National Natural Science Foundation to D. Xie; the National Natural Science Foundation of China (81871890) and Transformation Projects of Sci-Tech Achievements of Sichuan Province (2016CZYD0001) to W. Li; China Postdoctoral Science Foundation (2017M623043) to Z. Wang; and Sci-Tech Support Program of Science and Technology Department of Sichuan Province (2016SZ0073) to P. Tian.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received December 1, 2018; revised April 12, 2019; accepted June 10, 2019; published first June 17, 2019.



Supplementary material for this article is available at http://cancerres.aacrjournals.org/content/suppl/2020/06/18/0008-5472.CAN-18-3663.DC1
