Supplementary Information Title
Supplementary material Gut

Supplementary information

Title: Meta-analysis of genome-wide association studies and functional assays decipher susceptibility genes for gastric cancer in Chinese

Caiwang Yan1,2,3†, Meng Zhu1,2†, Yanbing Ding4†, Ming Yang5†, Mengyun Wang6,7†, Gang Li8†, Chuanli Ren9†, Tongtong Huang1, Wenjun Yang10, Bangshun He11, Meilin Wang2,12, Fei Yu1, Jinchen Wang1, Ruoxin Zhang6,7, Tianpei Wang1, Jing Ni1, Jiaping Chen1,2, Yue Jiang1,2, Juncheng Dai1,2, Erbao Zhang1,2, Hongxia Ma1,2, Yanong Wang13, Dazhi Xu13, Shukui Wang2,11, Yun Chen2,14, Zekuan Xu2,15, Jianwei Zhou2,16, Guozhong Ji2,17, Zhaoming Wang18, Zhengdong Zhang2,12, Zhibin Hu1,2,3, Qingyi Wei6,19,20, Hongbing Shen1,2, Guangfu Jin1,2,3*

*Corresponding to: Guangfu Jin, Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing, 211166, China. Tel: +86-25-8686-8397, Fax: +86-25-8686-8499. E-mail: [email protected].

Supplementary information includes:
Supplementary Text
Supplementary Figures 1-24
Supplementary Tables 1-17

Yan C, et al. Gut 2019; 0:1–11. doi: 10.1136/gutjnl-2019-318760

Supplementary Text

Subjects of GWASs

Onco-GWAS: We recruited histopathologically confirmed gastric cancer cases from hospitals in Jiangsu province, China. The cancer-free control subjects were selected from individuals receiving routine physical examinations at hospitals or participating in community screening for non-communicable diseases in Jiangsu province. A total of 1,140 cases and 345 controls were genotyped using the Illumina OncoArray [1], and 708 controls were genotyped using the Illumina OmniZhongHua chips.

NJ-GWAS and BJ-GWAS: These two studies recruited participants from Nanjing (565 cases and 1,162 controls) and Beijing (468 cases and 1,123 controls), respectively. All participants were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0.
The details of study design and relevant data were reported previously [2].

SX-GWAS: A total of 1,625 gastric cancer cases and 2,100 controls were from the Shanxi Upper Gastrointestinal Cancer Genetics Project and the Linxian Nutrition Intervention Trial. All participants were genotyped using the Illumina 660W Quad chip. The study was reported elsewhere [3], and the publicly available data were downloaded, after approval, from dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/, study accession number: phs000361.v1.p1).

Quality control procedures for GWASs

We applied the same quality control protocol to genotypes and subjects in all four GWAS datasets. First, genotyped variants were excluded if they had a call rate of <95%, a P value for Hardy–Weinberg equilibrium (HWE) in controls of ≤1.0×10⁻⁶, or a minor allele frequency (MAF) of <1% in controls. Variants were flipped to the forward strand by comparing alleles or checking allele frequencies, and strand-ambiguous variants (A/T or C/G with MAF > 0.45) were also excluded. Second, subjects were removed before imputation if they had call rates of <95%, were outliers (>6 s.d. from the mean) in the population stratification or heterozygosity analysis, or were duplicated or related individuals (PI_HAT > 0.25).

Imputation

We used SHAPEIT (v. 2.12) for phasing and IMPUTE2 (v. 2.3.1) for imputation, with a merged reference panel (n=2,504) of the 1000 Genomes Project Phase III (October 2014 release). In particular, we re-imputed the HLA region (Chr6: 28 Mb-34 Mb) using SNP2HLA with a Chinese reference panel of 10,689 samples [4]. Imputed variants were excluded from subsequent association analysis if they were poorly imputed (info < 0.5), had a low frequency (MAF < 0.01) or deviated from HWE (P < 1×10⁻⁴). As a result, 6.23 to 7.34 million qualified variants were retained in each individual GWAS dataset, and a total of 6,192,596 variants shared across the datasets were available for genetic association analysis.
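The variant-level thresholds quoted in the quality control and imputation steps can be written as simple predicates. The following is a minimal sketch only, assuming that per-variant summary statistics have already been computed; the class, field and function names are hypothetical and do not reproduce the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypedVariant:
    """Hypothetical container for per-variant summary statistics."""
    ref: str
    alt: str
    call_rate: float       # fraction of samples with a genotype call
    hwe_p_controls: float  # Hardy-Weinberg P value in controls
    maf_controls: float    # minor allele frequency in controls


def is_strand_ambiguous(ref: str, alt: str) -> bool:
    """A/T and C/G SNPs look identical on both strands."""
    return {ref.upper(), alt.upper()} in ({"A", "T"}, {"C", "G"})


def passes_genotyped_qc(v: GenotypedVariant) -> bool:
    """Apply the thresholds stated for genotyped variants."""
    if v.call_rate < 0.95:          # call rate < 95%
        return False
    if v.hwe_p_controls <= 1.0e-6:  # HWE P <= 1.0e-6 in controls
        return False
    if v.maf_controls < 0.01:       # MAF < 1% in controls
        return False
    if is_strand_ambiguous(v.ref, v.alt) and v.maf_controls > 0.45:
        return False                # ambiguous A/T or C/G with MAF > 0.45
    return True


def passes_imputed_qc(info: float, maf: float, hwe_p: float) -> bool:
    """Apply the thresholds stated for imputed variants."""
    return info >= 0.5 and maf >= 0.01 and hwe_p >= 1.0e-4


if __name__ == "__main__":
    example = GenotypedVariant("A", "T", 0.99, 0.3, 0.47)
    print(passes_genotyped_qc(example))       # ambiguous with MAF > 0.45
    print(passes_imputed_qc(0.8, 0.05, 0.2))
```

Keeping each threshold on its own line makes it easy to check the code against the text; in practice these filters are usually applied with dedicated tools rather than hand-written code.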
Of note, 661 subjects had also been genotyped previously using the ExomeArray [5], and the overall concordance rate was 97.80% for the genotypes of 20,792 variants.

Target sequencing

We performed targeted region resequencing for chromosomes 1q22 (170 kb), 5p13.1 (216 kb) and 10q23.33 (162 kb) in 200 GC cases and 300 controls. The three regions, defined by linkage disequilibrium (LD) blocks, were captured using custom-designed probes based on the SureSelect Target Enrichment Platform (Agilent Technologies, Santa Clara, CA). The captured libraries were sequenced on a Genome Analyzer IIx (Illumina, San Diego, CA). Low-quality reads and adapters were filtered with the FASTQ Quality Filter Tool, and the qualified reads were aligned to the human genome reference (hg19) using Burrows-Wheeler Aligner (BWA) V.0.6. Duplicates were marked with Picard Tools, and base qualities were recalibrated using the Genome Analysis Toolkit (GATK, V.3.7). We called variants using both GATK and Freebayes and considered only the variants identified by both tools. For quality control, we removed 20 cases and 17 controls because of low depth (< 10× across samples) or a low genotype concordance rate (<90%) compared with the chip genotypes in NJ-GWAS. A total of 3,066 variants (2,860 SNVs and 206 INDELs) were identified through the target sequencing, of which 805 were filtered out because of a low call rate (<90%) or deviation from HWE (P < 1.0×10⁻⁴ in controls). As a result, 881 variants in 1q22, 722 variants in 5p13.1 and 658 variants in 10q23.33 were available for further genetic association analysis in 180 cases and 283 controls.

Functional annotations

We performed functional annotation for variants in the flanking regions (1 Mb upstream and downstream) of the lead variants rs6897169 at 5p13.1 and rs10509671 at 10q23.33.
Genetic variants highly correlated with the lead variants (r² > 0.6, based on the CHB population from the 1000 Genomes Project Phase 3 data) were first annotated with ANNOVAR [6]. Sorting Intolerant from Tolerant (SIFT) [7] and Polymorphism Phenotyping v.2 (PolyPhen-2) [8] were applied to predict the effects of coding variants on protein structure and function. ChIP-seq data from stomach mucosa for histone modification markers (H3K4me1 and H3K4me3) were obtained from the Roadmap Epigenomics database [9], while ChIP-seq data for H3K4me1, H3K4me3 and H3K27ac in fetal stomach tissues were obtained from the Gene Expression Omnibus (GEO: GSM1102794, GSM1102800 and GSM1102783). DNaseI hypersensitive site (DHS) peaks from DNase-seq data for the fetal stomach were taken from four individuals (DS17963, H-23887, H-24342 and H-24510) in the Encyclopedia of DNA Elements (ENCODE) database. These data were visualized using the University of California Santa Cruz (UCSC) genome browser [10]. We performed expression quantitative trait loci (eQTL) analysis using data from 237 normal stomach tissue samples in the Genotype-Tissue Expression project (GTEx) [11].

Tissue samples

A total of 114 gastric biopsy tissues were obtained from 30 subjects with superficial gastritis, 32 with atrophic gastritis, 23 with intestinal metaplasia and 29 with dysplasia at the Affiliated Hospital of Yangzhou University, Yangzhou, China. The precancerous lesion samples were obtained during endoscopic examination and frozen immediately at −80 °C. None of the individuals had used antibiotics within 2 months before the collection of biopsy samples, and none had taken proton pump inhibitors for at least two weeks before sample collection. All subjects provided informed consent for the collection of study specimens, and the study was approved by the institutional review board of Nanjing Medical University. These samples were used to determine PRKAA1 mRNA expression levels.

Differential expression analysis

We also obtained mRNA expression and genotype data for gastric cancer samples from The Cancer Genome Atlas (TCGA) as of April 8, 2015. The normalized expectation-maximization read counts were available for 415 samples, including 32 paired samples (tumors with adjacent normal tissues). The independent-sample t-test (415 tumor samples vs 32 normal tissues) and the paired-sample t-test (32 paired samples) were used to examine differences in gene expression between the tumors and adjacent normal tissues.

Luciferase reporter assays

Luciferase constructs were generated to include the potential regulatory elements encompassing the corresponding variants (chr5: 40755192-40756091 for rs3805495, chr5: 40798425-40799224 for rs59133000, chr5: 40835445-40836094 for rs10065570, chr10: 96052401-96053300 for region 1 and chr10: 96074801-96075600 for region 2; hg19). Sequences were cloned into the pGL3-Basic or pGL3-Promoter Vector (Promega, Madison, WI, USA). The constructed plasmids were sequenced to confirm their accuracy. Cells of the immortalized human gastric epithelial cell line GES1 and the gastric cancer cell lines BGC823 and SGC7901 were plated into 24-well plates (7.5 × 10⁴ cells) and co-transfected the next day with the reporter vectors and the pRL-SV40 Renilla Luciferase Control Vector (Promega) using Lipofectamine 2000 reagent (Invitrogen, Carlsbad, CA, USA). After 48 hours of culture, the cells were lysed, and luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Promega). Relative luminescent signals were calculated by normalizing the luciferase signals to the Renilla signals. Each cell line was used in 3 independent transfection experiments, and each experiment was performed in triplicate.
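
The normalization described for the luciferase reporter assays can be illustrated with a short calculation. This is a minimal sketch with hypothetical function names and made-up readings; the text states only that luciferase signals were normalized to Renilla signals, so averaging the triplicate wells within each experiment and then summarizing across the 3 independent experiments is an assumption about one common way to aggregate such data, not the authors' documented procedure.

```python
from statistics import mean, stdev


def relative_signal(luciferase: float, renilla: float) -> float:
    """Normalize a reporter reading to its co-transfected Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return luciferase / renilla


def experiment_mean(wells):
    """Average the relative signal over the replicate wells of one experiment.

    `wells` is a list of (luciferase, renilla) readings, e.g. a triplicate.
    """
    return mean(relative_signal(luc, ren) for luc, ren in wells)


def summarize(experiments):
    """Return (mean, s.d.) of the per-experiment means (assumed aggregation)."""
    per_experiment = [experiment_mean(wells) for wells in experiments]
    return mean(per_experiment), stdev(per_experiment)


if __name__ == "__main__":
    # Made-up readings: 3 independent experiments, each in triplicate.
    experiments = [
        [(200.0, 100.0), (300.0, 100.0), (400.0, 100.0)],
        [(250.0, 100.0), (350.0, 100.0), (450.0, 100.0)],
        [(150.0, 100.0), (250.0, 100.0), (350.0, 100.0)],
    ]
    avg, sd = summarize(experiments)
    print(f"relative signal: {avg:.2f} (s.d. {sd:.2f})")
```

Dividing by the Renilla reading within each well corrects for well-to-well differences in transfection efficiency and cell number before any averaging takes place.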

TNF-α treatment

BGC823 cells were seeded into 60 mm cell culture dishes at a density of 5 × 10⁵ cells per dish and treated with TNF-α (an NF-κB inducer; H8916, Sigma) at 0, 20, 100 or 200 ng/mL for 1 or 3 h.

Motif analysis

Transcription factor binding sites were analyzed with JASPAR 2018 [12] and the enhancer element locator (EEL) algorithm [13]. To report only the most likely sites, stringent thresholds were applied: an 85% relative profile score threshold for JASPAR, restricted to “Homo sapiens”, and an absolute cutoff threshold of 9 for EEL, using matrices downloaded from the JASPAR database.
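
To make the 85% relative profile score threshold concrete, the sketch below scans a sequence with a toy position weight matrix and keeps windows whose score, rescaled between the matrix minimum and maximum, is at least 0.85. This is an illustration under stated assumptions: the matrix values and function names are hypothetical, only the forward strand is scanned, and the code reproduces neither the JASPAR web tool nor the EEL algorithm.

```python
def score_window(pwm, window):
    """Sum the per-position weights of the bases in `window`."""
    return sum(column[base] for column, base in zip(pwm, window))


def score_bounds(pwm):
    """Lowest and highest scores any sequence can reach on this matrix."""
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return lowest, highest


def relative_score(pwm, window):
    """Rescale a window score to [0, 1] between the matrix bounds."""
    lowest, highest = score_bounds(pwm)
    return (score_window(pwm, window) - lowest) / (highest - lowest)


def scan(pwm, sequence, threshold=0.85):
    """Return (position, window, relative score) for forward-strand hits."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        rel = relative_score(pwm, window)
        if rel >= threshold:
            hits.append((start, window, rel))
    return hits


# Toy 4-bp matrix with made-up weights (not a real JASPAR profile).
TOY_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": -2},
    {"A": -2, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -2},
    {"A": -2, "C": -1, "G": -1, "T": 2},
]

if __name__ == "__main__":
    for position, window, rel in scan(TOY_PWM, "TTACGTAA"):
        print(position, window, f"{rel:.2f}")
```

A relative threshold allows matrices of different lengths and information content to be compared on a common scale, whereas an absolute cutoff such as the one quoted for EEL applies directly to the raw score.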