ARTICLE

Received 18 Jan 2016 | Accepted 31 Mar 2016 | Published 5 May 2016 | DOI: 10.1038/ncomms11478 | OPEN

Common genetic variation in ETV6 is associated with colorectal cancer susceptibility

Meilin Wang1,2,3,4,*,**, Dongying Gu1,*, Mulong Du3,*, Zhi Xu1,*, Suzhan Zhang5,*, Lingjun Zhu6,*, Jiachun Lu7, Rui Zhang8, Jinliang Xing9, Xiaoping Miao10, Haiyan Chu3, Zhibin Hu2,11, Lei Yang7, Cuiju Tang1, Lei Pan5, Haina Du6, Jian Zhao8, Jiangbo Du11, Na Tong3, Jielin Sun12, Hongbing Shen2,11, Jianfeng Xu12, Zhengdong Zhang2,3,** & Jinfei Chen1,2,**

Genome-wide association studies (GWASs) have identified multiple susceptibility loci for colorectal cancer, but much of heritability remains unexplained. To identify additional susceptibility loci for colorectal cancer, here we perform a GWAS in 1,023 cases and 1,306 controls and replicate the findings in seven independent samples from China, comprising 5,317 cases and 6,887 controls. We find a variant at 12p13.2 associated with colorectal cancer risk (rs2238126 in ETV6, P = 2.67 × 10^-10). We replicate this association in an additional 1,046 cases and 1,076 controls of European ancestry (P = 0.034). The G allele of rs2238126 confers earlier age at onset of colorectal cancer (P = 1.98 × 10^-6) and reduces the binding affinity of transcriptional enhancer MAX. The mRNA level of ETV6 is significantly lower in colorectal tumours than in paired normal tissues. Our findings highlight the potential importance of genetic variation in ETV6 conferring susceptibility to colorectal cancer.

1 Department of Oncology, Nanjing First Hospital, Nanjing Medical University, Nanjing 210006, China. 2 Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing 211166, China. 3 Department of Genetic Toxicology, Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing 211166, China. 4 State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 210029, China. 5 Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China. 6 Department of Oncology, First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China. 7 Institute for Chemical Carcinogenesis, State Key Lab of Respiratory Disease, Guangzhou Medical University, Guangzhou 510182, China. 8 Department of Colorectal Surgery, Liaoning Cancer Hospital and Institute, Shenyang 110042, China. 9 Department of Cell Biology and Cell Engineering Research Center, State Key Laboratory of Cancer Biology, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, China. 10 Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China. 11 Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, China. 12 Program for Personalized Cancer Care, NorthShore University Health System, Evanston, Illinois 60201, USA. * The authors contributed equally to this work. ** These authors jointly supervised this work. Correspondence and requests for materials should be addressed to J.C. (email: [email protected]) or to Z.Z. (email: [email protected]) or to M.W. (email: [email protected]).

NATURE COMMUNICATIONS | 7:11478 | DOI: 10.1038/ncomms11478 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11478

Colorectal cancer is the third most common cancer and the fourth leading cause of cancer-related mortality, comprising more than 1.2 million new cases and 0.6 million deaths each year1. Colorectal cancer is a common complex disease caused by environmental and genetic factors and their interactions. Twin and family studies have shown that inherited genetic factors play an essential role in the predisposition to colorectal cancer and are responsible for ~35% of the colorectal cancer risk2. However, less than 5% of total colorectal cancer cases are explained by particular high penetrance genes, such as the DNA mismatch repair genes, APC, SMAD4 and MUTYH3. Therefore, the remaining unidentified heritability may be attributable to common variants with low penetrance.

Genome-wide association studies (GWASs) in populations of European ancestry have revealed over 20 susceptibility loci associated with colorectal cancer risk4–13. However, many of these variants show only weak or no effects among Asians, suggesting the presence of genetic heterogeneity between European and Asian ethnicities14,15. Recently, a GWAS of colorectal cancer in East Asians identified 11 novel loci for colorectal cancer risk, indicating a genetic basis for colorectal cancer in East Asians as well16–19. However, these loci identified thus far account for only ~7.7% of the genetic risk of colorectal cancer among East Asians19. Therefore, to search for additional susceptibility regions for colorectal cancer in Asians, we undertook a multistage GWAS across eight independent cohorts that included 14,533 Han Chinese subjects (Fig. 1). Here we report 12p13.2 as a new susceptibility locus for colorectal cancer and provide new insights into the genetic aetiology of colorectal cancer.

Results

New susceptibility locus for colorectal cancer. The characteristics of the included subjects in each study are summarized in Supplementary Table 1. After standard quality control for single-nucleotide polymorphisms (SNPs) and individuals, 691,326 SNPs in 1,023 cases and 1,306 controls were selected for further association analysis. Principal-component analysis (PCA) revealed that all study subjects were Han Chinese, with modest evidence of population stratification in the study populations (Supplementary Fig. 1). A quantile–quantile plot revealed that the inflation factor (λ) was 1.067 (Supplementary Fig. 2). A Manhattan plot for the association between each SNP and colorectal cancer risk is shown in Supplementary Fig. 3. Across the genome, multiple loci showed suggestive evidence for association, although no SNP exceeded the genome-wide significance threshold of P < 5 × 10^-8. The association between known SNPs based on previously reported colorectal cancer GWAS and colorectal cancer risk was evaluated for all samples (Supplementary Table 2). Three variants (rs6691170, rs16892766 and rs3217810) were not polymorphic among Asians. Among the other SNPs, 11 were significantly associated with colorectal cancer in the same direction as described previously (P_additive < 0.05). The reported risk alleles of all those 11 SNPs were associated with increased risk for colorectal cancer, with odds ratios (ORs) ranging between 1.14 and 1.44.

To examine the suggestive associations obtained from the GWAS stage, we selected 53 SNPs for the replication stage based on the following criteria: (i) the SNPs had P_additive < 1 × 10^-3 in the GWAS stage; (ii) only one SNP with the lowest association P value was selected among multiple SNPs in strong linkage disequilibrium (LD) of r^2 > 0.5; (iii) the SNPs displaying strong LD (r^2 > 0.5) with previously reported associated loci were excluded. The associations between the 53 selected SNPs and colorectal cancer risk are shown in Supplementary Table 3. Except for one SNP, rs929271, all of the selected SNPs were successfully genotyped in an additional case–control study comprising 855 cases and 1,258 controls (Nanjing-2, China; Supplementary Table 4). Of the 52 SNPs analysed, three SNPs (rs418410, rs3122160, and rs2238126) were nominally

[Figure 1 flowchart]

Step 1, GWAS. Platform: Illumina OmniZhongHua; 900,015 SNPs; 1,049 cases and 1,315 controls.

Quality control (QC). Individual QC: call rate < 95%, sex discrepancies, identity-by-descent analysis. SNP QC: missing rate > 5%, MAF < 0.05, HWE P < 0.001, non-autosomal chromosomes. After QC: 691,326 SNPs; 1,023 cases and 1,306 controls. Principal-component analysis. SNP selection: P < 10^-3, CLUMP and LD; exclude previously reported loci.

Step 2, Replication 1. Platform: Sequenom MassARRAY; 53 SNPs; 855 cases and 1,258 controls. Carried forward: P < 0.05 with the same direction.

Step 3, Replication 2. Platform: TaqMan assay; one SNP; 4,462 cases and 5,629 controls.

Step 4, Combined meta-analysis. 6,340 cases and 8,193 controls; P = 2.67 × 10^-10 for rs2238126.

Figure 1 | Summary of the study design and the results. In this three-stage GWAS, stage 1 involved 1,049 cases and 1,315 controls, and the most significant SNPs were followed up in two stages of replication including 5,317 cases and 6,887 controls.
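The SNP-level exclusion criteria summarized in Figure 1 (missing rate > 5%, MAF < 0.05, HWE P < 0.001 in controls, non-autosomal location) can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names are hypothetical, the Hardy–Weinberg test is assumed to be a plain 1-d.f. chi-square without continuity correction, and the missing rate is passed in as an assumed input.

```python
import math

def maf(n_hom1, n_het, n_hom2):
    """Minor allele frequency from genotype counts (hom1 / het / hom2)."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_hom1, n_het, n_hom2):
    """P value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_snp_qc(case_counts, control_counts, missing_rate, autosomal=True):
    """Apply the four SNP filters listed in the Figure 1 flowchart."""
    if not autosomal:
        return False
    if missing_rate > 0.05:
        return False
    if maf(*case_counts) < 0.05 or maf(*control_counts) < 0.05:
        return False
    if hwe_chi2_pvalue(*control_counts) < 0.001:  # HWE is tested in controls
        return False
    return True

# rs2238126 genotype counts (GG, GA, AA) in the GWAS stage, taken from Table 1;
# the missing rate of 0.0 is an assumption made for illustration.
print(passes_snp_qc((280, 516, 227), (304, 629, 373), missing_rate=0.0))
```

With the Table 1 counts for the GWAS stage, the control MAF evaluates to about 0.474, in line with the value reported there.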

significantly associated with colorectal cancer risk at P < 0.05. However, only rs2238126 at 12p13.2 showed a significant association consistent in direction with the GWAS stage (P_additive = 4.46 × 10^-3). To confirm the significance of this association, we genotyped rs2238126 in additional Han Chinese populations, including 4,462 cases and 5,629 controls from six independent study centres (Wuhan, Guangzhou, Nanjing-3, Xi’an, Hangzhou and Shenyang). We conducted a combined analysis of the initial GWAS and replication studies and found that the rs2238126 G allele was associated with an increased risk of colorectal cancer (P_additive = 2.67 × 10^-10, OR = 1.17; Table 1). There was no significant heterogeneity among the eight study groups (P_het = 0.626, I^2 = 0; Supplementary Fig. 4).

To further characterize colorectal cancer-associated SNPs at 12p13.2, we performed an imputation using the 1000 Genomes Project as a reference (Fig. 2). We measured the associations between imputed SNPs (imputed r^2 > 0.1, minor allele frequency (MAF) > 0.05 and within 400 kb on either side of rs2238126) and colorectal cancer risk and identified 37 additional SNPs that were significant at P_additive < 0.05 (Supplementary Table 5). However, rs2238126 showed the strongest association, and no residual association with other SNPs was detected when controlling for the effect of rs2238126 in this region.

We further investigated the effect of rs2238126 on colorectal cancer risk by a subgroup analysis (Supplementary Table 6). As shown in Supplementary Fig. 5, we did not observe significant differences between subgroups in age (P_het = 0.729), sex (P_het = 0.318), smoking status (P_het = 0.537) or tumour site (P_het = 0.567). However, analysis of the age at diagnosis among colorectal cancer cases revealed that individuals with the GG genotype had a 2.2-year earlier age at diagnosis than those with the AA genotype (Supplementary Fig. 6). Regression analysis revealed that the rs2238126 G allele was significantly associated with earlier age at onset of colorectal cancer (effect = −1.007 year per allele, combined P = 1.98 × 10^-6; Supplementary Table 7).

Association analysis in European colorectal cancer GWAS. We also evaluated the association between rs2238126 and colorectal cancer risk in a European population of 1,046 cases and 1,076 controls from the Ontario Familial Colorectal Cancer Registry. As shown in Table 2, the rs2238126 G allele showed a significant risk effect in the same direction among Europeans (OR = 1.19, P_additive = 0.034). The combined analysis of European and Asian populations showed a stronger association, with a P value of 2.79 × 10^-11. Nevertheless, the G allele frequency of rs2238126 in the European population differed considerably from that in the Chinese population.

Potential regulatory role of rs2238126 on ETV6. The SNP rs2238126 lies in an intron of ETV6. RNA-Seq of 27 normal tissues demonstrated different expression levels of ETV6 (Supplementary Fig. 7). Notably, moderate levels of ETV6 were expressed in colon tissues relative to other normal tissues. Although a search of the ENCODE ChromHMM model from GM12878 lymphoblastoid cells revealed weak evidence of rs2238126 residing in a regulatory motif (Fig. 2), further examination of chromatin immunoprecipitation (ChIP)-sequencing (ChIP-seq) data suggested possible enhancer activities within the region encompassing rs2238126 in colorectal smooth muscle and HCT116 cells, on the basis of histone methylation marks and MAX binding (Supplementary Fig. 8). To determine the function of the rs2238126-containing enhancer in ETV6 regulation, we constructed enhancer luciferase reporter vectors containing the rs2238126-centred region and the ETV6 promoter. The rs2238126 A allele revealed a significantly increased enhancer activity compared with that of the G allele, and both alleles resulted in significantly stronger activation relative to the ETV6 promoter, suggesting that the rs2238126-centred region acts as an enhancer (Fig. 3a). Enhancer Element Locator (EEL) prediction showed that rs2238126 directly affected a binding site for MAX (Fig. 3b). We also conducted an electrophoretic mobility shift assay (EMSA) to distinguish the differences in binding affinity

Table 1 | Association of rs2238126 at 12p13.2 with colorectal cancer among individuals from eight Chinese study centres.

SNP: rs2238126; allele*: A/G

Study group             Population   Cases   Controls   Genotypes† (cases)   Genotypes† (controls)   MAF‡ (cases)   MAF‡ (controls)   OR (95% CI)§       P-value§        P_het||   I^2
GWAS                    Nanjing-1    1,023   1,306      280/516/227          304/629/373             0.526          0.474             1.25 (1.10–1.43)   7.41 × 10^-4
Replication 1           Nanjing-2    855     1,258      228/425/189          292/615/347             0.523          0.478             1.20 (1.06–1.36)   4.46 × 10^-3
Replication 2
  Replication 2a        Wuhan        805     1,200      206/399/200          283/585/332             0.504          0.480             1.10 (0.97–1.25)   0.137
  Replication 2b        Guangzhou    1,179   1,334      300/620/259          287/682/365             0.517          0.471             1.26 (1.11–1.43)   2.57 × 10^-4
  Replication 2c        Nanjing-3    612     1,188      156/309/147          293/584/311             0.507          0.477             1.13 (0.98–1.29)   0.093
  Replication 2d        Xi’an        643     384        164/325/154          92/183/109              0.508          0.478             1.13 (0.95–1.35)   0.180
  Replication 2e        Hangzhou     511     647        146/246/119          154/314/179             0.526          0.481             1.19 (1.02–1.40)   0.032
  Replication 2f        Shenyang     712     876        180/358/174          200/443/233             0.504          0.481             1.08 (0.93–1.25)   0.336
  Replication 2 combined             4,462   5,629                                                   0.511          0.477             1.15 (1.08–1.21)   2.72 × 10^-6    0.590     0
All combined¶                        6,340   8,193                                                   0.515          0.477             1.17 (1.11–1.23)   2.67 × 10^-10   0.626     0

CI, confidence interval; GWAS, genome-wide association study; MAF, minor allele frequency; OR, odds ratio; SNP, single-nucleotide polymorphism.
*Major/minor allele.
†The distribution of GG, GA and AA genotypes.
‡MAF of G allele.
§OR, 95% CI and the corresponding P-values were derived from logistic regression analysis under an additive model with adjustment for top eigen, age and sex, where appropriate.
||P value for the heterogeneity.
¶GWAS and replication stages were combined by meta-analysis under a fixed-effects model.
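The fixed-effects combination described in the Table 1 footnote can be approximated from the published per-centre ORs and confidence intervals. The sketch below is an illustration under stated assumptions, not the authors' analysis: it uses inverse-variance weighting on the log-OR scale, recovers each standard error from the rounded 95% CI, and the function and variable names are hypothetical. Because the inputs are rounded, the output only approximates the reported combined values.

```python
import math

# (OR, lower 95% CI, upper 95% CI) per study centre, copied from Table 1
STUDIES = {
    "Nanjing-1": (1.25, 1.10, 1.43),
    "Nanjing-2": (1.20, 1.06, 1.36),
    "Wuhan":     (1.10, 0.97, 1.25),
    "Guangzhou": (1.26, 1.11, 1.43),
    "Nanjing-3": (1.13, 0.98, 1.29),
    "Xi'an":     (1.13, 0.95, 1.35),
    "Hangzhou":  (1.19, 1.02, 1.40),
    "Shenyang":  (1.08, 0.93, 1.25),
}

def fixed_effects_meta(studies):
    """Inverse-variance fixed-effects meta-analysis on the log-OR scale."""
    betas, weights = [], []
    for or_, lo, hi in studies.values():
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from the CI
        betas.append(math.log(or_))
        weights.append(1 / se ** 2)
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1 / w_sum)
    z = beta / se
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))  # Cochran's Q
    df = len(betas) - 1
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P value
        "i2": max(0.0, (q - df) / q) if q > 0 else 0.0,
    }

result = fixed_effects_meta(STUDIES)
print(result)
```

The pooled estimate obtained this way lands close to the "All combined" row of Table 1, and the I^2 statistic is zero because Cochran's Q is smaller than its degrees of freedom.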


[Figure 2 image: regional association plot for 12p13.2 centred on rs2238126 (GWAS P = 7.41 × 10^-4; combined P = 2.67 × 10^-10). Left y axis, −log10(P-value); right y axis, recombination rate (cM/Mb); x axis, position on chr12 (Mb), 11.6 to 12.2; points coloured by r^2. Genes shown: ETV6, BCL2L14 and LRP6. Middle panel: LD (D′ and r^2). Bottom panel: ChromHMM track with the states active promoter, weak promoter, inactive/poised promoter, strong enhancer, weak enhancer, insulator, transcriptional transition, transcriptional elongation, weak transcribed, Polycomb-repressed, heterochromatin/low signal and repetitive/CNV.]

Figure 2 | Region association plot of rs2238126 at 12p13.2 for colorectal cancer. In the top panel of the region plot, the association results (−log10 P) of both genotyped (circle) and imputed (diamond) SNPs in the GWAS samples are shown for SNPs in the region 400 kb upstream and downstream of rs2238126. Imputation was performed on this region using the 1000 Genomes Project CHB and JPT data as a reference. The genes within the region of interest are indicated by arrows. The right y axis represents the recombination rate between the SNPs. The LD plots (D′ and r^2) estimated based on the CHB and JPT populations are shown in the middle panel. The bottom panel represents the chromatin state segmentation track (ChromHMM) from GM12878 lymphoblastoid cells.
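For readers unfamiliar with the two LD measures plotted in Figure 2, the following Python sketch computes D′ and r^2 for a pair of biallelic SNPs from haplotype and allele frequencies. It uses the standard textbook definitions; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Hypothetical frequencies: the rarer allele A always occurs on a B haplotype,
# so D' is at its maximum while r^2 stays well below 1.
print(ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5))
```

The example shows why the two measures are reported side by side: D′ can reach its maximum when allele frequencies differ, whereas r^2 also reflects how well one SNP predicts the other.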

Table 2 | Association of colorectal cancer risk with rs2238126 at 12p13.2 in individuals of European and Asian populations combined.

SNP: rs2238126; allele*: A/G

Study            Population   Cases   Controls   MAF† (cases)   MAF† (controls)   OR (95% CI)‡       P-value‡
OFCCR            European     1,046   1,076      0.179          0.155             1.19 (1.01–2.12)   0.034
This study       Asian        6,340   8,193      0.515          0.477             1.17 (1.11–1.23)   2.67 × 10^-10
Meta-analysis§                                                                    1.17 (1.12–1.23)   2.79 × 10^-11

CI, confidence interval; GWAS, genome-wide association study; MAF, minor allele frequency; OFCCR, Ontario Registry for Studies of Familial Colorectal Cancer; OR, odds ratio; SNP, single-nucleotide polymorphism.
*Major/minor allele.
†MAF of G allele.
‡Additive model.
§Results were combined by meta-analysis using a fixed-effects model (P_heterogeneity = 0.853, I^2 = 0).
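As a rough consistency check on Table 2, a crude allelic odds ratio can be computed directly from the G allele frequencies in cases and controls. This is only a sketch: the function name is hypothetical, and the crude allelic OR is unadjusted, whereas the tabulated ORs come from an additive model, so agreement is expected to be approximate rather than exact.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Crude allelic OR from the allele frequency in cases and in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

# G allele frequencies of rs2238126 as listed in Table 2
european = allelic_odds_ratio(0.179, 0.155)
asian = allelic_odds_ratio(0.515, 0.477)
print(round(european, 2), round(asian, 2))
```

Both crude estimates fall within about 0.01 of the ORs reported in the table, even though the allele is far less common in the European sample.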

between the rs2238126 A and G alleles to the transcription factor. The results confirmed that the A allele had a higher binding activity than the G allele (Fig. 3c). We further performed a ChIP assay in HCT116 cells to verify that the rs2238126-containing region indeed bound MAX in vivo (Fig. 3d).

We then performed an expression quantitative trait locus (eQTL) study to determine whether rs2238126 correlates with the mRNA expression levels of nearby genes (500 kb genomic region centred on rs2238126), using the Cancer Genome Atlas (TCGA) data of 434 colon adenocarcinoma tissues and 41 normal colon tissues. We found that rs2238126 was an eQTL for the ETV6 (P_ANOVA = 3.46 × 10^-3, Supplementary Fig. 9) and BCL2L14 (P_ANOVA = 0.017) genes in colon tumour tissues but not in normal colon tissues (ETV6, P_ANOVA = 0.169; BCL2L14, P_ANOVA = 0.578). To further evaluate whether other SNPs at 12p13.2 act as eQTL for ETV6, we analysed the association between SNPs surrounding rs2238126 and the expression levels of ETV6 (Supplementary Fig. 10). Our analysis showed that 13 SNPs were significantly associated with ETV6 expression, of which rs2855708 was the most significant eQTL SNP (P_ANOVA = 5.34 × 10^-4). However, this association was no longer statistically significant after adjusting for rs2238126.


[Figure 3 image. (a) Schematic of the reporter construct (rs2238126 enhancer, ETV6 promoter, luciferase) and bar chart of relative luciferase activity in HCT116 and SW480 cells for Promoter, A + Promoter and G + Promoter. (b) Bar chart of the EEL score of MAX affinity for the A and G alleles. (c) EMSA gel, lanes 1 to 8, with rows for nuclear extract, labelled probes A and G, and 300× unlabelled probes A and G; the shift band is marked. (d) ChIP gel (marker, MAX, IgG and input; 106 bp product) and bar chart of relative enrichment for MAX and IgG.]

Figure 3 | The rs2238126 alleles affect the activity of enhancer MAX at the 12p13.2 locus. (a) A putative enhancer region flanking rs2238126 (chr12:12,009,241-12,010,241) with A or G alleles was cloned upstream of the ETV6 promoter-luciferase reporter vector. HCT116 and SW480 cells were transiently transfected with each of these constructs and assayed for luciferase activity after 24 h. The P-value was calculated with a two-sided t-test. *P < 0.001. (b) EEL analysis predicted the binding affinity of MAX to the rs2238126 alleles. (c) EMSA with biotin-labelled rs2238126 A or G probes and HCT116 nuclear extracts. Lanes 1 and 5 represent negative controls with probes only. The biotin-labelled rs2238126 A allele probe (lane 2) produced a much denser band of a specific DNA–protein complex (arrow) than the G allele probe (lane 6). The specific complex with the labelled rs2238126 A probe can be partly competed by 300-fold unlabelled A probe (lane 3) or G probe (lane 4). The complex with the labelled G allele probe can be completely abolished by 300-fold unlabelled A probe (lane 8), but not G probe (lane 7). (d) ChIP and quantitative RT–PCR assays confirm that rs2238126 binds to MAX in HCT116 cells. Relative enrichment was calculated as a ratio of the signals from MAX or IgG to the signals from the input DNA.

Functional analyses of ETV6 in colorectal cancer. We measured the ETV6 mRNA and protein expression levels in colorectal cancer cell lines and observed that ETV6 expression was not detectable in the SW480 cell line (Fig. 4a). Next, we examined the mRNA expression levels of ETV6 in 112 pairs of colorectal cancer tumours and their adjacent normal tissues and found significantly decreased ETV6 expression in tumour tissues compared with their adjacent normal tissues (P_Wilcoxon < 0.001; Fig. 4b). This result was also supported by independent TCGA data, consisting of RNA-Seq of 41 paired colon tissues (P_t-test = 0.034; Fig. 4c). We randomly selected 67 pairs of colorectal cancer patients for immunohistochemical staining for ETV6 and found that ETV6 was highly expressed in the cytoplasm in tumours, whereas its expression in normal epithelial cells was primarily localized to the nuclei (Fig. 4e). We detected greater expression of ETV6 in adjacent normal colorectal tissues than in corresponding tumour tissues (P_Wilcoxon < 0.001; Fig. 4d).

To characterize the functional mechanism of ETV6 in colorectal cancer, the ETV6 overexpression or short hairpin RNA (shRNA) knockdown vectors were stably transfected into SW480, HCT116 and HT29 cells. As shown in Supplementary Fig. 11, overexpression of ETV6 suppressed cellular growth, whereas knockdown of ETV6 promoted proliferation. However, high or low ETV6 expression did not induce statistically significant cell cycle changes (P_t-test = 0.115 in SW480 cells, P_t-test = 0.103 in HCT116 cells and P_t-test = 0.059 in HT29 cells for G1 phase). Similarly, the apoptosis of SW480, HCT116 and HT29 cells was not significantly altered by ETV6 overexpression or knockdown (Supplementary Fig. 12). Consistent results were found after transiently transfecting SW480 cells with the ETV6 overexpression vector (Supplementary Fig. 13).

Cumulative effects of colorectal cancer susceptibility loci. Next, we assessed the cumulative effects of SNPs significantly associated with colorectal cancer risk. The number of risk alleles was normally distributed in both the colorectal cancer cases and the controls, and the distribution of these alleles was significantly different between the two groups (P < 0.001; Supplementary Fig. 14). Individuals carrying multiple risk alleles exhibited a gradual increase in the risk of colorectal cancer compared with those carrying 0–15 risk alleles (OR = 1.41–5.09, P_trend = 2.34 × 10^-24), suggesting a cumulative effect of associated genetic variants on colorectal cancer risk (Supplementary Table 8).
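The cumulative-effect analysis rests on two simple operations: counting the risk alleles each person carries and comparing the odds of disease in each count category against a reference category. The sketch below illustrates both under loud assumptions: the helper names, the second SNP (`snp_b`) and all counts are hypothetical, and only the risk allele of rs2238126 (G) comes from the text.

```python
def count_risk_alleles(genotypes, risk_alleles):
    """Number of risk alleles carried by one person.

    genotypes: dict mapping SNP id to a two-letter genotype, e.g. "AG"
    risk_alleles: dict mapping SNP id to its risk allele, e.g. "G"
    """
    return sum(genotypes[snp].count(allele) for snp, allele in risk_alleles.items())

def odds_ratio_vs_reference(cases, controls, ref_cases, ref_controls):
    """Crude OR for one risk-allele category relative to the reference category."""
    return (cases * ref_controls) / (controls * ref_cases)

# rs2238126 risk allele is G (from the text); snp_b is a made-up second locus.
risk_alleles = {"rs2238126": "G", "snp_b": "T"}
person = {"rs2238126": "GG", "snp_b": "CT"}
print(count_risk_alleles(person, risk_alleles))

# Made-up counts: 30 cases / 20 controls in a high-count category versus
# 10 cases / 20 controls in the reference category.
print(odds_ratio_vs_reference(30, 20, 10, 20))
```

In a real analysis the per-category ORs would normally come from an adjusted regression model rather than from crude 2 × 2 counts.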


[Figure 4 image. (a) Bar chart of ETV6 relative expression in HCT116, SW480, HT29, SW620 and LoVo cells, with western blot of ETV6 (53 kDa) and β-actin (42 kDa). (b) ETV6 relative expression in tumour versus normal tissues, P < 0.001. (c) ETV6 relative expression in tumour versus normal tissues, P = 0.034. (d) Immunohistochemical score in tumour versus normal tissues, P < 0.001. (e) Immunohistochemical images of tumour (T) and normal (N) tissue at ×100 and ×200.]

Figure 4 | Expression of ETV6 in human colorectal cancer cell lines and clinical specimens. (a) The ETV6 mRNA (top) and protein (bottom) expression levels in five colorectal cancer cell lines. (b) The ETV6 mRNA expression levels were estimated in 112 pairs of colorectal cancer tissues (T) and their adjacent normal tissues (N). The P value was calculated using the Wilcoxon matched-pairs signed-rank test. (c) The ETV6 mRNA expression levels were analysed in paired colon tissues from 41 subjects from TCGA data. The P-values were determined using the paired t-test. (d) Semiquantitative analysis of the immunohistochemical staining intensity of 67 cancer tissues and corresponding adjacent normal tissues. (e) Representative immunohistochemical images of ETV6 protein expression in colorectal cancer tissue (left) and normal epithelial tissue (right). Top, ×100 magnification; bottom, ×200 magnification.

Gene relationships across implicated loci (GRAIL) analysis. We performed a GRAIL analysis based on pathways previously defined in the literature to evaluate the connections between the genes located at all identified loci and the new susceptibility SNP rs2238126 (Supplementary Fig. 15). The identified connections showed that there was a higher-than-expected degree of connectivity, with a significance of P_GRAIL < 0.05 being observed for TGFB1, SMAD7 and BMP4. rs2238126 in ETV6 presented a weaker than expected connection with other genes reported in previous GWAS of colorectal cancer.

Discussion

In this study, we used a three-stage genome-wide approach to identify associations between genetic variants and the risk of colorectal cancer. We found a new colorectal cancer-associated genetic locus rs2238126 at 12p13.2 in the Chinese population. The locus has not been identified in previous colorectal cancer GWAS. Our study findings suggest that genetic variants at 12p13.2 contribute to the development of colorectal cancer.

The SNP rs2238126 at 12p13.2 is located in intron 4 of ETV6 (also known as TEL), an ETS family transcription factor that is essential for haematopoietic processes20,21. This ETS family has been identified as a potential prognostic marker of colorectal cancer invasiveness and metastasis22. Functional annotations revealed that rs2238126 mapped to a transcriptional enhancer-binding site for MAX. Reporter gene assay, EMSA and ChIP experiments on rs2238126 suggested that MAX is a regulatory enhancer transcription factor at the 12p13.2 locus. MAX has been characterized as a dimerization partner of MYC, which can induce cell-cycle progression and apoptosis23,24. MAX has multiple regulatory roles regarding histone deacetylases associated with activators and may participate in the tumorigenesis process in colorectal cancer25,26. Therefore, the contribution of rs2238126 to the development of colorectal cancer may result from the rs2238126 A allele preferentially binding MAX over the G allele.

The ETV6 protein contains two major domains, the ETS and HLH (helix–loop–helix) domains, which can be retained or lost at the site of the ETV6 breakpoint. ETV6 is known to act as a strong transcriptional repressor in biological processes, including the regulation of cell growth and differentiation27–29. In this study, we found a higher protein expression level of ETV6 in normal colorectal tissues than in corresponding tumour tissues, which was consistent with the ETV6 mRNA expression results. The eQTL analysis from TCGA data also revealed that rs2238126 was an eQTL for the ETV6 and BCL2L14 genes in colon tumour. In addition to ETV6, rs2238126 at 12p13.2 lies 214 kb upstream of BCL2L14, which belongs to the BCL2 family, whose members act as anti- or pro-apoptotic regulators in a wide variety of cellular activities30. Therefore, the possibility that rs2238126 affects the BCL2L14 gene and is related to colorectal cancer risk cannot be completely excluded. However, we failed to find an eQTL for the ETV6 and BCL2L14 genes in normal colon tissues. This result may be

explained by sample size limitation or other factors, such as mutations or loss of heterozygosity in tumour tissues compared with normal tissues.

Previous studies have identified that upregulation of ETV6 attenuates proliferation and suppresses Ras-induced transformation31. Consistently, our results revealed that the overexpression of ETV6 dramatically inhibited cell growth. Based on these data, the ETV6 gene can be considered to be a susceptibility gene for colorectal cancer, although the detailed molecular mechanisms underlying a regulatory role of ETV6 in colorectal cancer remain to be further elucidated.

We compared the genotypes of rs2238126 among 14 populations from the 1000 Genomes Project (Supplementary Table 9). The MAF of rs2238126 was found to be heterogeneous across these 14 populations (Supplementary Fig. 16). For example, the frequency of minor allele G was 0.477 in the Han Chinese population, whereas the frequency of the G allele was only 0.212 in the CEU population. The difference in MAFs may have an effect on patterns of LD for index association SNPs and causal SNPs between Asian and European individuals. Further studies among different ethnic groups are warranted to validate our findings.

We further selected all identified loci associated with colorectal cancer reported by previous GWAS for GRAIL analysis (Supplementary Fig. 15). GRAIL analysis revealed 19 regions with a significant score, including the strongest connections with TGF-β signalling pathway genes such as TGFB1, SMAD7 and BMP4, thus suggesting a pivotal role of this pathway in colorectal cancer development. Notably, rs2238126 in ETV6 was not related to previously implicated genes, thus supporting the role of ETV6 as a potentially independent risk factor for colorectal cancer. These results suggest that genetic markers can be useful in risk prediction for colorectal cancer and that they are potential therapeutic targets.

In summary, we identified a previously unknown colorectal cancer susceptibility locus in the ETV6 gene at 12p13.2. The observed consistent association of rs2238126 in European populations provides convincing evidence for the novel colorectal cancer locus. The SNP rs2238126 G allele may attenuate the regulation of ETV6, which in turn is associated with increased risk of colorectal cancer, most likely by altering the binding affinity of transcriptional enhancer MAX (Fig. 5). Further functional studies are warranted to clarify the biological role of this region in the pathogenesis and aetiology of colorectal cancer.

Methods

Study subjects. All study subjects were from the Han Chinese population. We performed a three-stage GWAS for colorectal cancer. In the first GWAS stage, 1,049 colorectal cancer cases were enrolled from the Cancer Center of Nanjing Medical University and 1,315 controls were from the same districts of Nanjing beginning in September 2010 (Nanjing-1). The subjects in replication 1 included 855 colorectal cancer cases and 1,258 controls from the Nanjing First Hospital also from the same districts of Nanjing. The replication 2 sample sets were from six independent research centres in Wuhan (replication 2a, 805 cases and 1,200 controls), Guangzhou (replication 2b, 1,179 cases and 1,334 controls), Nanjing-3 (replication 2c, 612 cases and 1,188 controls), Xi’an (replication 2d, 643 cases and 384 controls), Hangzhou (replication 2e, 511 cases and 647 controls) and Shenyang (replication 2f, 712 cases and 876 controls). The cases were diagnosed and histopathologically confirmed at the hospitals, and the controls were genetically unrelated to the cases.

Familial Colorectal Cancer4, which are part of the Genetics and Epidemiology Colorectal Cancer Consortium. The cases were confirmed incident colorectal cancer cases aged 20–74 years, residents of Ontario identified through a comprehensive registry and diagnosed between July 1997 and June 2000. Population-based controls were randomly selected among Ontario residents and matched by sex and 5-year age groups. Written informed consent was provided by all subjects. The study protocol was performed in accordance with the Institutional Review Board of Nanjing Medical University.

SNP genotyping and data quality controls in the GWAS stage. Genomic DNA was derived from EDTA-venous blood by using the Qiagen Blood Kit (Qiagen). Genotyping for the GWAS stage was conducted using Illumina Human Omni ZhongHua Bead Chips for 900,015 SNPs. We used a uniform quality control protocol to filter the samples and the SNPs. Four subjects who failed to reach a genotype call rate of 95% were excluded. No sample was excluded because of sex discrepancies. An additional 27 samples were removed because of unexpected duplications or genetic relatedness. SNPs were excluded if they (i) did not map on an autosomal chromosome; (ii) showed a MAF < 0.05 in either the cases or the controls; (iii) displayed a low call rate (< 95%) in all subjects or (iv) deviated from Hardy–Weinberg equilibrium (P_χ2-test < 0.001) in the controls. We assessed the population stratification and outliers using a PCA method. In total, 30,456 common genotyped SNPs (MAF > 0.25) with relatively low LD (r^2 < 0.1) were used to estimate the outliers based on PCA (4 samples were identified).

SNP selection and genotyping in the replication stages. To further confirm suggestive association in the GWAS stage, a subset of SNPs was selected for replication by using CLUMP analysis implemented in PLINK. The selected SNPs required P < 1 × 10^-3 in the GWAS stage and LD r^2 < 0.5 between SNPs among our samples. In total, 53 SNPs were retained in the replication 1 stage.

Subjects in the replication 1 stage were genotyped using the Sequenom iPLEX MassARRAY assay. For quality control of genotyping, blinded duplicate samples from two subjects and two negative control (water) samples were included in each plate. In the replication 2 stage of rs2238126 analysis, the samples were genotyped by TaqMan assays using the ABI 7900HT Real-time PCR System (Applied Biosystems). Quality control samples were also used in the TaqMan assays, including one negative control (water) and two duplicates to which investigators were blinded. All of the primers for the Sequenom assay are presented in Supplementary Table 10. The genotyping cluster patterns for rs2238126 were examined to check for high quality (Supplementary Fig. 17). Genotyping procedures were repeated by randomly selecting 5% of the participants, and the concordance rate was 100%.

Imputation and regional association plotting. We imputed the non-genotyped SNPs based on the 1000 Genomes Project (Phase I, version 3, 1092 individuals) using IMPUTE2 (ref. 37). A series of filtering criteria for the imputed SNPs were implemented. Imputed SNPs were removed if they had (i) MAF < 0.05; (ii) call rate < 95% or (iii) Hardy–Weinberg equilibrium P < 0.001. The association between genotype dosage data for imputed SNPs and colorectal cancer risk was analysed by the SNPTEST 2.5 program. Regional associations based on the results of the genotyped and imputed SNPs were plotted using LocusZoom 1.1.

Functional annotation of rs2238126. We queried available ENCODE ChIP-seq data from colorectal smooth muscle and HCT116 cells for histone modification markers (H3K4me1, H3K4me3 and H3K27ac) and transcription regulator markers to determine whether rs2238126 fell within putative transcriptional regulatory elements. Transcription factor ChIP-seq data in HCT116 cells showed significant binding of MAX around rs2238126. We also used the EEL algorithm to investigate whether rs2238126 directly affected the MAX-binding site38. Further close examination of histone modifications was performed using the chromatin-state segmentation track (ChromHMM) from the GM12878 lymphoblastoid cells. The ENCODE data were visualized using the University of California Santa Cruz (UCSC) genome browser.

Luciferase activity. The 1,000-bp containing rs2238126 A or G alleles of the
The controls in enhancer sequence (chr12: 12,009,241-12,010,241) and ETV6 promoter region the GWAS stage were randomly selected from 25,000 subjects who participated in a (chr12: 11,801,788-11,802,787) were synthesized and cloned into the pGL3-basic community-based physical examination for noninfectious diseases in the same vector (Promega) using the NheI and XhoI restriction sites. All constructs were region. Additional controls in the replications were collected from those seeking confirmed by DNA sequencing. medical care in local hospitals. Exclusions included participants who had been For luciferase assays, HCT116 and SW480 cells were plated onto 24-well plates diagnosed with other colorectal disease, such as hereditary colorectal cancer (3 Â 105 cells per well) and transfected with reporter plasmids using Lipofectamine syndromes. The participation rate of the eligible cases and controls exceeded 90%. 2000 (Invitrogen). As an internal standard, all plasmids were co-transfected with Individuals who had smoked daily for more than one year were defined as smokers; 10 ng pRL-SV40, which contained the Renilla luciferase gene. All transfections were otherwise, the subjects were considered as non-smokers. All of the subjects performed in triplicate for each experiment. After transfection for 24 h, cells were recruited for the three-stage study were evaluated with the same criteria32–36. collected and measured for the luciferase activity with a Dual-Luciferase Reporter We included 1,046 colorectal cancer cases and 1,076 controls from dbGaP Assay System (Promega). Relative luciferase activity was normalized to Renilla (phs000779.v1.p1). All the subjects were from the Ontario Registry for Studies of luciferase and statistically analysed with two-sided t-test.
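As a minimal illustrative sketch (not the authors' analysis code), the normalization and comparison described for the luciferase assay could be expressed as follows. Each firefly reading is divided by its co-transfected Renilla reading, and the two alleles are then compared with a two-sided t-test. The pooled-variance (Student's) form of the test, the function names and the example readings are assumptions made here for illustration; the text states only that a two-sided t-test was used.

```python
import math
from statistics import mean, variance


def relative_luciferase(firefly, renilla):
    """Normalize each firefly reading to its paired Renilla reading."""
    return [f / r for f, r in zip(firefly, renilla)]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P value, using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df, two-sided P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical triplicate readings (arbitrary units), for illustration only.
    allele_a = relative_luciferase([1200, 1180, 1250], [100, 98, 104])
    allele_g = relative_luciferase([800, 790, 820], [101, 99, 102])
    t, df, p = student_t_test(allele_a, allele_g)
    print(f"t = {t:.2f}, df = {df}, two-sided P = {p:.3g}")
```

With triplicate wells per construct, the test has only four degrees of freedom, so the numerical integration here is used simply to avoid a dependency on an external statistics library.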


[Figure 5 schematic. Panel labels: rs2238126 in ETV6, MAX binding, ETV6 expression, colorectal cancer risk. rs2238126 A allele: MAX binding, low risk. rs2238126 G allele: high risk. Controls versus cases.]

Figure 5 | A schematic model of our findings. ETV6 expression is regulated by the SNP rs2238126. The rs2238126 G allele is associated with an increased risk of colorectal cancer because of decreased MAX binding, which downregulates ETV6 expression.

Electrophoretic mobility shift assay. Synthetic 3′ biotin-labelled 23-bp oligonucleotides and HCT116 cell nuclear extracts were incubated using the LightShift Chemiluminescent EMSA Kit (Thermo Scientific). The oligonucleotide sequences are shown in Supplementary Table 10. For each gel shift sample (10 μl), a total of 1 μg nuclear extract was combined with 20 fmol labelled probes. For competition assays, unlabelled probes at 300-fold excess were added to the reaction before addition of labelled probes. Binding reactions were separated on a 4.5% polyacrylamide gel and detected by a chemiluminescent reaction with stabilized streptavidin-horseradish peroxidase conjugate.

ChIP assay. HCT116 cells were cross-linked with 1% formaldehyde at room temperature for 10 min. Nuclear extracts were sonicated to generate 200–1,000 bp chromatin fragments. The fragmented chromatin was immunoprecipitated using a ChIP assay kit (Upstate Biotechnology). The antibodies for the ChIP reaction were anti-MAX (ab53570, Abcam) and anti-rabbit IgG (2729, Cell Signaling Technology). Enrichment of the immunoprecipitation was assessed using gel electrophoresis and quantitative RT–PCR assays. The primers for RT–PCR are included in Supplementary Table 10. Quantification of enrichment was expressed as a ratio of MAX or IgG over the input control. Data points and error bars represent the mean and standard deviation, respectively, calculated from triplicates.

Quantitative RT–PCR and eQTL analysis. Five colorectal cancer cell lines, HCT116, SW620, SW480, HT29 and LoVo, were maintained under standard conditions. These cell lines were purchased from Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences (Shanghai, China) within the past 2 years. Cell line authentication was conducted by China Center for Type Culture Collection (Wuhan, China) using short-tandem repeat profiling, and the results were compared with the American Type Culture Collection (ATCC) cell bank. All cell lines were tested and negative for mycoplasma contamination. Tumour tissue and paired adjacent normal tissue samples were collected from 112 subjects with colorectal cancer. The detailed information of patients is summarized in Supplementary Table 11. Total RNA was isolated from cultured cells and tissue samples using TRIzol (Invitrogen) and quantified by ultraviolet spectrometry. The relative mRNA expression level of ETV6 and the internal control genes were detected using an ABI 7900 Real-Time PCR system (Applied Biosystems). The mRNA expression levels of ACTINB, 18sRNA, HPRT1, UBC and GAPDH were examined to identify the most stably expressed housekeeping genes, and ACTINB and GAPDH were selected as endogenous controls using geNorm (ref. 39). The geometric average of ACTINB and GAPDH expression was used as a reference to normalize the ETV6 expression. The primer sets designed are presented in Supplementary Table 10.

The ETV6 mRNA expression levels in normal tissues were measured using RNA-Seq of 27 different tissues from 95 human individuals, which are available at ArrayExpress (accession number: E-MTAB-1733) (ref. 40). The raw reads for each tissue were trimmed for low-quality ends. The average fragments per kilobase of exon model per million fragments mapped (FPKM) value of all individual samples was used to normalize mRNA expression.

Gene expression profiles were downloaded from the TCGA project by RNA-Seq (level 3). In total, 434 colon adenocarcinoma tissues and 41 normal colon samples were included. To control for potential batch effects of mRNA expressions, a series of normalizations and corrections were applied, as described by Pickrell et al. (ref. 41). Briefly, level 3 mRNA expression of each gene was log2 transformed if it was not normally distributed, and genes with zero values were removed. PCA was performed to correct gene expression, accounting for unmeasured confounders. We also accessed TCGA individual level 2 SNP data from tissues and blood, which were genotyped with an Affymetrix Genome-Wide SNP 6.0 array. SNPs from the 500 kb flanking rs2238126 were used to impute genotypes based on the 1000 Genomes Project using IMPUTE2. The analysis of variance (ANOVA) model was applied to assess the correlations between SNP genotypes and mRNA expression levels.

Western blot and immunocytochemistry. Western blot assays were performed according to standard procedures. Cell lysates were extracted using a detergent lysis buffer supplemented with a protease inhibitor. Equal amounts (40 μg) of protein samples were subjected to SDS–polyacrylamide gel electrophoresis and transferred by semi-dry blotting to a polyvinylidene difluoride membrane (Millipore). The primary antibodies used for the protein analyses were monoclonal rabbit anti-ETV6, 1:1,000 (ab151698, Abcam); and rabbit anti-β-actin, 1:1,000 (13E5, Cell Signaling Technology). The secondary antibody used for protein analyses was anti-rabbit HRP, 1:1,000 (BS13278, Bioworld Technology). The immune complexes were detected by enhanced chemiluminescence (Cell Signaling Technology). Uncropped blots are shown in Supplementary Fig. 20.

In total, 67 paired surgical colorectal cancer specimens were fixed in formalin, routinely processed and embedded in paraffin. The primary antibody applied for immunohistochemical detection of ETV6 protein expression was the same as that used for western blot. Two experienced pathologists scored the staining results in a blinded manner. The immunostaining intensity was scored as 0 (negative), 1 (weak), 2 (moderate) or 3 (strong; Supplementary Fig. 18), and the percentage of stained cells was scored semiquantitatively as 1 (0–25%), 2 (26–50%), 3 (51–75%) or 4 (76–100%). Multiplication of the intensity and percentage scores resulted in a score ranging from 0 to 12 for each tissue. The difference of scores was assessed using the Wilcoxon matched-pairs signed-rank test.

Construction and transfection of overexpression and knockdown of ETV6. For the stable overexpression and knockdown of ETV6 in colorectal cancer cells, one ETV6 cDNA and three independent shRNAs were designed and cloned into the GV358 and GV248 lentivirus vectors (GeneChem), respectively. The plasmid sequences were confirmed by DNA sequence analysis. The sequences of the three shRNAs are shown in Supplementary Table 10. The vectors were transfected using the Polybrene and Enhanced Infection Solution according to the manufacturer’s protocol (GeneChem). The cells were infected with lentivirus at a multiplicity of infection of 10. The transfected cells were further selected with 2 μg ml⁻¹ puromycin for 2 weeks. The stable effect of ETV6 overexpression and knockdown was determined by quantitative RT–PCR and western blot (Supplementary Fig. 19). For the transiently transfected SW480 cells, the ETV6 overexpression vector was constructed and cloned into the GV230 vector (GeneChem). The plasmid sequences were confirmed by sequencing. For transient transfection, Lipofectamine 2000 transfection reagent (Invitrogen) was used according to the manufacturer’s protocol.

Cell proliferation and cell death assays. For cell proliferation analysis, the proliferation of colorectal cancer cells was evaluated using a Cell Counting Kit-8 (CCK-8; Dojindo) at various time intervals. Cell growth was represented by the absorbance at an optical density of 450 nm using an Infinite M200 spectrophotometer (Tecan). For the cell cycle assay, cells were fixed with 75% ethyl alcohol and stained with propidium iodide, and the assay was performed using a FACS Calibur flow cytometer (Beckman Coulter). For apoptosis detection, cells were collected and stained using an Annexin V-FITC apoptosis detection kit (Invitrogen), and flow cytometry was used to detect the percentage of apoptotic cells. All experiments were independently performed at least three times and data were expressed as a mean and standard deviation. Statistical comparisons were analysed with a two-sided t-test.

Statistical analysis. The association between each SNP and colorectal cancer risk was evaluated under an additive model with adjustment for eigenvectors, age and sex using PLINK 1.07. The population structure was estimated by a PCA using EIGENSOFT 5.0.1, and the Manhattan plot based on the −log10 P was created using R 2.15.0. The first two eigenvectors for each individual were plotted. For the combined analysis, a meta-analysis of the OR weighted based on the 95% confidence interval was conducted under a fixed-effects model. The measure of heterogeneity was tested using Cochran’s Q statistics and I². We used Haploview 4.2 to visualize the LD structure of chromosome 12p13.2. The biological relationships between the genes within the GWAS-reported loci were quantified using GRAIL (ref. 42). Alternatively, the analyses were performed using SAS 9.2 (SAS Institute) or Stata 10.0 (StataCorp LP).

Data availability. The genotyping data have been deposited in the Dryad Digital Repository (DOI: 10.5061/dryad.7dj7t) (ref. 43).
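The combined analysis described under Statistical analysis (a fixed-effects meta-analysis of ORs with Cochran's Q and I² as heterogeneity measures) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes inverse-variance weighting on the log-OR scale, with each study's standard error recovered from its 95% confidence interval. The function name and the per-stage values in the example are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fixed_effects_meta(studies):
    """Inverse-variance fixed-effects meta-analysis of odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples, one per study.
    Returns the pooled OR with its 95% CI, Cochran's Q and I^2 (in percent).
    """
    betas, weights = [], []
    for odds_ratio, lower, upper in studies:
        # Assumption: the CI is symmetric on the log scale, so its width gives the SE.
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)

    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(studies) - 1
    # I^2: share of total variation attributable to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "or": math.exp(pooled_beta),
        "ci_lower": math.exp(pooled_beta - Z_95 * pooled_se),
        "ci_upper": math.exp(pooled_beta + Z_95 * pooled_se),
        "q": q,
        "df": df,
        "i_squared": i_squared,
    }


if __name__ == "__main__":
    # Hypothetical per-stage estimates (OR, lower, upper), for illustration only.
    stages = [(1.20, 1.05, 1.37), (1.15, 1.00, 1.32), (1.18, 1.08, 1.29)]
    result = fixed_effects_meta(stages)
    print({key: round(value, 3) for key, value in result.items()})
```

Under this weighting, studies with narrower confidence intervals contribute more to the pooled estimate; a P value for Q would additionally require the chi-square distribution with `df` degrees of freedom, which is omitted here.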


References

1. Jemal, A. et al. Global cancer statistics. CA 61, 69–90 (2011).
2. Lichtenstein, P. et al. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 343, 78–85 (2000).
3. de la Chapelle, A. Genetic predisposition to colorectal cancer. Nat. Rev. Cancer 4, 769–780 (2004).
4. Zanke, B. W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nature Genet. 39, 989–994 (2007).
5. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nature Genet. 39, 984–988 (2007).
6. Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nature Genet. 39, 1315–1317 (2007).
7. Tomlinson, I. P. et al. A genome-wide association study identifies colorectal cancer susceptibility loci on 10p14 and 8q23.3. Nature Genet. 40, 623–630 (2008).
8. Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nature Genet. 40, 631–637 (2008).
9. Houlston, R. S. et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nature Genet. 40, 1426–1435 (2008).
10. Houlston, R. S. et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nature Genet. 42, 973–977 (2010).
11. Dunlop, M. G. et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nature Genet. 44, 770–776 (2012).
12. Peters, U. et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 144, 799–807 e724 (2013).
13. Jaeger, E. et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nature Genet. 40, 26–28 (2008).
14. Xiong, F. et al. Risk of genome-wide association study-identified genetic variants for colorectal cancer in a Chinese population. Cancer Epidemiol. Biomarkers Prev. 19, 1855–1861 (2010).
15. Ho, J. W. et al. Replication study of SNP associations for colorectal cancer in Hong Kong Chinese. Br. J. Cancer 104, 369–375 (2011).
16. Cui, R. et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 60, 799–805 (2011).
17. Jia, W. H. et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nature Genet. 45, 191–196 (2013).
18. Zhang, B. et al. Genome-wide association study identifies a new SMAD7 risk variant associated with colorectal cancer risk in East Asians. Int. J. Cancer 135, 948–955 (2014).
19. Zhang, B. et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nature Genet. 46, 533–542 (2014).
20. Eguchi-Ishimae, M. et al. Leukemia-related transcription factor TEL/ETV6 expands erythroid precursors and stimulates hemoglobin synthesis. Cancer Sci. 100, 689–697 (2009).
21. Maki, K. et al. Leukemia-related transcription factor TEL is negatively regulated through extracellular signal-regulated kinase-induced phosphorylation. Mol. Cell. Biol. 24, 3227–3237 (2004).
22. Deves, C. et al. Analysis of select members of the E26 (ETS) transcription factors family in colorectal cancer. Virchows Archiv. 458, 421–430 (2011).
23. Adhikary, S. & Eilers, M. Transcriptional regulation and transformation by MYC. Nat. Rev. Mol. Cell Biol. 6, 635–645 (2005).
24. Hurlin, P. J. & Huang, J. The MAX-interacting transcription factor network. Semin. Cancer Biol. 16, 265–274 (2006).
25. Glozak, M. A. & Seto, E. Histone deacetylases and cancer. Oncogene 26, 5420–5432 (2007).
26. Toaldo, C. et al. PPARgamma ligands inhibit telomerase activity and hTERT expression through modulation of the Myc/Mad/Max network in colon cancer cells. J. Cell. Mol. Med. 14, 1347–1357 (2010).
27. Oikawa, T. & Yamada, T. Molecular biology of the Ets family of transcription factors. Gene 303, 11–34 (2003).
28. Seth, A. & Watson, D. K. ETS transcription factors and their emerging roles in human cancer. Eur. J. Cancer 41, 2462–2478 (2005).
29. Sementchenko, V. I. & Watson, D. K. Ets target genes: past, present and future. Oncogene 19, 6533–6548 (2000).
30. Guo, B., Godzik, A. & Reed, J. C. Bcl-G, a novel pro-apoptotic member of the Bcl-2 family. J. Biol. Chem. 276, 2780–2785 (2001).
31. Rompaey, L. V., Potter, M., Adams, C. & Grosveld, G. Tel induces a G1 arrest and suppresses Ras-induced transformation. Oncogene 19, 5244–5250 (2000).
32. Wang, W. et al. MDM2 SNP309 polymorphism is associated with colorectal cancer risk. Sci. Rep. 4, 4851 (2014).
33. Ma, L. et al. A genetic variant in miR-146a modifies colorectal cancer susceptibility in a Chinese population. Archiv. Toxicol. 87, 825–833 (2013).
34. Zhong, R. et al. Genetic variations in the TGFbeta signaling pathway, smoking and risk of colorectal cancer in a Chinese population. Carcinogenesis 34, 936–942 (2013).
35. Zheng, J. et al. The protective role of polymorphism MKK4-1304T>G in nasopharyngeal carcinoma is modulated by Epstein-Barr virus’ infection status. Int. J. Cancer 130, 1981–1990 (2012).
36. Dong, G. et al. Potentially functional genetic variants in KDR gene as prognostic markers in patients with resected colorectal cancer. Cancer Sci. 103, 561–568 (2012).
37. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
38. Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–59 (2006).
39. Vandesompele, J. et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, RESEARCH0034 (2002).
40. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397–406 (2014).
41. Pickrell, J. K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010).
42. Raychaudhuri, S. et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).
43. Chen, J. et al. Common genetic variation in ETV6 is associated with colorectal cancer susceptibility. Dryad Digital Repository. doi:10.5061/dryad.7dj7t (2016).

Acknowledgements
We thank the Ontario Registry for Studies of Familial Colorectal Cancer for providing European colorectal cancer GWAS data. We thank Donald L. Hill (University of Alabama at Birmingham, USA) and Qingyi Wei (Duke University School of Medicine and Duke Cancer Institute, USA) for editing and comments. This study was partially supported by the National Natural Science Foundation of China (81272469, 81230068, 81373091, 81201570 and 81370057), the National 973 Basic Research Program of China (2013CB911300), Distinguished Young Scholars of Nanjing (JQX13005), the Clinical Special Project for Natural Science Foundation of Jiangsu Province (BL2012016), Nanjing 12th Five-Year key Scientific Project of Medicine (J.C.) and the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).

Author contributions
J.C., Z.Z. and M.W. directed the study, obtained financial support and were responsible for study design, interpretation of results and manuscript writing. M.W., D.G., M.D., Z.X. and H.C. recruited study subjects and managed the respective project. M.W., M.D. and H.C. performed statistical analyses, summarized results and drafted the manuscript. S.Z., L.Z., J.L., R.Z., J.X., X.M. and Z.H. collected subjects and prepared samples. M.D., H.C., L.Y., C.T., L.P., H.D., J.Z., J.D. and N.T. conducted the genotyping analysis. M.W., M.D. and Z.X. performed statistical and bioinformatics analysis and carried out experiments. J.S., H.S. and J.X. coordinated the project. All of the authors reviewed, approved and contributed to the final version of the manuscript.

Additional information
Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Competing financial interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

How to cite this article: Wang, M. et al. Common genetic variation in ETV6 is associated with colorectal cancer susceptibility. Nat. Commun. 7:11478 doi: 10.1038/ncomms11478 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
