The Integration of ENCODE Into the Study of the Complexity of Cancer Susceptibility
Stephen J. Chanock, M.D.
Director, Division of Cancer Epidemiology & Genetics
July 1, 2015

Etiology of Cancer
Diagram labels: Environment; Genetic Susceptibility; Mutations; SNPs; STRs; Lifestyle; Chromosomes (Epigenetics); 100%; 100%; "Triggers"; "Set Point"; "Chance"

Cancer Genomics: 4 Spaces
Diagram labels: >115 Cancer Syndromes; >25 Moderate Penetrant; BRCA1/2; >475 GWAS Loci; Lynch Syndrome; ACMG "Actionable"; Germline; TEST: Single Mutation; Somatic; Heterogeneity; Single Gene; Metastases; Panel of "Cancer Genes"; 'Drivers'; Exome; 'Passengers'; Whole Genome; Targeted Therapy; HER2; TCGA/ICGC; Discovery; Clinically Actionable; EGFR; COSMIC Data; BRAF600

Challenge of Cancer Genomics
What Happens When There is More than One Genome?

TCGA: Driven by the Numbers
But not yet validated in the laboratory…
Value of Frequency in Generating Hypotheses
But, further laboratory work is needed…

Evidence for Heritability of Cancer
1866 Broca observed heritability based on familial breast cancer
Interim Twin/Family/Sibling studies…
1969 Li-Fraumeni observed familial clustering (TP53)
1971 Knudson postulated "two-hit" hypothesis for retinoblastoma
1991 Positional cloning of a familial breast cancer gene (BRCA1)

>115 Genes Mutated in Cancer Susceptibility Syndromes
Ascertained in Families
Rare Mutation with Strong Effect
Oncogenes & Tumor Suppressors
Incomplete "Penetrance": Not all affected develop cancer
Modifiers: genetic & environmental
Genes shown: SDHB MUTYH ALK FANCD2 MSH2 VHL MSH6 XPC POT1 FANCL HRAS MLH1 PMS2 FANCF FANCE BAP1 TERT EGFR CDKN2A MITF WT1 SDHA FANCG DDB2 PHOXB2 POLH PAX5 EXT2 WRN TMEM127 CDK4 GATA2 SBDS NBS1 GALNT12 SDHAF2 PTPN11 HAX1 XPA RET MEN1 TERC BMPR1A ATM SDHC ERCC3 PTCH PTEN MET FANCC SDHD HRPT2 DIS3L2 KIT SUFO CBL PDGFRA APC TSC1 EXT1 RECQL4 T FH FANCB SLX4 TSC2 TP53 ELANE ERCC4 WRAP53 STK11 FANCN/BRIP1 FLCN TINF2 CEBPA RUNX1 BRCA2 NF1 ERCC2 SMARCB1 FANCM BRCA1 CHEK2 RB1 CYLD FANCJ SMAD4 NF2 MAX BUB1B CDH1 RAD51C FANCI FANCA HOXB13 BLM DICER1 ERCC5 GPC3 DKC1

TCGA: Lessons Learned from the Data
Survival Analyses
Impact of germline or somatic mutations

High Penetrance Mutations & Somatic Alterations
Nearly 50% "High Frequency" Somatic Mutations
Rahman Nature 2014

Search for Common Variants in Complex Diseases
Reproducible Technology
SNP Microarray Chip
>5 M genotyped SNPs across genome
>30 M imputed SNPs across genome
High Concordance > 99.5%/assay
'Markers' across the genome
Commitment to Mapping
Creates a multiple testing problem

Published Cancer GWAS Etiology Hits: July 2015
>490 Disease Loci marked by SNPs
For > 29 "cancers"
1 Locus marked by a CNV
20% Intergenic
~8% Shared between cancers = Pleiotropy
>100 to be reported by ONCOARRAY (Breast/prostate/lung/colon/ovarian)
Virtually none are associated with outcomes
Locus labels shown: TARDBP FOXQ1 IRF4 PEX14 KIF1B TRIM31 2p25.2 EXOC2 NEDD9 1p36 TAF1B RANBP9 1p36 NOL10 GPX6 PEX14 GABBR1 HLA-A/B/C DDX1 HLA-F RSPO1 ITPR1 6p22.3 BAT3 GRM4 2p24.1 DAZL MICB 6p22.1 HLA-DQ C2orf43 NCOA1 SLC4A7 EOMES MICA BAK1 TSPAN32 2p22.2 MAD1L1 TGFBR2 GSDMB 6p21.33 10p15.1 TERT/ DMRT1 LSP1 11p15 THADA EPAS1 CLPTM1L CCHCR1 SP8 ITGA9 ULK4 TACC3 HLA-DRB5 BNC2 GATA3 NAT2 9p21.3 LMO1 CCND2 REL IRX4 6p21.32 JAZF1 DNAH11 SLC25A37 9p21 FOXP4 NKX3.1 CDKN2A/ 10p14 ATF7IP 1p13.2 CDKN1A CDKN2B HSD17B12 12p13.1 EHBP1 DAB2 PIP4K2A 3p12.1 PRKAA1 NOTCH4 EBF2 DNAJC1 FGF10 HLA-DRA TNS3 FAM111A ITPR2 GSTM1 GGCX/ IKZF1 10p12.31 OR8U8 deletion VAMP8 EXOC1 5p12 ARMC2 3p11.2 HLA-DRB6 EGFR 8p12 NRG1 OVOL1/ PTHLH SNX32 AFM MARCH8 1p11.2 PDE4D DDX4 UNC5CL BTNL2 TUBA1C 1q21.1 ARID5B 11q13 CCND1 ACOXL SIDT1 COX18 MSMB 12q13 12q13.13 GOLPH3L MAP3K1 RAB3C MYO6 PRDM14 PDLIM5 4q22.3 HNF4G ERG2 ZNF365 1q21.3 2q14.2 ZBTB20 KRT8 KRT5 6q14.1 8q21.11 POLD3 KCNN3 DUSP12 ADH1B LIN28B LMTK2 8q21.13 ZMIZ1 EEFSEC ARRDC3 10q24.2 11q14 DLG2 KITLG RFX6 6q22.1 FOXE1 SLC25A44 ZEB2 TET2 RGS22 FAS 1q22 8q22.3 NTN4 12q23.1 ZBTB38 ECHDC1/ PLCE1 MMP7 TYR RNF146 HACE1 9q31.2 RAD23B STAT4 LEF1 CENP3 POT1 EIF3H UCK2 CLDN11/ TRIM8 10q25.1 ACAD10 ALDH2 3q25.31 11q23.1 ATM SKIL HBS1L DAB21P ITGA6 4q26 SYNPO2 7q32 MYC CCDC26 TCF7L2 CDCA7 VTI1A SCARB1 LGR6 ZC3H11A TERC PHLDB1 HTR3B DLX2 MYNN 5q31.1 PITX1 ESR1 LINC-PINT 2q31.1 8q24 PVT1 ABO 10q26.12 PRLHR MDS1 GRAMD1B CXCR5 TBX5 MDM4 NR5A2 SPRY4 IPCEF1 ARHGEF5 NABP1 ST6GAL1 CTBP2 11q24.3 3q27.3 EBF1 TAB2 ETS1 RPL6 12q24.21 SLC41A1 CASP8 PSCA TP63 ADAM29 RSG17 FGFR2 10q26 LPP 6q25.1 C12orf51 DUSP10 ERBB4 FAM44B C3orf21 BARD1 PARP1 DIRC3 SLC22A3 RHOU SP140 2q35 UGT1A MLPH SHROOM2 FARP2 C20orf54 TGM3 SMG6 VPS53 BABAM1 ELL BMP2 20p12.3 TBX1 TP53 PTPN2 19p13 XAGE3 CEBPE TNFRSF13B JAG1 CHEK2 GREM1 18q11.2 NUDT11 MIPEP TNFRSF19 NKX2-1 ATAD5 NRIP1 CCNE1 RHPN2 EMID1 ZNRF3 TOX3 PAX9 15q15.1 HEATR3 HNF1Bx2 17q12 CHST9 RALY CDC91L1 AR PDX1 MBIP 19q13 GRIK1 BACH1 C19orf61 XBP1 MTMR3 FERMT2 BMF 17q21.32 SLC14A1 BRCA2 FTO HAP1 PRKD2 ADNP RUNX1 SLC7A DCC 22q13 BMP4 CDH1 SPOP MKL1 15q21.3 ZNF652 SMAD7 KLK2/ LAMA5 GATAS TMPRSS2 MCM3AP Xq13 16q22.3 KLK3 CBX7 22q13.1 SIX1 PHLPP2 BRIP1 15q23 COX11 PMAIP1 BCL2 LILRA3 RTEL1 MX2 TFF1 CDYL2 16q23.1 RAD51L1 17q22 22q13 RAD51B BPTF SALL3 ZGPAT 13q22 CHRNA3/5 IRF8 FAM19A5 13q22.1 TTC9 17q24 CPEB1 MC1R UBAC2 CCDC88C 15q25.1 PRC1
Counts by cancer type: 13 Basal Cell; 14 Bladder; 85 Breast; 7 Cervical; 31 CLL; 31 Colorectal; 1 Endometrial; 17 Esophageal Sq; 3 Ewing Sarcoma; 1 Gallbladder; 4 Gastric; 10 Glioma; 8 Hodgkins; 5 Kidney; 7 Liver; 20 Lung; 14 Melanoma; 31 Multiple; 6 Multiple Myeloma; 11 Nasopharyngeal; 9 Neuroblastoma; 13 Non-Hodgkin; 2 Osteosarcoma; 16 Ovary; 14 Pancreas; 9 Pediatric ALL; 99 Prostate; 21 Testicular; 5 Thyroid; 3 Wilms

GWAS Signals & Somatic Mutations: No Strong Correlation
Redundant Pathways: 'NOT Drivers'
[Figure: GWAS Genes vs. Permutation Genes; Number of Mutations (per 100 individuals), bins 0, 0−1, 1−2, 2−3, 3−4, 4−5, 5<. M Machiela, in revision]

Interpretation: Correlation does not imply causation
GWAS designs provide no mechanism to distinguish statistical association from causation
Linkage; GWAS; NGS
Balding, Nature Reviews Genetics 2006

Can we use ENCODE to prioritize SNPs for follow-up?
YES

GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation
Dongjun Chung1,2, Can Yang1,3,4, Cong Li5, Joel Gelernter3,6,7,8, Hongyu Zhao1,5,7,9*
1 Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America, 2 Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America, 3 Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America, 4 Department of Mathematics, Hong Kong Baptist University, Hong Kong, China, 5 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America, 6 VA CT Healthcare Center, West Haven, Connecticut, United States of America, 7 Department of Genetics, Yale School of Medicine, West Haven, Connecticut, United States of America, 8 Department of Neurobiology, Yale School of Medicine, New Haven, Connecticut, United States of America, 9 VA Cooperative Studies Program Coordinating Center, West Haven, Connecticut, United States of America

Abstract
Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently.

Comprehensive Functional Annotation of 77 Prostate Cancer Risk Loci
Dennis J. Hazelett1*, Suhn Kyong Rhie1, Malaina Gaddis2, Chunli Yan1, Daniel L. Lakeland3, Simon G. Coetzee4, Ellipse/GAME-ON consortium5, Practical consortium6, Brian E. Henderson5, Houtan Noushmehr4, Wendy Cozen7, Zsofia Kote-Jarai6, Rosalind A. Eeles6,8, Douglas F. Easton9, Christopher A. Haiman5, Wange Lu10, Peggy J. Farnham2, Gerhard A. Coetzee1*
1 Departments of Urology and Preventive Medicine, Norris Cancer Center, University of Southern California Keck School of Medicine, Los Angeles, California, United States of America, 2 Department of Biochemistry and Molecular Biology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America, 3 Sonny Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, California, United States of America, 4 Department of Genetics, University of São Paulo, Ribeirão Preto, Brazil, 5 Department of Preventive Medicine, Norris Cancer Center, University of Southern California Keck School of Medicine, Los Angeles, California, United States of America, 6 The Institute of Cancer Research, Sutton, United Kingdom, 7 USC Keck School of Medicine, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America, 8 Royal Marsden National Health Service (NHS) Foundation Trust, London and Sutton, United Kingdom, 9 Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom, 10 Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Department of Biochemistry and Molecular Biology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
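
The GPA abstract says that parameter inference and SNP ranking are done with an EM algorithm that combines GWAS signals with functional annotation. As a rough illustration of that idea only, here is a minimal sketch for one GWAS and one binary annotation (for example, "overlaps an ENCODE element"). It is not the authors' implementation. The distributional choices (uniform p-values for null SNPs, Beta(alpha, 1) for associated SNPs, Bernoulli annotation rates q0 and q1) and all function and parameter names are assumptions made for this example.

```python
import math
import random


def em_two_group(pvals, annot, n_iter=200):
    """EM for a two-group mixture over SNPs (illustrative assumptions).

    Null SNPs:       p ~ Uniform(0, 1),  annotated with probability q0
    Associated SNPs: p ~ Beta(alpha, 1), annotated with probability q1
    """
    pi1, alpha, q0, q1 = 0.1, 0.5, 0.1, 0.2  # arbitrary starting values
    n = len(pvals)
    logp = [math.log(p) for p in pvals]
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each SNP is associated.
        for j, (p, a) in enumerate(zip(pvals, annot)):
            f1 = pi1 * alpha * p ** (alpha - 1.0) * (q1 if a else 1.0 - q1)
            f0 = (1.0 - pi1) * (q0 if a else 1.0 - q0)
            post[j] = f1 / (f1 + f0)
        # M-step: closed-form weighted maximum-likelihood updates.
        w1 = sum(post)
        w0 = n - w1
        pi1 = w1 / n
        alpha = -w1 / sum(w * lp for w, lp in zip(post, logp))
        alpha = min(max(alpha, 1e-3), 0.999)  # keep density peaked near p = 0
        q1 = sum(w for w, a in zip(post, annot) if a) / w1
        q0 = sum(1.0 - w for w, a in zip(post, annot) if a) / w0
    return {"pi1": pi1, "alpha": alpha, "q0": q0, "q1": q1, "posterior": post}


def simulate(n=5000, pi1=0.1, alpha=0.2, q0=0.1, q1=0.5, seed=1):
    """Draw synthetic p-values and annotations from the assumed model."""
    rng = random.Random(seed)
    pvals, annot = [], []
    for _ in range(n):
        z = rng.random() < pi1            # hidden association status
        u = 1.0 - rng.random()            # in (0, 1], avoids log(0)
        pvals.append(u ** (1.0 / alpha) if z else u)  # Beta(alpha, 1) draw
        annot.append(rng.random() < (q1 if z else q0))
    return pvals, annot


pvals, annot = simulate()
fit = em_two_group(pvals, annot)
print({k: round(v, 3) for k, v in fit.items() if k != "posterior"})
```

In this sketch a fitted q1 above q0 corresponds to the annotation being enriched among associated SNPs, and sorting SNPs by their posterior gives a ranking for follow-up. The published method additionally models several GWAS datasets jointly to capture pleiotropy, which this single-dataset example leaves out.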