Germline Determinants of the Somatic Mutation Landscape in 2,642 Cancer Genomes
bioRxiv preprint doi: https://doi.org/10.1101/208330; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Sebastian M Waszak1*, Grace Tiao2*, Bin Zhu3*, Tobias Rausch1*, Francesc Muyas4,5#, Bernardo Rodríguez-Martín6,7#, Raquel Rabionet4#, Sergei Yakneen1#, Georgia Escaramis4,8, Yilong Li9, Natalie Saini10, Steven A Roberts11, German M Demidov4,5, Esa Pitkänen1, Olivier Delaneau12-14, Jose Maria Heredia-Genestar15, Joachim Weischenfeldt1,16, Suyash S Shringarpure17, Jieming Chen18, Hidewaki Nakagawa19, Ludmil B Alexandrov20, Oliver Drechsel4,5, L Jonathan Dursi21, Ayellet V Segre2, Erik Garrison9, Serap Erkek1, Nina Habermann1, Lara Urban22, Ekta Khurana23, Andy Cafferkey22, Shuto Hayashi24, Seiya Imoto25, Lauri A Aaltonen26, Eva G Alvarez6,7, Adrian Baez-Ortega27, Matthew Bailey33, Mattia Bosio4,5, Alicia L Bruzos6,7, Ivo Buchhalter28, Carlos D. Bustamante29, Claudia Calabrese22, Anthony DiBiase30, Mark Gerstein31-33, Aliaksei Z Holik4,5, Xing Hua3, Kuan-lin Huang34, Ivica Letunic35, Leszek J Klimczak36, Roelof Koster3, Sushant Kumar31, Mike McLellan34, Jay Mashl34, Lisa Mirabello3, Steven Newhouse22, Aparna Prasad4,5, Gunnar Rätsch37, Matthias Schlesner28, Roland Schwarz38, Pramod Sharma30, Tal Shmaya17, Nikos Sidiropoulos16, Lei Song3, Hana Susak4,5, Tomas Tanskanen26, Marta Tojo6,7, David C Wedge39, Mark Wright29, Ying Wu17, Kai Ye40,41, Venkata D Yellapantula40,41, Jorge Zamora6,7, Atul J Butte18, Gad Getz2,42, Jared Simpson43, Li Ding34, Tomas Marques-Bonet15, Arcadi Navarro15, Alvis Brazma22, Peter Campbell44, Stephen J Chanock3, Nilanjan Chatterjee45, Oliver Stegle1, Reiner Siebert46, Stephan Ossowski4,5,47#, Olivier Harismendy48#, Dmitry A Gordenin10#, Jose MC Tubio6,7*, Francisco M De La Vega17,29*, Douglas F Easton49*@, Xavier Estivill4*@, Jan O Korbel1,22*@, on behalf of the PCAWG Germline Working group% and the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network

*these authors contributed equally
#these authors contributed equally
@To whom correspondence should be addressed: Douglas F. Easton ([email protected]), Xavier Estivill ([email protected]) and Jan O. Korbel ([email protected])

1 European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany.
2 The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02124, USA.
3 Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Bethesda, Maryland 20892, USA.
4 Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain.
5 Universitat Pompeu Fabra, Barcelona, Spain.
6 Mobile Genomes & Disease, The Biomedical Research Centre - CINBIO, University of Vigo, 36310 Vigo, Spain.
7 Department of Biochemistry, Genetics and Immunology, Faculty of Biology, University of Vigo, Vigo 36310, Spain.
8 CIBER Epidemiología y Salud Pública (CIBERESP), Spain.
9 Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
10 Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, US National Institutes of Health, Research Triangle Park, NC, USA.
11 School of Molecular Biosciences, Washington State University, Pullman, WA 99164, USA.
12 Swiss Institute of Bioinformatics, University of Geneva, 1 Michel Servet, Geneva CH1211, Switzerland.
13 Department of Genetic Medicine and Development, University of Geneva, 1 Michel Servet, Geneva CH1211, Switzerland.
14 Institute of Genetics and Genomics in Geneva, University of Geneva, 1 Michel Servet, Geneva CH1211, Switzerland.
15 Institute of Evolutionary Biology (UPF-CSIC), Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona Biomedical Research Park, Doctor Aiguader 88, Barcelona, 08003, Spain.
16 Biotech Research & Innovation Centre (BRIC), Copenhagen University and Finsen Laboratory, Rigshospitalet, Denmark.
17 Annai Systems, 1700 Aviara Parkway 130063, Carlsbad, CA 92013, USA.
18 Institute for Computational Health Sciences and Department of Pediatrics, University of California, San Francisco, California 94143, USA.
19 RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
20 Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA.
21 Ontario Institute for Cancer Research, 661 University Ave, Suite 510, Toronto, ON M5G 0A3, Canada.
22 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, UK.
23 Sandra and Edward Meyer Cancer Center, Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York, USA.
24 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
25 Health Intelligence Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
26 Department of Medical and Clinical Genetics, Genome-Scale Biology Research Program, University of Helsinki, Finland.
27 Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, UK.
28 Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
29 Departments of Genetics and Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.
30 Cray Inc. 901 Fifth Avenue, Seattle, WA 98164, USA.
31 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
32 Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
33 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
34 The McDonnell Genome Institute, Washington University in St. Louis, Forest Park Avenue, Campus Box 8501, St Louis, Missouri 63108, USA.
35 Biobyte solutions GmbH, Bothestrasse 142, 69126 Heidelberg, Germany.
36 Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, US National Institutes of Health, Research Triangle Park, North Carolina, USA.
37 Department of Computer Science, Swiss Federal Institute of Technology in Zurich, 8092, Switzerland.
38 Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine, Berlin, Germany.
39 Oxford Big Data Institute and Oxford Centre for Cancer Gene Research, Wellcome Trust Centre for Human Genetics, Oxford, UK.
40 McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA.
41 Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA.
42 Harvard Medical School, Boston, Massachusetts 02115, USA.
43 Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada.
44 Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
45 Department of Biostatistics, Bloomberg School of Public Health and Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
46 Institute of Human Genetics, University of Ulm, Ulm, Germany.
47 Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany.
48 Moores Cancer Center, School of Medicine, University of California San Diego, La Jolla, CA, USA.
49 Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

Abstract

Cancers develop through somatic mutagenesis; however, germline genetic variation can markedly contribute to tumorigenesis via diverse mechanisms. We discovered and phased 88 million germline single nucleotide variants, short insertions/deletions, and large structural variants in whole genomes from 2,642 cancer patients, and employed this genomic resource to study genetic determinants of somatic mutagenesis across 39 cancer types.
Our analyses implicate damaging germline variants in a variety of cancer predisposition and DNA damage response genes with specific somatic mutation patterns. Mutations in the MBD4 DNA glycosylase gene showed association with elevated C>T mutagenesis at CpG dinucleotides, a ubiquitous mutational process acting across tissues. Analysis of somatic structural variation exposed complex rearrangement patterns, involving cycles of templated insertions and tandem duplications, in BRCA1-deficient tumours. Genome-wide association analysis implicated common genetic variation at the APOBEC3 gene cluster with reduced basal levels of somatic mutagenesis attributable to APOBEC cytidine deaminases