bioRxiv preprint doi: https://doi.org/10.1101/267500; this version posted February 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Integrated single-nucleotide and structural variation sig- natures of DNA-repair deficient human cancers Tyler Funnell1, Allen Zhang1, Yu-Jia Shiah2, Diljot Grewal1, Robert Lesurf2, Steven McKinney1, Ali Bashashati1, Yi Kan Wang1, Paul C. Boutros2, Sohrab P. Shah1;3 1Department of Molecular Oncology, BC Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada 2Informatics & Biocomputing, Ontario Institute for Cancer Research, Toronto, Canada 3Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, V6T 2B5, Canada Correspondence and requests for materials should be addressed to S.P.S. (email:
[email protected]) Keywords: mutation signatures, structural variants, single nucleotide variants, cancer genome, ovarian cancer, breast cancer, statistical model, correlated topic models 1 bioRxiv preprint doi: https://doi.org/10.1101/267500; this version posted February 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Mutation signatures in cancer genomes reflect endogenous and exogenous mutational pro- cesses, offering insights into tumour etiology, features for prognostic and biologic stratifica- tion and vulnerabilities to be exploited therapeutically. We present a novel machine learning formalism for improved signature inference, based on multi-modal correlated topic models (MMCTM) which can at once infer signatures from both single nucleotide and structural variation counts derived from cancer genome sequencing data.