bioRxiv preprint doi: https://doi.org/10.1101/761106; this version posted September 8, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

KDML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens

Heba Z. Sailem*1,2, Jens Rittscher1,2, Lucas Pelkmans3

1 Institute of Biomedical Engineering, Department of Engineering Science, Old Road Campus Research Building, University of Oxford OX3 7DQ, UK
2 Big Data Institute, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus Research Building, Oxford OX3 7LF, UK
3 Department of Molecular Life Sciences, Winterthurerstrasse 190, 8057, University of Zurich, Switzerland

* Address correspondence to:
[email protected]

Running title: Mapping of phenotypes to gene functions

Abstract

Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens is based on ad-hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge-Driven Machine Learning (KDML), a framework that systematically predicts multiple functions for a given gene based on the similarity of its perturbation phenotype to those with known function.