bioRxiv preprint doi: https://doi.org/10.1101/630756; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. A domain-resolution map of in vivo DNA binding reveals the regulatory consequences of somatic mutations in zinc finger transcription factors Berat Dogan1,2,4,¶, Senthilkumar Kailasam1,2,¶, Aldo Hernández Corchado1,2, Naghmeh Nikpoor3, Hamed S. Najafabadi1,2,* 1 Department of Human Genetics, McGill University, Montreal, QC, Canada 2 McGill Genome Centre, Montreal, QC, Canada 3 Rosell Institute for Microbiome and Probiotics, Montreal, QC, Canada 4 Current address: Department of Biomedical Engineering, Inonu University, Malatya, Turkey ¶ These authors contributed equally to this work. * Correspondence should be addressed to: H.S.N.,
[email protected] bioRxiv preprint doi: https://doi.org/10.1101/630756; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. ABSTRACT Multi-zinc finger proteins constitute the largest class of human transcription factors. Their DNA-binding specificity is usually encoded by a subset of their tandem Cys2His2 zinc finger (ZF) domains – the subset that binds to DNA, however, is often unknown. Here, by combining a context-aware machine-learning-based model of DNA recognition with in vivo binding data, we characterize the sequence preferences and the ZF subset that is responsible for DNA binding in 209 human multi-ZF proteins.