Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities Michelle M. Li1,3, Kexin Huang2, and Marinka Zitnik3,4,5,∗ 1Bioinformatics and Integrative Genomics Program, Harvard Medical School 2Health Data Science Program, Harvard T.H. Chan School of Public Health 3Department of Biomedical Informatics, Harvard Medical School 4Broad Institute of MIT and Harvard 5Harvard Data Science ∗Correspondence:
[email protected] Abstract With the remarkable success of representation learning in providing powerful predictions and data in- sights, we have witnessed a rapid expansion of representation learning techniques into modeling, analysis, and learning with networks. Biomedical networks are universal descriptors of systems of interacting el- ements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. In this review, we put forward an observation that long-standing principles of network biology and medicine—while often unspoken in machine learning research—can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage topological features to embed networks into compact vector spaces. We also provide a taxonomy of biomedical areas that are likely to benefit most from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling behaviors of single cells and their impact on health, and diagnosing and treating diseases with safe and effective medicines. arXiv:2104.04883v1 [cs.LG] 11 Apr 2021 1 Introduction Networks, or graphs, are pervasive in biology and medicine, from molecular interaction maps to dependen- cies between diseases in a person, all the way to populations encompassing social and health interactions.