IPSJ Transactions on Bioinformatics Vol.11 41–47 (Dec. 2018)
[DOI: 10.2197/ipsjtbio.11.41] Original Paper
Predicting Strategies for Lead Optimization via Learning to Rank
Nobuaki Yasuo1,2,a) Keisuke Watanabe1 Hideto Hara3 Kentaro Rikimaru3 Masakazu Sekijima1,4,b)
Received: August 21, 2018, Accepted: September 18, 2018
Abstract: Lead optimization is an essential step in drug discovery in which the chemical structures of compounds are modified to improve characteristics such as binding affinity, target selectivity, physicochemical properties, and toxicity. We present a concept for a computational compound optimization system that outputs optimized compounds from hit compounds by using previous lead optimization data from a pharmaceutical company. In this study, to predict the drug-likeness of compounds in the evaluation function of this system, we evaluated and compared the ability to correctly predict lead optimization strategies through learning to rank methods.
Keywords: lead optimization, learning to rank, computer-aided drug design, machine learning
1 Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226–8503, Japan
2 Research Fellow of Japan Society for the Promotion of Science DC1, Yokohama, Kanagawa 226–8503, Japan
3 Shonan Research Center, Takeda Pharmaceutical Company Limited, Fujisawa, Kanagawa 251–0012, Japan
4 Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology, Yokohama, Kanagawa 226–8503, Japan
a) [email protected]
b) [email protected]
1. Introduction
During drug discovery, enormous efforts are made to identify better drug candidates. The cost of drug discovery has increased drastically: the process now typically takes 12–14 years [1] and costs approximately 2.6 billion USD [2]. The process of drug discovery is sometimes likened to looking for a needle in a haystack; it is the process of finding suitable compounds in a vast “chemical space.” First, compounds are screened on the basis of their binding affinity to a target protein to obtain hit compounds. Then, in the hit-to-lead and lead optimization steps, these hits are optimized to obtain drug candidates. Subsequently, the optimized compounds are designated for preclinical and clinical testing. Compounds that pass these tests are finally approved as drugs. Lead optimization, in which the chemical structures of lead compounds are modified to obtain compounds with improved properties, is an essential step in drug discovery [3], [4]. Properties such as binding affinity, selectivity, physicochemical properties, and ADMET (absorption, distribution, metabolism, excretion, toxicity) properties are optimized in the hit-to-lead and lead optimization steps [5], [6] (Fig. 1).
In order to reduce the cost of these processes, diverse approaches have been developed. Combinatorial chemistry and high-throughput screening are the key technologies for accelerating drug discovery on the experimental biology side [7], [8], while computer-aided drug discovery (CADD), which has been utilized since the 1960s, is also leading current drug discovery. The methods of CADD can be combined with various biological data, including genomic sequence, protein tertiary structure, and chemical structure, and can be utilized in various steps in drug discovery: target identification, compound screening, and prediction of ADMET (absorption, distribution, metabolism, excretion, toxicity) properties [9], [10], [11]. To this end, CADD methods such as virtual screening have been widely applied in drug discovery to reduce experimental costs [12], [13]. CADD is expected to reduce the cost of drug development by 50% [14].
Nearly all of the cost of lead optimization originates from the synthesis of many compounds in an effort to explore the entire chemical space, but this exploration typically results in only a few, if any, potential candidates. A discovery strategy that minimizes the number of compounds synthesized would greatly improve the efficiency of candidate development, since 17% of the total drug discovery cost is invested in lead optimization [1]. However, research on lead optimization is limited because practical lead optimization data have not been published by pharmaceutical companies.
The ultimate research objective of this study was to develop an in silico compound optimization system that produces optimized compounds from hit compounds (Fig. 2). In this system, two modules are applied iteratively. The first module focuses on the exploration of candidate compounds, and the second evaluates the identified candidates. The exploration module is based on virtual modification of compounds using matched molecular pairs (MMPs) or a chemical reaction-based method. An MMP is a pair of compounds that differ in only one part of their chemical structure [16], and MMPs have previously been used for ADME prediction [17] and compound optimization [18]. The chemical reaction-based method simulates virtual chemical re-