Combined Machine Learning and Molecular Modelling Workflow for the Recognition of Potentially Novel Fungicides
Molecules 2020, 25, 2198; doi:10.3390/molecules25092198; www.mdpi.com/journal/molecules

Article

Ozren Jović * and Tomislav Šmuc

Ruđer Bošković Institute, Bijenička cesta 54, 10 000 Zagreb, Croatia; [email protected]
* Correspondence: [email protected]; Tel.: +385-1-4561-085

Academic Editor: Maria Cristina De Rosa
Received: 20 March 2020; Accepted: 6 May 2020; Published: 8 May 2020

Abstract: Novel machine learning and molecular modelling filtering procedures for drug repurposing have been carried out to recognize novel fungicides for the targets Cyp51 and Erg2. Classification and regression approaches on molecular descriptors have been performed using stepwise multilinear regression (FS-MLR), uninformative-variable elimination partial least squares regression, and a non-linear method called Forward Stepwise Limited Correlation Random Forest (FS-LM-RF). Altogether, 112 prediction models from two different approaches have been built for the descriptor-based recognition of fungicide hit compounds. Aiming at the fungal targets of sterol biosynthesis in membranes, antifungal hit compounds have been selected from the Drugbank database for docking experiments using the Autodock4 molecular docking program. The results were verified with the Gold Protein-Ligand Docking Software. For each high-scoring ligand considered, the best-docked conformation was submitted to quantum mechanics/molecular mechanics (QM/MM) gradient optimization, with final single-point calculations taking into account both the basis set superposition error and thermal corrections (with frequency calculations). Finally, seven Drugbank lead compounds were selected based on their high QM/MM scores for the Cyp51 target, and three were selected for the Erg2 target. These lead compounds could be recommended for further in vitro studies.

Keywords: classification; regression; docking; drug repurposing; QM/MM; Fe-N(R)C angle

1. Introduction

Investigations of novel antifungal compounds, and of compounds having a synergistic effect on antifungals, have increased in recent decades [1–4]. Approximately 215 fungicides have been sorted by their mode of action (MOA) in the biochemical pathways of plant fungal pathogens in the Fungicide Resistance Action Committee (FRAC) MOA Code List 2019 [1]. For antifungal pesticides, the FRAC grouping follows metabolic processes such as respiration and sterol biosynthesis in membranes, with an intrinsic resistance risk assessment for each corresponding fungicide group [1]. So far, 377 fungicides have been approved for use [2], and only some of them have been classified according to their MOA. The article by Alejandro Speck-Planche et al. [3] addresses the correct classification of fungicides and inactive compounds while taking into account the different resistance risk categories of the corresponding fungicides. To discriminate between fungicides and non-fungicides, the authors defined and used, in addition to a fungicide training set, a set of inactive compounds (non-fungicides) [3]. However, since then (2011), some compounds previously declared to be non-antifungals have been studied further and, in some cases, found to possess antifungal activity. For example, olanzapine was designated as a non-fungicide [3] but was very recently found to be an anti-Cryptococcus drug [5]. Another example is verapamil [3], which was later shown to have an inhibitory effect on Candida albicans [6]. Some other compounds, such as rifampin and nifedipine, possess a synergistic antifungal effect when combined with already-established antifungal agents [7,8]. Among the 158 non-fungicides used in [3], 27 compounds have been found to possess, or might possess, some antifungal properties (Supplementary Table S1).
This raises the question of what it means to have “a set of non-fungicide compounds”. What is certain is that, through drug repurposing, more and more compounds considered inactive have been revealed to be active against different species of yeast, or at least to possess a synergistic antifungal effect when combined with already-established fungicides. Another study by Alejandro Speck-Planche et al. [9] presents the first multi-species cheminformatics approach for classifying agricultural fungicides as toxic or nontoxic. That study reports the successful simultaneous assessment of multiple ecotoxicological profiles of agrochemical fungicides, expressed as fungicide/indicator-species pairs comprising 81 fungicides and 20 indicator species [9].

Because many compounds have been repurposed very recently as antifungals, what is still lacking in the literature, in our opinion, is a Drugbank-scale in silico repurposing study concerning the recognition of novel antifungal agents. Such a study should establish models based on fungicides' substructural descriptors that both classify fungicides into modes of action and use these classification models for extrapolation to a large compound data set such as the Drugbank database. To the best of our knowledge, this approach has not yet been carried out. In other words, this research uses machine learning and focuses primarily on identifying (i.e., recognizing) already-known chemical compounds as potential novel antifungal agents that have not yet been recognized as such. To do so, in the first part (1) of the study, the Drugbank database will be filtered, and only compounds specifically similar to fungicides will be considered further as potential hit compounds; in the second part (2), all of these preselected hit compounds from the Drugbank database will be submitted to extensive docking studies.
As a final filtering and confirmation step, we will select only those hits that obtain sufficiently high scores in docking simulations with very specific protein targets.

In this drug repurposing study, we limit our search for novel fungicides to a specific fungicide group, the inhibitors of sterol biosynthesis, which form the most abundant MOA group, “G” (sterol biosynthesis in membranes) [1,10]. The most common target protein of that MOA group is lanosterol 14-alpha demethylase (Cyp51), and the second most important is Erg2 [1,10]. An antifungal compound of this group binds to a specific protein and thereby prevents sterol biosynthesis, which leads to fungal death. Some of the known inhibitors of Cyp51, the target which catalyzes the demethylation of lanosterol on the pathway to ergosterol, are fluconazole, ketoconazole, simeconazole, and bromuconazole; the strongest inhibitors reported to date are posaconazole and oteseconazole [11]. The chemical functional groups attributed to this G MOA are mostly triazoles and imidazoles, but there are also tetrazoles, pyrimidines, pyridines, and piperazines among Cyp51 inhibitors [10], and morpholines, piperidines, and spiroketalamines among sterol 8,7-isomerase inhibitors [10]. The already-established fungicides among sterol 8,7-isomerase inhibitors are aldimorph, dodemorph, fenpropimorph, fenpropidin, piperalin, spiroxamine, and tridemorph [10]. For Cyp51 inhibitors, however, there are 36 fungicides in the FRAC code list [10], plus some other fungicides mostly bearing triazole or imidazole functional groups [11]. Taking into account some additional fungicides with known (or at least likely) MOAs, an MOA fungicide set containing 245 compounds is established in this work as an MOA working set (in the following text “MOAW set”; see MOAW set in Supplementary Table S2).
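The two-part strategy described above (descriptor-based preselection of fungicide-like compounds, followed by confirmation through docking scores) can be illustrated with a minimal sketch. It is not the workflow's actual implementation: the `Candidate` fields, the compound names, and both cutoff values are hypothetical placeholders chosen only to show the funnel logic, and docking scores are assumed to be energies for which lower means better binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    fungicide_likeness: float  # hypothetical descriptor-model score in [0, 1]
    docking_score: float       # hypothetical docking energy; lower is better


def two_stage_filter(candidates, likeness_cutoff=0.5, docking_cutoff=-9.0):
    """Return candidates that pass both filters, best docking score first.

    Both cutoffs are illustrative assumptions, not values from the study.
    """
    # Part (1): keep only compounds flagged as fungicide-like by the descriptor models.
    preselected = [c for c in candidates if c.fungicide_likeness >= likeness_cutoff]
    # Part (2): keep only preselected compounds whose docking score passes the cutoff.
    hits = [c for c in preselected if c.docking_score <= docking_cutoff]
    return sorted(hits, key=lambda c: c.docking_score)


# Made-up library entries used purely for demonstration.
library = [
    Candidate("compound_A", 0.91, -10.2),
    Candidate("compound_B", 0.35, -11.0),  # docks well but is not fungicide-like
    Candidate("compound_C", 0.80, -6.5),   # fungicide-like but docks poorly
    Candidate("compound_D", 0.66, -9.4),
]

hits = two_stage_filter(library)
print([c.name for c in hits])
```

In this sketch the docking scores are supplied up front for simplicity; in a real pipeline only the compounds surviving part (1) would be docked, which is what makes the descriptor-based prefilter worthwhile on a database of Drugbank's size.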
In this research, we rely on such a MOAW set because it contains as many sterol biosynthesis inhibitors as possible and also covers a sufficient number of fungicides classified into different fungicide class groups, although there might be big differences in their activities [1]. The possible objection that the FRAC code list deals only with plant antifungals is not a hurdle in this study, because we are not trying to expand the FRAC code list itself, and no antifungals from the other FRAC groups have been reported to date to inhibit either Cyp51 or Erg2 (the exception being group “K”, which is generally considered to comprise fungicides with “multi-site activity” but contains different chemical functional groups from those of group G). In addition, some FRAC fungicides (e.g., prothioconazole) have already been reported to bind to “non-plant” pathogens such as Candida albicans [12]. The actual goal is to search for new inhibitors of fungal protein targets while repurposing the Drugbank compound set; the FRAC code list essentially serves as the starting point for creating and testing a discriminant model that finds new hit compounds for the Cyp51 and Erg2 protein targets using molecular descriptors in the prefiltering procedure, i.e., the first part (1) of the study.

In the first part (1) of the study, filtering Drugbank is not easy without a well-established, large dataset of non-antifungal compounds. Since no dataset of non-antifungals is large enough to cover the diversity of Drugbank, we decided to construct the filtering design with two different approaches. In the first approach (I), the MOAW set was enlarged with the non-fungicide set from [3], reduced by the compounds that have been shown to exhibit fungicidal activity. This set was used to train a number of different machine learning models. The filtering was carried out under the following criteria: (a) selected hit compounds have