Submitted to Challenges. Pages 1 - 21. OPEN ACCESS challenges ISSN 2078-1547 www.mdpi.com/journal/challenges Communication An Evolutionary Optimizer of libsvm Models Dragos Horvath 1;*, J.B. Brown 2, Gilles Marcou 1 and Alexandre Varnek 1 1 Laboratoire de Chémoinformatique, UMR 7140 CNRS – Univ. Strasbourg, 1, rue Blaise Pascal, 6700 Strasbourg, France 2 Kyoto University Graduate School of Pharmaceutical Sciences, Department of Systems Bioscience for Drug Discovery, 606-8501, Kyoto, Japan * Author to whom correspondence should be addressed;
[email protected] Version August 7, 2014 submitted to Challenges. Typeset by LATEX using class file mdpi.cls 1 Abstract: This User Guide describes the rationale behind, and the modus operandi of a Unix 2 script-driven package for evolutionary searching of optimal libsvm operational parameters, 3 leading to Support Vector Machine models of maximal predictive power and robustness. 4 Unlike common libsvm parameterizing engines, the current distributions includes the key 5 choice of best-suited sets of attributes/descriptors, next to the classical libsvm operational 6 parameters (kernel choice, cost, etc.), allowing a unified search in an enlarged problem 7 space. It relies on an aggressive, repeated cross-validation scheme to ensure a rigorous 8 assessment of model quality. Primarily designed for chemoinformatics applications, it also 9 supports the inclusion of decoy instances – for which the explained property (bioactivity) is, 10 strictly speaking, unknown, but presumably "inactive". The package supports four different 11 parallel deployment schemes of the Genetic Algorithm-driven exploration of the operational 12 parameter space, for workstations and computer clusters. 13 Keywords: chemoinformatics; QSAR; machine learning; libsvm parameter optimization 14 1.