sensors Article Efficient Feature Selection for Static Analysis Vulnerability Prediction Katarzyna Filus 1 , Paweł Boryszko 1 , Joanna Doma ´nska 1,* , Miltiadis Siavvas 2 and Erol Gelenbe 1 1 Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Baltycka 5, 44-100 Gliwice, Poland; kfi
[email protected] (K.F.);
[email protected] (P.B.);
[email protected] (E.G.) 2 Information Technologies Institute, Centre for Research & Technology Hellas, 6th km Harilaou-Thermi, 57001 Thessaloniki, Greece;
[email protected] * Correspondence:
[email protected] Abstract: Common software vulnerabilities can result in severe security breaches, financial losses, and reputation deterioration and require research effort to improve software security. The acceleration of the software production cycle, limited testing resources, and the lack of security expertise among programmers require the identification of efficient software vulnerability predictors to highlight the system components on which testing should be focused. Although static code analyzers are often used to improve software quality together with machine learning and data mining for software vulnerability prediction, the work regarding the selection and evaluation of different types of relevant vulnerability features is still limited. Thus, in this paper, we examine features generated by SonarQube and CCCC tools, to identify those that can be used for software vulnerability prediction. We investigate the suitability of thirty-three different features to train thirteen distinct machine learning algorithms to design vulnerability predictors and identify the most relevant features that should be used for training. Our evaluation is based on a comprehensive feature selection process based on the correlation analysis of the features, together with four well-known feature selection techniques.