electronics Article Cross-Compiler Bipartite Vulnerability Search Paul Black * and Iqbal Gondal Internet Commerce Security Laboratory (ICSL), Federation University, Ballarat 3353, Australia;
[email protected] * Correspondence:
[email protected] Abstract: Open-source libraries are widely used in software development, and the functions from these libraries may contain security vulnerabilities that can provide gateways for attackers. This paper provides a function similarity technique to identify vulnerable functions in compiled programs and proposes a new technique called Cross-Compiler Bipartite Vulnerability Search (CCBVS). CCBVS uses a novel training process, and bipartite matching to filter SVM model false positives to improve the quality of similar function identification. This research uses debug symbols in programs compiled from open-source software products to generate the ground truth. This automatic extraction of ground truth allows experimentation with a wide range of programs. The results presented in the paper show that an SVM model trained on a wide variety of programs compiled for Windows and Linux, x86 and Intel 64 architectures can be used to predict function similarity and that the use of bipartite matching substantially improves the function similarity matching performance. Keywords: malware similarity; function similarity; binary similarity; machine-learning; bipar- tite matching 1. Introduction Citation: Black, P.; Gondal, I. Cross-Compiler Bipartite Function similarity techniques are used in the following activities, the triage of mal- Vulnerability Search. Electronics 2021, ware [1], analysis of program patches [2], identification of library functions [3], analysis of 10, 1356. https://doi.org/10.3390/ code authorship [4], the identification of similar function pairs to reduce manual analysis electronics10111356 workload, [5], plagiarism analysis [6], and for vulnerable function identification [7–9].