Computing and Informatics, Vol. 32, 2013, 145{174 NOA: AN INFORMATION RETRIEVAL BASED MALWARE DETECTION SYSTEM Igor Santos, Xabier Ugarte-Pedrero Felix Brezo, Pablo G. Bringas S3Lab, DeustoTech { Computing Deusto Institute of Technology, University of Deusto Avenida de las Universidades 24, 48007, Bilbao, Spain e-mail: fisantos, xabier.ugarte, felix.brezo,
[email protected] Jos´eMar´ıa Gomez-Hidalgo´ Optenet, Madrid, Spain e-mail:
[email protected] Communicated by Deepak Gang Abstract. Malware refers to any type of code written with the intention of harming a computer or network. The quantity of malware being produced is increasing every year and poses a serious global security threat. Hence, malware detection is a cri- tical topic in computer security. Signature-based detection is the most widespread method used in commercial antivirus solutions. However, signature-based detec- tion can detect malware only once the malicious executable has caused damage and has been conveniently registered and documented. Therefore, the signature-based method fails to detect obfuscated malware variants. In this paper, a new malware detection system is proposed based on information retrieval. For the representa- tion of executables, the frequency of the appearance of opcode sequences is used. Through this architecture a malware detection system prototype is developed and evaluated in terms of performance, malware variant recall (false negative ratio) and false positives. Keywords: Malware detection, computer security, information retrieval, static analysis Mathematics Subject Classification 2010: 68-00, 68T30, 68U3 146 I. Santos, X. Ugarte-Pedrero, F. Brezo, P. G. Bringas, J. M. G´omez-Hidalgo 1 INTRODUCTION Malware (or malicious software) is defined as computer software that has been ex- plicitly designed to harm computers or networks.