An Active Learning Framework for Efficient Acquisition and Detection of Unknown Malware


Thesis submitted in partial fulfillment of the requirements for the degree of “DOCTOR OF PHILOSOPHY”

By Nir Nissim

Submitted to the Senate of Ben-Gurion University of the Negev

Approved by the advisor: _____________________________
Approved by the Dean of the Kreitman School of Advanced Graduate Studies: _________________

31.12.2015
Beer-Sheva

This work was carried out under the supervision of Prof. Yuval Elovici at the Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev.

Research-Student's Affidavit when Submitting the Doctoral Thesis for Judgment

I, Mr. Nir Nissim, whose signature appears below, hereby declare that I have written this Thesis by myself, except for the help and guidance offered by my Thesis Advisors. The scientific materials included in this Thesis are products of my own research, culled from the period during which I was a research student.

Date: 31.12.2015 Student’s Name: Nir Nissim Signature: __________________

Acknowledgements

First and foremost, I want to thank God for providing me with the capabilities, wisdom, and blessing of success during these important years of research, and for surrounding me with an outstanding group of colleagues and researchers who were helpful in this research. I would also like to thank my advisor, Prof. Yuval Elovici, for his support, guidance, and the opportunities provided to me, all of which have made these years of research extremely productive and challenging.
Thanks also to the National Cyber Bureau of the Israeli Ministry of Science, Technology and Space, which partially supported my research. I also wish to thank Clint Feher, Oren Barad, and Aviad Cohen, who assisted in the collection and creation of the datasets, and Yuval Fledel for his valuable advice regarding the efficient implementation aspects of my research. I would also like to thank Prof. Yuval Shahar for the meaningful discussions we shared and for his expertise and support in the expansion of this research to additional directions in the biomedical domain. Special thanks to both Dr. Robert Moskovitch and Prof. Lior Rokach for their assistance and helpful advice during the course of my research. Thanks also to Ms. Yehudith Naftalovitch, the administrative and operational manager of our Cyber Security Research Center, who helped with many administrative matters during these years of research, allowing me to better focus on the research itself. I would also like to thank Ms. Robin Levy-Stevenson for her devoted assistance, providing much appreciated English editing and proofreading during my Ph.D. studies, which helped make my publications more comprehensive and clear. And last but not least, thanks to my dear parents and my special grandparents, who supported me in every way they possibly could, ensuring that I would always have the passion, and everything else I would need, to succeed.

Abstract

The sheer volume of new malware created every day poses a significant challenge to existing detection solutions. This malware is aimed at compromising nearly every kind of widely used digital device, threatening individuals as well as organizations. Popular types of malware take different forms, including computer worms, malicious PC executables, malicious documents (non-executables), and malicious applications aimed at mobile devices.
Widely used antivirus software, which is based on manually crafted signatures, is only capable of identifying known malware and relatively similar variants. To identify new and unknown malware and keep their signature repositories up to date, antivirus vendors must collect new suspicious files on a daily basis for manual analysis by information security experts, who label the files as malicious or benign. Analyzing suspected files is a time-consuming task, and it is impossible to manually analyze all questionable files. Consequently, antivirus vendors use detection models based on machine learning (ML) algorithms and heuristics in order to reduce the number of suspected files that must be inspected manually. In addition to antivirus software, recent detection solutions have also used machine learning algorithms independently in order to provide better detection of new malware, an area in which antivirus software is limited. In light of the mass creation of new files daily, both antivirus and machine learning based detection solutions lack an essential element: they cannot be frequently and efficiently updated with newly created malware. This situation creates a dangerous time gap between the creation and proliferation of malware and its detection and discovery, a gap that allows new malware to attack many targets before it is identified and thwarted. Therefore, both antivirus and machine learning based solutions must be frequently updated: the antivirus software must be updated with new malware signatures, and machine learning based solutions require new informative files, both malicious and benign. In this research we introduce a solution for this updatability gap.
We present a novel, generic, and efficient active learning (AL) framework and new AL methods that may assist antivirus vendors and machine learning based solutions, allowing them to focus their analytical efforts by acquiring only a small set of new files that are either most likely malicious or informative benign files. This process enables efficient and frequent enhancement of the knowledge stores of both the detection model and the antivirus software. In addition to intelligent selection of the most contributive files, our framework is also designed to work at a higher level of granularity, at which it can efficiently select only a small number of instances related to the behavior of a specific analyzed file. By doing so, our framework can filter out the misleading and noisy instances of a malware’s behavior, which are common among sophisticated and elusive malware, and thus improve detection capabilities. Our framework also integrates tailored feature extraction methods for each of the above mentioned types of malware, and these feature extraction methods provide an accurate basis for enhancing the detection capabilities leveraged by our AL methods.

The main contributions of the study are summarized as follows. First, the experimental results showed that our framework can improve the detection capabilities of antivirus software and machine learning based solutions by frequently and efficiently enhancing the knowledge stores of the detection model and the antivirus software, as our framework outperformed any other existing solution and method. Second, given the predefined limit on the number of files acquired daily in our experiments, the existing AL method showed a decrease in the number of new malware files acquired daily, while our AL methods showed a daily increase in that number and also acquired more new malware files each day than every other solution.
Third, our framework conducts the above mentioned update using only a small set of the most informative files (malicious and benign), leading to a significant reduction in the labeling effort that security experts must invest in manual analysis of the files. Fourth, our framework was also found to be efficient in the retrospective acquisition of malware from the large stores of files usually found in organizations. Fifth, our framework is able to efficiently improve detection capabilities by enhancing its robustness through filtering out misleading malware instances and behavior. Lastly, as a proof of concept for the generality of our AL based framework, we have recently extended the framework's capabilities so that it provides a solution in additional domains. We have adapted it to the biomedical informatics domain, in which we successfully enhanced the capabilities of a classification model used for condition severity classification while significantly reducing labeling efforts, which can result in substantial savings in both the time and money associated with medical experts.

Keywords. Malware, Malicious, Computer Worm, Executable, Android, Document, PDF, Machine Learning, Active Learning, Detection, Acquisition, Antivirus.

Table of Contents

1. Introduction
   1.1. Background and Related Work
        1.1.1. Malicious Executables and Computer Worms
        1.1.2. Malicious Documents
        1.1.3. Malicious Android Applications
   1.2. The Problem Statement and Proposed Approach
   1.3. Deployment of our Framework
2. Overview of the Core Papers in the Research
   2.1. Research Results
        2.1.1. Core Papers
3. Summary and Conclusions
4. Future Directions
5. References
6. Appendix
   6.1. Additional Accepted Papers in the Malware Detection Domain
   6.2. Additional Accepted Papers in the Biomedical Informatics Domain

1. Introduction

1.1. Background and Related Work

In recent years, the Internet has become an integral part of our lives, particularly with the increased availability of high speed internet connections, cloud computing, and the proliferation of mobile devices which have rapidly