Sec-Lib: Protecting Scholarly Digital Libraries from Infected Papers Using Active Machine Learning Framework


Old Dominion University, ODU Digital Commons
Computer Science Faculty Publications, Computer Science, 2019

Authors: Nir Nissim, Aviad Cohen, Jian Wu (Old Dominion University), Andrea Lanzi, Lior Rokach, Yuval Elovici, and Lee Giles

Part of the Computer Engineering Commons and the Information Security Commons.

Original Publication Citation: Nissim, N., Cohen, A., Wu, J., Lanzi, A., Rokach, L., Elovici, Y., & Giles, L. (2019). Sec-Lib: Protecting scholarly digital libraries from infected papers using active machine learning framework. IEEE Access, 7, 110050-110073. doi:10.1109/access.2019.2933197

This article is brought to you for free and open access by Computer Science at ODU Digital Commons. It has been accepted for inclusion in Computer Science Faculty Publications by an authorized administrator of ODU Digital Commons. This article is available at ODU Digital Commons: https://digitalcommons.odu.edu/computerscience_fac_pubs/141

IEEE Access: Multidisciplinary, Rapid Review, Open Access Journal
Received July 8, 2019, accepted July 24, 2019, date of publication August 6, 2019, date of current version August 21, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2933197

NIR NISSIM (1,2), AVIAD COHEN (1,3), JIAN WU (4), ANDREA LANZI (5), LIOR ROKACH (1,3), YUVAL ELOVICI (1,3), AND LEE GILES (6)

(1) Malware Lab, Cyber Security Research Center (CSRC), Ben-Gurion University, Beersheba 84105, Israel
(2) Department of Industrial Engineering and Management, Ben-Gurion University, Beersheba 84105, Israel
(3) Department of Software and Information Systems Engineering, Ben-Gurion University, Beersheba 84105, Israel
(4) Computer Science Department, Old Dominion University, Norfolk, VA 23529, USA
(5) Computer Science Department, University of Milan, 20122 Milan, Italy
(6) Computer Science and Engineering Department, Pennsylvania State University, State College, PA 16801, USA

Corresponding author: Nir Nissim

ABSTRACT Researchers from academia and the corporate sector rely on scholarly digital libraries to access articles. Attackers take advantage of innocent users who consider the articles' files safe and thus open PDF files with little concern. In addition, researchers consider scholarly libraries a reliable, trusted, and untainted corpus of papers. For these reasons, scholarly digital libraries are an attractive target and inadvertently support the proliferation of cyber-attacks launched via malicious PDF files. In this study, we present related vulnerabilities and malware distribution approaches that exploit the vulnerabilities of scholarly digital libraries. We evaluated over two million scholarly papers in the CiteSeerX library and found the library to be contaminated with a surprisingly large number (0.3-2%) of malicious PDF documents (over 55% were crawled from the IPs of US universities). We developed a two-layered detection framework aimed at enhancing the detection of malicious PDF documents, Sec-Lib, which offers a security solution for large digital libraries.
Sec-Lib includes a deterministic layer for detecting known malware, and a machine learning based layer for detecting unknown malware. Our evaluation showed that scholarly digital libraries can detect 96.9% of malware with Sec-Lib, while minimizing the number of PDF files requiring labeling, and thus reducing the manual inspection efforts of security experts by 98%.

INDEX TERMS Scholarly, digital, library, paper, PDF documents, malware, malicious documents, distribution.

I. INTRODUCTION

The number of scholarly documents (English language) accessible on the Web is enormous, estimated at over 114 million PDF documents [5], of which over 27 million (~24%) can be easily accessed without payment or subscription [5]; since then, the estimated number of scholarly documents on the Web has risen significantly. These documents are freely available in part because researchers publish draft versions of their papers on their professional home pages (often within the domains of universities), before the final versions are published by the publishers.

Researchers also publish their research on their home pages to increase exposure, reach researchers around the world, and gain citations and recognition for their work [6], [7]. In order to assist researchers, many scholarly digital libraries and search engines collect and index the author's version. Thus, the papers can be easily downloaded worldwide. This free collection of scholarly documents is a valuable resource for most researchers and academics who may not have a comprehensive subscription to all publishers' content.

Figure 1 presents a snapshot of search results for a searched paper using Google Scholar. At the bottom of the page, one can access all 15 versions of the paper, already indexed by Google Scholar, simply by clicking on the blue [...]

FIGURE 1. Google Scholar's search results for a given academic paper, including 14 additional versions of the paper.

[...] even scan them to detect malicious content. In addition, their reputation as sources of trusted scholarly documents makes digital libraries an attractive platform from which to take advantage of and distribute malicious PDF documents. Attackers are aware of this chain of trust and use social engineering techniques in which they take advantage of the heavy use and blind trust of researchers in scholarly digital libraries and the papers (PDF documents) they download from them; once one researcher within an organization is infected, it can quickly become a major cyber security incident for the entire organization's computational system [32]. Researchers' Web pages have become a target that can be used to launch attacks. In addition, researchers, professors, and research students are naturally attractive candidates for attack, because, due to the nature of their work, they have access to confidential and sensitive information, such as nuclear knowledge, medical records, aviation, and educational records and materials (e.g., student data, exams, etc.). Moreover, some researchers collaborate with governmental agencies and industry, which allows them access to national and confidential information

The associate editor coordinating the review of this manuscript and approving it for publication was Luis Javier Garcia Villalba.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/