REEXAMINING PHISHING RESEARCH 1

SoK: A Comprehensive Reexamination of Phishing Research from the Security Perspective

Avisha Das, Shahryar Baki, Ayman El Aassal, Rakesh Verma, and Arthur Dunbar

Abstract—Phishing and spear phishing are typical examples of masquerade attacks, since trust is built up through impersonation for the attack to succeed. Given the prevalence of these attacks, considerable research has been conducted on these problems along multiple dimensions. We reexamine the existing research on phishing and spear phishing from the perspective of the unique needs of the security domain, which we call security challenges: real-time detection, active attacker, dataset quality and base-rate fallacy. We explain these challenges and then survey the existing phishing/spear phishing solutions in their light. This viewpoint consolidates the literature and illuminates several opportunities for improving existing solutions. We organize the existing literature based on detection techniques for different attack vectors (e.g., URLs, websites, emails) along with studies on user awareness. For detection techniques, we examine properties of the dataset, feature extraction, detection algorithms used, and performance evaluation metrics. This work can help guide the development of more effective defenses for phishing, spear phishing and email masquerade attacks of the future, as well as provide a framework for a thorough evaluation and comparison.

Index Terms—Phishing, spear phishing, usable security, email, website, URL, dataset properties, unique challenges of security

I. INTRODUCTION

Internet users continue to be plagued by many attacks, which include: spam, phishing, spear phishing, masquerade, and malware delivery. Spam is advertisement, and its most pernicious effect is the loss of time and productivity. Phishing and spear phishing are more damaging. In these attacks, the attacker impersonates a trusted entity with an intent to steal sensitive information or the digital identity of the target, e.g., account credentials, credit card numbers, etc. The difference between them is that spear phishing is more targeted and phishing is more indiscriminate. Both of them are cases of masquerade attacks, which involve impersonation. However, this type of attack (masquerade) tends to have broader objectives. Examples include: planting fake news, sowing divisions in communities (e.g., the case of WhatsApp in India), and swaying opinions (e.g., the case of stealing elections). The phrase malware delivery is self-explanatory.

Email continues to be one of the most convenient and popular vectors of choice for the above attacks. An email is usually embedded with a poisoned link to a fraudulent website set up to trick the victim. However, with the growing popularity of social networks, a new medium for spreading malicious links is also available in the form of instant messaging applications, chat services, etc.

The persistent popularity of phishing and spear phishing with attackers [1], [2] comes from the fact that they exploit the human element (the "weakest link") [3] and do not require any actual intrusion into the system or network. Phishing/spear phishing attacks have been successfully used to bring well-protected companies to their knees (e.g., the attack on RSA as described in [4]), and estimates of losses from phishing alone run into several hundred million dollars in the US. In addition to the monetary loss, there is also a loss of time, productivity, and damage to reputation. Besides stealing sensitive information, email attachments and web links in emails are the most common ways of spreading malware; for example, 9 out of 10 phishing emails detected on the Verizon network in March 2016 carried ransomware [5]. Previous phishing studies have found that phishing is "far more successful than commonly thought" and that it is the main mechanism for manual account hijacking [1], [6].

Despite more than a decade of research on phishing, it continues to be a serious problem. There could be several reasons for this: the problem itself may be intractable, the technical approaches so far may have missed important parameters of the problem, phishing exploits the human as the weakest link so purely technical approaches may not be sufficient, or some combination of these. To expose the reasons for the continued success of phishing, we survey the detection literature and user studies on phishing and spear phishing.

We found that researchers have tackled phishing/spear phishing in many papers, and there are several surveys of these attempts. However, we discovered that there are certain methodological issues with phishing detection research. For example, we notice:
• The use of balanced datasets and inappropriate metrics
• Unreported training and testing times
• Lack of generalization studies
• In user studies, the multiple comparison issue

Many surveys have missed quite a few of these issues; e.g., the dataset diversity issue is never mentioned. Challenges such as the base-rate fallacy, the active attacker, and generalization studies, which we collectively call "security challenges," are rarely mentioned, let alone emphasized. We identify the required set of challenges in cybersecurity based on those outlined in [2], [7].

arXiv:1911.00953v1 [cs.CR] 3 Nov 2019

All the authors are with the Department of Computer Science, University of Houston, TX, 77204 USA. E-mail: {rverma, adas5, sbaki2, aelaassal}@uh.edu

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
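The first issue above (balanced datasets and inappropriate metrics) can be made concrete with a back-of-the-envelope calculation: under a realistic class imbalance, plain accuracy hides a flood of false alarms. The detector rates below are illustrative assumptions, not measurements from any cited system.

```python
# Toy illustration of why balanced test sets and accuracy alone mislead.
# Assume a detector with 90% true-positive rate and 90% true-negative rate
# (an illustrative assumption, not a number from any surveyed system).
tpr, tnr = 0.90, 0.90

def precision_at_base_rate(base_rate):
    # Fraction of flagged items that are actually phishing (Bayes' rule).
    tp = tpr * base_rate              # expected true alarms
    fp = (1 - tnr) * (1 - base_rate)  # expected false alarms
    return tp / (tp + fp)

# On a balanced 50/50 test set the detector looks excellent ...
print(round(precision_at_base_rate(0.50), 3))  # 0.9
# ... but if only 1 in 100 emails is phishing, most alarms are false.
print(round(precision_at_base_rate(0.01), 3))  # 0.083
```

The same detector that reports 90% precision on a balanced benchmark delivers roughly one true alarm per twelve at a 1% base rate, which is why the later sections track class ratios and metrics alongside accuracy.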

Our goal is to reinvigorate research on these problems and reorient it towards solving the urgent, practical needs of the security domain [2]. Hence, we reexamine the previous literature on phishing and spear phishing from the viewpoint of the unique needs of the security domain, which we elaborate upon in Section IV, to determine the appropriateness of the proposed solutions. To our knowledge, a comprehensive evaluation of the appropriateness of the previous research on phishing and spear phishing from the security perspective has not been done before. A better understanding of the previous studies in this light will foster research on effective and practical defenses for these problems. Such a perspective will also provide a framework for a thorough evaluation of current and future solutions. User studies are also important for stopping phishing attacks, since attackers try to elude computer users. No matter how good the detection system is, end users should be prepared for the different types of attacks. So, we also review the phishing user studies in addition to the detection techniques. We observed that there are quite a few existing surveys on phishing emails, significantly fewer on phishing websites and Uniform Resource Locators (URLs), and none on user studies. However, we did not find any survey on phishing or spear phishing that emphasized the unique needs of the security domain and examined the research from this perspective. Our contributions are as follows:

• We adapt the security challenges in cybersecurity [2], [7] to the field of detecting phishing URLs, websites, and emails, and to user studies.
• We collect and review a comprehensive list of literature on phishing and spear phishing detection systems as well as user studies.
• We conduct a systematic review of the phishing detection research with a focus on: the features extracted, detection methods used, properties of the evaluation dataset (source, size, class ratio, diversity, recency), and evaluation metrics for studying system performance.
• We investigate the diversity of several popular URL, website and email datasets.
• We select and discuss well-cited literature based on their contributions from the perspective of the security challenges addressed by the authors.
• We summarize the best practices observed in the studied phishing detection literature.

Paper Organization: Section II describes our search for relevant literature, which is important for the reader to understand the exact contours of our systematization. Then, we discuss essential background (Section III) and security challenges (Section IV). Common notations for a thorough review of detection research and user studies are in Section V. We investigate phishing detection research in Sections VI, VII, and VIII. Opportunities for future work are discussed in Section IX. We analyze user studies in Section X, and compare our work with related work in Section XI. Section XII concludes the paper.

II. SYSTEMATIC LITERATURE COLLECTION

We followed a systematic methodology to search for the relevant research papers and surveys that address phishing and spear phishing attacks. According to DBLP (Digital Bibliography & Library Project), the first phishing papers appeared in 2004,¹ and there are 732 papers from 2004-2017.² For comparison, a publication search with the keyword 'spam' on DBLP yields 2,213 papers (the query 'spam$', an exact word match for spam, yielded 1,785 papers), but this is an estimate, since it includes opinion/review spam papers and also more than 100 matches with the author last name "Spampinato."

To gather the research papers for this study, at first we used the queries "phish URL, phish link, phish site, phish web, phish email/e-mail" independently on four databases: DBLP, ACM Digital Library, IEEE Xplore, and Google Scholar (allintitle query). We also found many papers that study phishing as part of malicious and malware-centric behavior. Therefore, we added the following additional queries (without quotes): "malware URL, malicious URL, malware link, malicious link, malware site, malicious site, malware email/e-mail, malicious email/e-mail." However, we only consider research specific to the phishing attack vectors – URL, email and websites. Papers which propose mainly malware detection techniques are beyond the scope of this survey. To search for relevant literature on spear phishing, we used the queries "spear phishing" (this query also covers spear-phishing) and "spearphishing." The queries "spear phish" and "spearphish" did not yield any additional results. Later, we realized that authors sometimes just use "phishing detection" or "phishing attack," or some other variation. So we expanded our search using "phish" and "phishing" on the above databases.

For this paper, we mainly focus on research published between the years 2010-2017. We cover papers appearing up to March 2018, and any pre-2010 paper that is highly cited or appeared in a major security venue. We also cover general phishing surveys appearing up to 2018.³ Our search gave us over 734 papers on phishing detection and user studies, which we then reduced, using conference/journal CORE rankings of B or higher, to approximately 300 papers.⁴

Figure 1 shows the distribution of the papers covered based on publication year and type: URL detection, website detection, email detection and user studies (papers published between 2010-2017). It shows a significant increase in attention to user studies (a 20% increase from 2014 to 2016) compared to other areas. This could be due to the increasing use of spear phishing attacks. It is also clear that more papers focused on phishing URL detection during 2014-2016. Surprisingly, phishing email detection seems to be falling out of favor. Note that if an email detection paper also used URL features for detection (70% of email detection papers do this), we classify it as an email paper for this figure, and similarly, if a phishing website detection paper used URL features (64%), then we classify it as a website paper.

¹There is a 1997 MIT thesis starting with PHISH, but this is an acronym and means something completely different. There is also a book called Phish.net in 1997, but this is a web community regarding the music band called Phish.
²Keyword search, phish, on 6 May 2018.
³This list includes ACM's CCS, ASIACCS, CODASPY, TISSEC journal, IEEE's Trans. on DSN, Trans. on Forensics, Symp. on Security and Privacy, Security and Privacy Mag., USENIX Security, NDSS, ACSAC and ESORICS.
⁴If a conference or journal was unranked by CORE, we use a Google Scholar H5 index of ≥ 20 as our guideline.
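The two-stage screening described above (keyword queries over titles, then a venue-quality cut) can be sketched as a small filter. This is an illustrative reconstruction, not the authors' actual tooling, and the sample records are hypothetical.

```python
# Illustrative sketch of the literature screening: keyword queries over
# paper titles, then a venue filter (CORE rank B or higher; for unranked
# venues, Google Scholar H5 index >= 20). All sample records are made up.
PHISH_QUERIES = ["phish url", "phish link", "phish site", "phish web",
                 "phish email", "spear phishing", "spearphishing", "phish"]
CORE_OK = {"A*", "A", "B"}

def title_matches(title):
    t = title.lower()
    return any(q in t for q in PHISH_QUERIES)

def venue_ok(core_rank, h5):
    if core_rank in CORE_OK:
        return True
    return h5 is not None and h5 >= 20  # fallback for CORE-unranked venues

def screen(papers):
    # papers: iterable of dicts with 'title', 'core_rank', 'h5' keys
    return [p for p in papers
            if title_matches(p["title"]) and venue_ok(p.get("core_rank"), p.get("h5"))]

papers = [
    {"title": "Detecting Phishing URLs with Lexical Features", "core_rank": "A", "h5": None},
    {"title": "A Study of Spam Filtering", "core_rank": "A*", "h5": None},
    {"title": "Phishing Website Detection in the Wild", "core_rank": None, "h5": 12},
]
print([p["title"] for p in screen(papers)])  # only the first record survives
```

The second record fails the keyword stage and the third fails the venue stage, mirroring how the pool of 734 candidates was narrowed to roughly 300 papers.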

Fig. 1: The number of papers that we cover in this survey for each year based on their type (from 2010 to 2017). Numbers inside the bars denote the counts of phishing URL, website, email and user study papers in each year. The number in front of each bar is the total count of papers in that year.

During our search, we also found a number of surveys on various aspects of phishing. Before we begin, we should clarify that it is very difficult to directly compare the proposed systems in the literature, since most of them used different datasets for evaluation. Even if the same sources were used, e.g., Phishtank and Alexa, we still cannot compare them directly because these datasets are updated regularly. The situation is a little different for emails, since several research papers use the same phishing email datasets, which are publicly available. However, the legitimate datasets used are different because researchers use their own or a private company's emails. We could compare the detection rate for the phishing email datasets used, but it would not be an accurate comparison since the training is done using both datasets.

III. BACKGROUND

A. Phishing and Spear Phishing

As reported by many surveys, the first phishing attack is supposed to have occurred in 1995 on America Online (AOL). Phishing has been one of the preferred methods used by attackers to lure unsuspecting victims. There is ample literature on the steps involved in a phishing attack (see [8], [9] for example). As suggested in [8], the phishing process can be divided into five steps: reconnaissance, weaponization, distribution, exploitation, and exfiltration. It starts with the attacker, disguised as a legitimate entity (reconnaissance). Then they host a website similar to the target (weaponization) and send an attack vector (usually an email) to the victim (distribution). The attackers may also spread such links using social networking sites, instant messaging applications, etc. The attackers use innovative methods to exploit human weaknesses and make victims think the fraudulent websites are legitimate (exploitation), e.g., using URLs which are similar to the original one (paypa1.com instead of paypal.com). In the last step, attackers collect the sensitive information exposed by the victims (exfiltration). For spear phishing, we refer the reader to [10].

Estimates of the economic damage caused by phishing can be found in [11], and the growth or prevalence of phishing can be seen from reports by the Anti-Phishing Working Group (APWG) [12]. Besides these papers, noteworthy are a study of phishing networks by "spidering" chat rooms across major Internet Relay Chat (IRC) servers [13] and a study of the economics of phishing from the perspective of all phishers treated as a group [14]. The authors argue that the average payoff for a phisher is likely to be small under certain assumptions. This result does not tell us about the distribution, which could be highly skewed, with a high median for example. Also, if the payoff is low for some phishers, many of them may be discouraged and move to greener pastures until the situation "improves" for the rest, which will attract more phishers. Thus, the cycle repeats.

B. Modus Operandi of Phishing

Several studies have examined aspects of how phishers create their attacks and how they deceive their victims [15]–[19]. In [15], researchers analyzed a proprietary dataset and found that phishing domain names have a different length distribution from legitimate domains and a different character frequency distribution from that of standard English text. Cova et al. [16] and McCalley et al. [17] analyzed phishing kits, which are packages that contain easy-to-deploy, complete phishing websites downloaded from the Internet. They found in these kits examples of obfuscation and also backdoors, which send the collected information to a third party. Thus, the kits were being used to fool gullible users as well as naïve phishers. More recently, Han et al. [18] created a honeypot to attract phishers so that they could analyze live phishing kits. They have several interesting results on the duration and other characteristics of the phishing life cycle from deployment to blacklisting.
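The look-alike URL trick used in the exploitation step (paypa1.com instead of paypal.com) can be caught cheaply with an edit-distance check against a list of protected domains. A minimal sketch, where the protected list is hypothetical:

```python
# Minimal look-alike domain check: flag domains within a small edit
# distance of a protected domain. The PROTECTED list is hypothetical.
PROTECTED = ["paypal.com", "example.com"]

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def looks_like_target(domain, max_dist=1):
    # Near (but not equal to) a protected domain => suspicious.
    return any(0 < levenshtein(domain, t) <= max_dist for t in PROTECTED)

print(looks_like_target("paypa1.com"))  # True: one substitution from paypal.com
print(looks_like_target("paypal.com"))  # False: the genuine domain itself
```

Real systems combine such lexical cues with many other features, as the detection sections below survey; a pure edit-distance rule is easily evaded (e.g., by subdomain tricks like paypal.com.evil.example).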

In [19], the authors have made a detailed study of the vulnerabilities exploited and the technical approaches used in phishing attacks.

C. Variations on Phishing

Because it exploits the weakest link in the security chain, i.e., humans [3], phishing is now moving to social networks and also to SMS/text messages (smishing).⁵ Other variants of phishing include vishing (voice-based phishing) and QRishing (phishing using QR codes). However, there are only a handful of papers on these topics, as shown in [1], [19].

⁵https://www.wandera.com/mobile-security/phishing/mobile-phishing-attacks/

D. Phishing Detection

Blacklists are a commonly used method to thwart phishing attacks [20]. Popular blacklists like PhishTank are created using crowd-sourcing but have two shortcomings: (i) manual maintenance, and (ii) newer phishing attacks launched for short time windows may remain undetected [21], [22]. Thus, to overcome these issues, data science methods have gained popularity with researchers to detect phishing websites and emails. They use datasets of legitimate and phishing URLs/emails to train different classification models [8]. Then, these models can be used to either build and update a blacklist or detect phishing URLs/emails in real-time while internet users open a website or email.

Figure 2 illustrates the stages of a machine learning based phishing detection system. It starts with one or more datasets and extracts features from different components of the website/email (e.g., URL, HTML, email header, etc.). In the next step, the extracted features are fed to the learning algorithms to extract the patterns/rules for distinguishing legitimate emails/URLs from phishing ones. In the fourth step, the effectiveness of the model will be evaluated using different metrics. Finally, the results will be interpreted and the best models and features will be determined, with insights on what worked best or worst. We review the different techniques proposed/used by researchers in more detail in Sections VI, VII and VIII.

Fig. 2: Different stages of a typical phishing detection system with the critical challenges associated at each stage. [Figure: a pipeline with stages Setup (datasets of emails and websites, from real-world sources), Feature Extraction (header, body, URL and website features, from emails and webpages), Learning Techniques (supervised, unsupervised, online and rule-based learning), Performance Evaluation, and Insights; the stages are annotated with challenges such as lack of datasets, base-rate fallacy, poisoning of public data sources, active attacker issue, time scale of attacks, false alarms, ranking, semantic gap issue, efficiency, scalability, robustness, generalization, unbalanced datasets, and cost of misclassification.]

E. Human Vulnerability

Machine learning techniques are not 100% accurate. Moreover, models trained on historical data may fail to identify newer types of attacks. Therefore, the user also becomes an important element in the detection cycle. Understanding users' behavior, as well as training them to properly detect attack vectors, plays an important role in the prevention of phishing attacks.

There are some vulnerabilities in human cognition which help attackers in deceiving people. Researchers in [23] categorized emails based on their intention into four groups: (1) Risk or Loss, (2) Benefit or Gain, (3) Account Information, and (4) Information Only. The first category uses the sense of urgency and panic to make people click on a link and expose their sensitive information. The second group uses people's excitement about receiving a big prize to lure them. The last two groups are not as pressing as the first two, which makes them look more legitimate to non-expert users. A study of phishing emails sent to about 6 million people, conducted by KnowBe4 [24], revealed that people fall for attacks belonging to groups 1 and 2 more frequently than the other types. Their results showed that the click rate is higher for emails which promise money or threaten loss of money.

In this paper, we also cover the user studies to have a thorough overview of the different aspects of the detection cycle. Before reexamining the phishing detection literature and user studies from the security viewpoint, we examine the specific needs of the security domain, which we call security challenges.

IV. SECURITY CHALLENGES

The security domain imposes a number of challenges on the data science methods that are typically used to detect phishing and spear phishing attacks [2], [7]. In [2], the following challenges were identified.

1) Active attacker: In the security domain, there is an adversary who is constantly trying to learn the defensive methods being employed and working to defeat them. For this reason, defenders need to design methods that can detect new attacks (zero-day attacks) or variations of existing attacks, not just the ones seen in the past. Supervised machine learning methods perform well if the test data has the same/similar distribution as the training data, but this "stationarity" assumption may not hold in practice because of active attackers (adversarial setting), so retraining of the model or online learning methods are required.

2) Base-rate fallacy: The incidence of an attack, i.e., the base rate, has a bearing on the likelihood that something classified as an attack is actually an attack. In a big company like Chevron, the number of legitimate emails sent and received is in the millions per day, and the attacks may only be a few 10s to 100s [25]. For example, if the incidence of an attack is 10% and a classifier is 90% accurate, then the probability that a scenario classified as an attack is actually an attack is only 50%. The ratio between positive and negative samples used for system evaluation and the use of appropriate metrics are a major concern for security researchers.

3) Time scale of attacks: An attack can be over very quickly, so we may need real-time prevention/detection. In the case of phishing detection, the volume of data that has to be processed, e.g., emails, necessitates efficient and fast methods.

4) Dataset issues: There are four dataset issues: availability, diversity, recency, and quality. Because companies are worried about the damage to their reputation [25], there is a reluctance to share attacks. Hence researchers have limited data sources available, which also has an adverse effect on the diversity and quality of datasets. By quality, we mean how well the dataset represents the real scenario. For example, if the real scenario is extremely unbalanced, as for spear phishing emails, then the test dataset should reflect this. Recency of datasets also plays an important role in system evaluation.

Other challenges mentioned in [2] include: asymmetrical and user-dependent costs of misclassification (e.g., misclassifying phishing emails is less costly than misclassifying legitimate emails for an expert, but for a beginner the opposite may be true⁶), unbalanced datasets (this is related to the base-rate fallacy), the pernicious effects of false alarms, the semantic gap (the gap between the classifier's decision and the explanation for that decision; many machine learning methods provide no human-readable explanation of their decision), and the opportunity for attackers to poison publicly-available datasets.

⁶An expert is an individual who is familiar with such targeted attacks and the attackers' modus operandi, while a beginner is usually an individual with little to no prior exposure to such attacks.

In [7], the researchers identified the following challenges that are specific to applying machine learning for intrusion detection: the outlier detection setting, the high cost of misclassification, the semantic gap, the diversity of network traffic, and difficulties with evaluation. Difficulties with evaluation were further elaborated upon as data difficulties, mind the gap, and adversarial setting. Figure 2 shows the relevant and critical challenges at each step of phishing detection.

In this paper, we focus on the following challenges, which are the most important in the phishing context: adversarial setting, fast and efficient detection, dataset issues, base-rate fallacy, and evaluation metrics. The semantic gap is also an important issue, but there has been hardly any progress in this dimension. The pernicious effect of false alarms, i.e., flagging a legitimate email as phishing, implies that the false alarm rate should be kept low, but the threshold is subjective and hence hard to treat objectively in scientific research. Similar observations hold for user-dependent misclassification cost.

V. COMMON NOTATIONS AND ORGANIZATION

In reviewing the phishing detection schemes, we focus on the following key attributes, which are essential for robust detection.

A. Key Attributes for Phishing Detection

• Nature and Source of data: Information about the nature and composition of the data, i.e., malware, phishing, or spam, in addition to the sources of the dataset and their availability, and the datasets' sizes, quality, diversity, and recency. This stems directly from the dataset concerns.
• Feature Extraction: Description of the types of features extracted from the datasets for detection and analysis of URLs, websites, or emails, along with any feature selection techniques like information gain. Feature extraction has implications for speed, efficiency, and robustness. To have a better understanding of a feature's effect on detection systems, we discuss two important properties of the features: feature size and processing time. Feature size is important since it can affect the detection time of systems. For example, using term frequency as a feature increases the number of features dramatically, which may slow down some classifiers (and require more resources) if the feature space becomes too large. Processing time can affect the response time of the detection system. If generating a feature takes a long time, it can delay the system's response.
• Classification methods: This includes the type of detection technique (e.g., machine learning, rule/pattern based, or heuristic) used. We also include fine-grained attributes such as supervised or unsupervised, running times for training and testing the system, specifications of the architecture used, and whether the system retraining frequency was specified. These attributes contribute to the speed and efficiency of the proposed system.
• Experimental setup/methodology: This is related to adversarial issues and replicability of the work. It includes system evaluation details, e.g., testing it in a real-time setting, whether the system was tested across various domains (e.g., if a model trained on phishing links was also tested on spam, etc.), and how was ground truth

obtained. Moreover, it includes whether the proposed system was evaluated for robustness, which parameters or metrics were used for system evaluation, and whether the experiment was realistic.

B. Common Notations for Detection Sections

In this section, we briefly explain the common notations and terminologies that we will be using throughout the paper.

Features Tables. For each feature, we report two criteria: Feature Size (FS) and Processing Time (PT). These criteria were selected keeping in mind the goals and security challenges mentioned in Sections I and IV, respectively. We consider feature sizes of less than 10 as Small, between 10 and 100 as Medium, and more than 100 as Large. For PT, we group processing times of less than 100 milliseconds as Short, between 100 milliseconds and 1 second as Moderate, and more than 1 second as High.

A number of phishing website and email detection techniques use URL features along with other host/networking features. Therefore, we highlight the papers that use URL features for phishing website detection, e.g., in Table A10, under the name 'uW.' Papers that study URL features extracted from phishing emails are tagged with 'uE.'

Notations V.1: Summary of the notations used in the tables

Features Tables: Notations used in Tables I, VIII, XV
  FS: Feature Size, FS ∈ {Small, Medium, Large}
  PT: Processing Time, PT ∈ {Short, Moderate, High}
Dataset Sources Tables: Notations used in Tables II, III, IV, IX, X
  Misc.: Miscellaneous Sources
  pvt.: Private sources
  pub.: Publicly available sources
Dataset Sizes Tables: Notations used in Tables V, XI, XVII
  Ns: Where N varies from 100 to 100,000
  ≥ 1M: Greater than or equal to 1 million
  N/A: Not applicable
  Cell colors: Red: highly used (≥6 papers), yellow: moderately used (≥3 and <6), green: rarely used (≤2 papers), white: not used.
Evaluation Metrics Tables: Notations used in Tables VII, XIV, XX
  CMx: Confusion Matrix
  Acc: Accuracy
  PRF: Precision, Recall, and F1-Score
  ErR: Error Rate
  AUC: Area Under Curve
  Cell colors: Same conventions as for Dataset Sizes Tables. Red: highly used (≥6 papers), etc.

Dataset Sources Tables. If researchers used private or 'unnamed' sources, these are listed under 'Miscellaneous (pvt.)', and for sparsely used public sources, we include them under 'Miscellaneous (pub.)'. We list data collected by the authors under 'Author (pvt.)'. Some papers do not reveal their dataset source(s) - we include these papers under 'N/A' (Not Available).

Dataset Sizes Tables. In the dataset sizes tables, '100s' denotes that the total dataset size (training and testing combined) used for evaluation is between 100-999. Similarly, '1000s' denotes dataset sizes in the range 1000-9999, etc.

Metrics Tables. The notations in all metrics tables (e.g., Table VII) are as follows: CMx - Confusion Matrix (a matrix with the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN)); Acc - Accuracy (Equation (1)); PRF - Precision, Recall, and F1-Score (Equations (2), (3), and (4)); ErR - Error Rate (1 − Accuracy); AUC - Area Under an ROC Curve (the area under the Receiver Operating Characteristic (ROC) Curve is a measure of system accuracy). There are also some other metrics which are more appropriate for unbalanced datasets, e.g., Geometric Mean (G-Mean), Matthews Correlation Coefficient (MCC), and Balanced Accuracy [26]. Equations (5), (6) and (7) are the formulas for these metrics. P and N are the sizes of the positive and negative classes, respectively.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

F1 = 2 · Precision · Recall / (Precision + Recall)    (4)

G-Mean = sqrt( (TP / (TP + FN)) · (TN / (TN + FP)) )    (5)

MCC = (TP · TN − FP · FN) / sqrt( (TP + FP)(TN + FN)(P)(N) )    (6)

Balanced Accuracy = (TP/P + TN/N) / 2    (7)

Dataset Diversity. We analyze the diversity of content found in websites and email bodies by converting them into a vector representation using Term Frequency-Inverse Document Frequency (TFIDF), a term weighting scheme that uses the term frequency in a document and the log of the inverse popularity of the term in the collection [27]. This is defined by Equation (8) below, where n_{t,d} is the number of times term t appears in a document d, N_d is the total number of terms in d, D is the total number of documents, and d_t is the number of documents with the term t in it.

TFIDF(t, d) = (n_{t,d} / N_d) · log_e( D / (1 + d_t) )    (8)

TFIDF tries to capture the importance of a word in a document. Words that appear in several documents are devalued since they are worse at distinguishing between the documents. Before converting the content to TFIDF vectors, we remove the uninformative and commonly used words such as the, a, and in (called stopwords). We use the stopwords list provided in the Natural Language Toolkit (NLTK). Then we use cosine similarity, Equation (9), to measure the similarity between the vectors. Cosine similarity is the cosine of the angle between the two vectors.

similarity(A, B) = cos θ = (A · B) / (‖A‖ ‖B‖)    (9)

Organization of each detection section. We start our discussion by showing the structure of a URL/website/email and the features that can be extracted from it. We then describe the methods used for phishing URL/website/email detection, the datasets employed, and the metrics studied in previous research. We close the section with research that has addressed at least a few of the key security challenges and some opportunities for future research.

We now organize and assess the phishing detection techniques and user studies from the standpoint of the above security challenges. Many researchers have focused on URL analysis as one way (sometimes the only way) of detecting phishing websites. We reexamine this line of research first.

VI. PHISHING URL DETECTION

Based on the above key attributes (Section V-A), we reexamine 61 relevant research papers on phishing URL detection.

A. URL Structure

A URL is defined as a reference or an address to a resource on the Internet. In Figure 3, we demonstrate the structure of a typical URL.⁷

Fig. 3: Example of a complete URL with its typical components

1) Scheme: This identifies the protocol to be used to access the resource on the Internet. Common protocols are HyperText Transfer Protocol (HTTP) and Secure HyperText Transfer Protocol (HTTPS).
2) Host: The hostname or domain name refers to the human-readable name for the address at which the resource can be accessed using the URL. In the above example, www.example.com acts as the hostname.
3) Port: The port number follows the hostname or the Internet Protocol (IP) address. The Transmission Control Protocol (TCP) looks at the port to identify the type of communication between the client (asking for resource access) and the server (granting resource access) processes. Here, the port number 1030 acts as the TCP port.
4) Path: This indicates the path to the exact location of the resource on the host. In the above example, the resource resides at the location '/software/index.html/'.
… (which is interpreted in a special way by the browser).

A URL can also be considered as a string of characters, and lexical features, e.g., edit distance, character n-gram frequencies (i.e., sequences of n characters), etc., can be used.

We group the different features used in the phishing URL detection literature into four major classes: (i) Lexical properties, (ii) Obfuscation (use of encoding in the URL, use of '&', '%20' instead of space, etc.) and Shortened URL properties (use of goo.gl, bit.ly, tinyurl services for shortening and obfuscating long malicious links [30]), (iii) Hostname and Domain Name System (DNS) based properties, and (iv) IP address and WHOIS properties (WHOIS server details, e.g., the dates of registration, update, and expiration). The aforementioned taxonomy has been developed after careful consideration of previous literature, including related surveys, e.g., [31], [32].

Table I lists the URL feature classes along with the feature size and processing time required for each feature. A majority of the lexical features – target word frequency, frequency of alphanumeric characters, character n-grams, bag-of-words, etc. – need higher processing time. Similarly, we note that, for obfuscation based properties, the processing time ranges from short to moderate. IP address based features – the presence of encoding in the IP, whether the IP is blacklisted or whitelisted, etc. – have short to moderate feature sizes and processing times. Retrieving WHOIS based information (e.g., creation, update and expiration dates of WHOIS information, WHOIS registration details) involves querying WHOIS servers, which in turn increases the feature processing time.

Figure 4 shows the distribution of URL feature classes based on their usage across the three dimensions of phishing detection - URL (U), website content (W) and email content (E). According to Figure 4, the most popular feature types used across all dimensions are: obfuscation-based, lexical-based and DNS-based properties. Table A10 in the Appendix gives a more detailed breakdown of the literature based on the types of features extracted. We now consider the detection methods that have been used in the phishing URL detection literature.

C. Detection Methods

Researchers have proposed a variety of detection methods for phishing URL detection. These methods vary from blacklist/whitelist based techniques to machine learning based algorithms.
5) Query: The query string if provided follows the path We show the major classes of machine learning algorithms component and provides a string of information with a used in phishing URL detection literature in Figure 5. The specific purpose. In the above example, the query string papers are categorized based on the type of learning method ‘q=user’ will prompt the server to access the user page. – supervised (popular techniques are Decision Tree, Random The symbol ‘?’ separates the path and the query. Forest, Support Vector Machines, Naïve Bayes, and Logis- tic Regression), Unsupervised (K-Means Clustering), Online B. Features learners (Confidence Weighted, Adaptive Regularization of Weights, etc.). A few papers employ rarely-used methods, e.g., A large number of features can be extracted from the association rule mining and Markov models. Clearly, a major- URL like the presence of certain tokens or words, token ity of the proposed systems are supervised. Other promising length, and frequency, count of special symbols, as well as learning approaches, such as rule-based, unsupervised, and the presence of any meta-symbols, such as the @ symbol semi-supervised approaches, remain largely unexplored. Of the 7"Uniform Resource Identifiers (URI): Generic Syntax", supervised methods, only one paper [33] used deep learning http://www.ietf.org/rfc/rfc2396.txt. in the form of Recurrent Neural Networks. Table A4 (in REEXAMINING PHISHING RESEARCH 8

TABLE I: Features used in phishing URL detection. For uncommon features, we cite the paper which best describes it.

Criteria and features (FS: Feature Size, PT: Processing Time; S = Small/Short, M = Medium/Moderate, L = Large, H = High):

Lexical properties: URL Length (S, S); Longest Token (S, S); Token/Word Frequency (S, S); Blacklisted Word Frequency (S, M); Freq. or Ratio of Digits and Characters (M, S); Number of Dots (.) (S, S); Misspelled/Bad domains (S, M); Kolmogorov Complexity (S, M); Character N-gram frequency [29] (L, H); Edit Dist., KL Div., KS-test (M, M); URL Entropy (S, M); Bag-of-words (L, M)

Obfuscation and Shortened URL properties: Special Characters Frequency (M, M); IP address obfuscation (S, S); URL encoding (S, M); URL shortening (S, M); URL path spoofing (S, M); Mismatch in URL source and destination character^b (S, H); IP address instead of domain name (S, S); Hostname obfuscation^f (S, M); Entry point URL (S, M); Position of entry point URLs [29] (S, M); Number of landing URLs [29] (S, M)

Hostname and DNS properties: Token Frequency (M, M); Length of URL parameters^a (S, S); Digit frequency (S, S); Frequency of special characters in domains (M, S); Port presence (S, S); HTTPS/HTTP (S, S); TLD features [28] (M, H); TTL value^c (S, M); Age of domain (S, M); Ranking based features^d (S, M); Number of domain names and IPs (S, M)

IP address and WHOIS properties: Characters and digits in IP (M, S); IP Encoding (S, M); Blacklisted IP (S, H); Whitelisted IP (S, H); IP records and prefix checking [28] (M, H); WHOIS Registration details (M, H); Creation, Update, Expiration date of WHOIS info. [28] (M, M); AS Number (S, H); Status of WHOIS Entry (S, H); Location of IP origin^e (M, M); Nature or Speed of connection [28] (M, M)

a Parameters are parts of a URL like scheme, host, path, query string
b Characters like period, slash, etc.
c Time to live
d Alexa Ranking, PageRank scores, search engines lookup
e Continent/country/city
f Hexadecimal encoding
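To make the lexical and obfuscation classes in Table I concrete, the sketch below extracts a handful of such features from a URL string. This is our own illustrative selection, not the exact feature set of any surveyed system; the shortener list and the regular expressions are assumptions for the example.

```python
import math
import re
from urllib.parse import urlparse

# Illustrative shortener list; real systems use much larger lists.
SHORTENERS = {"goo.gl", "bit.ly", "tinyurl.com"}

def url_entropy(s):
    # Shannon entropy over the URL's character distribution.
    freq = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in freq.values())

def lexical_features(url):
    parsed = urlparse(url)
    host = parsed.netloc.split(":")[0]          # drop any port number
    tokens = re.split(r"[/.\-_?=&]", url)       # crude tokenization
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "digit_ratio": sum(c.isdigit() for c in url) / len(url),
        "longest_token": max((len(t) for t in tokens if t), default=0),
        "has_at_symbol": "@" in url,
        "uses_shortener": host in SHORTENERS,
        "ip_instead_of_domain": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "entropy": round(url_entropy(url), 3),
    }

# Invented example URL mimicking the IP-instead-of-domain obfuscation.
f = lexical_features("http://192.168.10.1/paypal.com/login?q=user")
```

Feature vectors like `f` would then be fed to whatever classifier the system uses; note that all of these are "short" processing-time features in Table I's terms, since none requires a network lookup.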

Fig. 4: Distribution of URL features used in the three dimensions of phishing detection: URL (U), website content (W) and email content (E). Inside each section of the bar, we mention the percentage of papers that use a class of URL-based features. For example, 50% of the phishing website detection papers that use URL features employ WHOIS properties.

the Appendix) describes the variety of techniques used for detection of malicious URLs along with citations.

D. Dataset Properties

The properties of the dataset play an important role in building a robust detection system. We discuss these next. Critical properties of a dataset include: the source(s) of the dataset, its diversity or representativeness, its size, its class distribution, and its recency. In this section, we review the literature based on these properties.
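One of these properties, diversity, is measured in this paper with the TFIDF weighting of Equation (8) and the cosine similarity of Equation (9). The toy sketch below implements both directly from those definitions; the four "documents" and the small stopword list are invented stand-ins (the survey itself uses real website/email text and NLTK's stopword list).

```python
import math

# Tiny stand-in stopword list (the survey uses NLTK's full list).
STOPWORDS = {"the", "a", "an", "and", "in", "of"}

def tfidf_vector(doc, corpus):
    # Equation (8): TFIDF(t, d) = (n_{t,d} / N_d) * ln(D / (1 + d_t))
    terms = [w for w in doc.lower().split() if w not in STOPWORDS]
    vec = {}
    for t in set(terms):
        n_td = terms.count(t)                                # times t appears in d
        N_d = len(terms)                                     # total terms in d
        D = len(corpus)                                      # total documents
        d_t = sum(t in c.lower().split() for c in corpus)    # docs containing t
        vec[t] = (n_td / N_d) * math.log(D / (1 + d_t))
    return vec

def cosine(a, b):
    # Equation (9): cos(theta) = (A . B) / (||A|| ||B||)
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = ["verify account password now",
          "verify account password today",
          "meeting agenda attached below",
          "lunch menu posted today"]
v0 = tfidf_vector(corpus[0], corpus)
v1 = tfidf_vector(corpus[1], corpus)
v2 = tfidf_vector(corpus[2], corpus)
```

As expected, the two phishing-flavored texts score higher similarity with each other than with the unrelated text, which is the intuition behind using pairwise similarities to judge how diverse (or repetitive) a dataset is.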

Fig. 5: Algorithms used in phishing URL detection. The number of papers using the method is in parentheses.
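As a minimal illustration of the supervised family in Figure 5, and of the unbalanced-data metrics of Equations (6) and (7), the sketch below fits a one-node decision stump (the simplest member of the decision-tree family) to synthetic URL lengths at a 19:1 benign-to-phishing ratio. The data, the single feature, and the stump itself are invented stand-ins, not any surveyed system.

```python
import math

def best_stump(xs, ys):
    # Try midpoints between sorted feature values; predict +1 (phishing)
    # above the threshold, and keep the threshold with fewest errors.
    best_t, best_err = None, len(ys) + 1
    pts = sorted(set(xs))
    for t in [(a + b) / 2 for a, b in zip(pts, pts[1:])]:
        err = sum((1 if x > t else -1) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def metrics(tp, tn, fp, fn):
    p, n = tp + fn, tn + fp
    acc = (tp + tn) / (p + n)
    den = math.sqrt((tp + fp) * (tn + fn) * p * n)
    mcc = (tp * tn - fp * fn) / den if den else 0.0      # Equation (6)
    bal = 0.5 * (tp / p + tn / n)                        # Equation (7)
    return acc, mcc, bal

# 95 benign URLs (lengths 20-40) and 5 phishing URLs (lengths 70-90).
xs = [20 + i % 21 for i in range(95)] + [70, 75, 80, 85, 90]
ys = [-1] * 95 + [+1] * 5
t = best_stump(xs, ys)
preds = [1 if x > t else -1 for x in xs]
tp = sum(p == y == 1 for p, y in zip(preds, ys))
tn = sum(p == y == -1 for p, y in zip(preds, ys))
fp = sum(p == 1 and y == -1 for p, y in zip(preds, ys))
fn = sum(p == -1 and y == 1 for p, y in zip(preds, ys))
acc, mcc, bal = metrics(tp, tn, fp, fn)

# Contrast: a degenerate "always benign" classifier on the same 19:1 data.
base_acc, base_mcc, base_bal = metrics(tp=0, tn=95, fp=0, fn=5)
```

The degenerate baseline reaches 95% accuracy while its MCC is 0 and its balanced accuracy is 0.5, which is exactly why accuracy alone is a poor metric on unbalanced phishing data.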

1) Dataset sources and availability: In Tables II, III, and IV, we categorize papers based on the combination of benign and malicious data sources (phishing, spam, and malware, respectively) along with their availability (public, private, deprecated). Figures 6 and 7 show data sources and the percentage of the surveyed literature using each source. The most popular sources for URL data are: (a) Legitimate: DMOZ (now deprecated), Alexa.com, and Yahoo URL generator (deprecated), and (b) Phishing: PhishTank, APWG and OpenPhish. Sources like Alexa.com list domain names only, which can artificially inflate the results, since such legitimate domains can be easily distinguished from the complete URLs found in phishing data sources. A more thorough data collection involves crawling the top sites from Alexa and subsequent links from these sites. During our review, we found that many papers that use Alexa's ranking fail to mention any preprocessing that addresses this issue. Multiple research papers have also used DMOZ and Yahoo URL generator as legitimate sources, which are currently deprecated. In such a situation, system retraining becomes important and comparing results becomes harder. PhishTank is a crowdsourced blacklist where controlling the quality of the URLs^8 is critical. Twitter is also a common platform for sharing malicious URLs. We include spam and malware URL data sources in both the URL and website sections, primarily because a large number of papers, e.g., [34]–[41], use a mix of phishing, malware, and spam data, collectively termed as "malicious." We observe little variation in the dataset sources used for evaluation in the literature, with a few exceptions. Researchers tend to use publicly available sources for building their evaluation datasets.

Fig. 6: Distribution of papers based on legitimate URL source

Fig. 7: Distribution of papers based on phishing URL source
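The Alexa caveat above can be demonstrated in a few lines: if the legitimate class consists of bare domains and the phishing class of complete URLs, a trivial rule keyed to that dataset artifact, rather than to maliciousness, separates the classes perfectly. The URLs below are invented examples.

```python
# "Legitimate" data as it comes from a top-sites ranking: bare domains.
benign = ["google.com", "wikipedia.org", "github.com"]

# Phishing feeds supply complete URLs with paths and query strings.
phishing = ["http://evil.example/paypal/login.php?q=user",
            "http://198.51.100.7/secure/update.html"]

def looks_phishy(u):
    # A "classifier" exploiting the artifact: any URL with a path wins.
    return "/" in u

acc = (sum(not looks_phishy(u) for u in benign)
       + sum(looks_phishy(u) for u in phishing)) / (len(benign) + len(phishing))
```

The 100% "accuracy" here says nothing about detecting phishing; it only reflects the mismatch in how the two classes were collected, which is why crawling full URLs from the legitimate seed list matters.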
8 If the database contains live websites, absence of repetitive links, only phishing domains, etc., the quality is better for evaluation.

2) Dataset size and class ratio: We expect that an Internet user will generally encounter a phishing URL significantly less frequently than a benign one. Thus, researchers should evaluate their classifiers with different and unbalanced class ratios. We report the count of phishing URL detection papers using various combinations of data sizes in Table V. We only compare the sizes of the phishing datasets with the legitimate ones, since phishing is the main focus of this paper. Table V shows that a majority of the papers studied report evaluation results only on balanced datasets, since most papers appear on or close to the diagonal in the table. Also, only a small subset (approx. 23%) of the surveyed papers has used relatively large (comprising ≥ 100,000 URLs) URL datasets.

Researchers in [29], [40], [41], [79] considered both unbalanced and balanced scenarios in their evaluations. In [22], [29], [39], [40], researchers evaluated their classifiers on varying ratios (ratios considered were 2:1, 3:2, 8:1, etc.) of benign to malicious URLs, whereas in [28], [45], [49], [67], [77], [80], researchers evaluated their classifiers on highly unbalanced datasets (the benign dataset size is almost 50 times or more) but fixed ratios. Papers [35], [47], [49], [55], [69] report moderate benign-to-malicious data ratios (e.g., 2:1, 3:2, 5:1, etc.). Authors in [30], [43], [46], [54], [81]–[84] used slightly

TABLE II: Phishing and legitimate URL datasets: sources and availability

Legitimate sources (columns): Miscellaneous pub.^a | Miscellaneous pvt.^b | DMOZ (dep.) | Alexa (pub.) | Yahoo^f (dep.) | Twitter (pub.) | Author^c (pvt.) | N/A

- PhishTank (pub.): Misc. pub.: [33], [34]; Misc. pvt.: [55]; DMOZ: [21], [22], [34], [35], [39], [40], [42]–[53]; Alexa: [21], [35], [48], [54]; Yahoo: [36], [40], [43], [47], [49], [54]; Author: [22], [56], [57], [58]
- APWG (pub.): DMOZ: [21], [22]; Alexa: [21]; Author: [22]
- OpenPhish (pub.): Alexa: [35], [48]; Author: [59]
- Twitter (pub.): Twitter: [29], [30], [60]–[63]
- Misc.^d (pub.): Misc. pub.: [64]; Misc. pvt.: [65]; N/A: [38]
- Misc.^e (pvt.): DMOZ: [54]; Alexa: [54], [66]; Twitter: [67], [68]; Author: [69]; N/A: [70]
- N/A: [71]–[76]

a CommonCrawl [33], Gmail directory [34], Weibo Sina API [64]
b URL feeds and logs from the Web [55], Cyveillance [65], Bitdefender Laboratories [69]
c Datasets collected by authors in [22], [56], [57]
d Phishing messages from Weibo Sina API [64], URLs from UAB Spam DataMine email messages [38], [65]
e Click-through data from the Trend Micro research laboratory [70], private honeypot [68], T.Co. Labs [67]
f Yahoo URL Generator (http://random.yahoo.com/bin/ryl)
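When combining feeds such as those in Table II into one evaluation corpus, a per-domain cap, in the spirit of the 12-URLs-per-domain limit applied to the Alexa crawl later in this section, keeps any single domain from dominating the dataset. The sketch below is hypothetical; the feed contents are invented and real feeds (PhishTank, APWG, etc.) obviously differ.

```python
from collections import defaultdict
from urllib.parse import urlparse

def merge_with_cap(feeds, cap=12):
    seen, kept = set(), []
    per_domain = defaultdict(int)
    for feed in feeds:
        for url in feed:
            if url in seen:                      # drop exact duplicates
                continue
            seen.add(url)
            dom = urlparse(url).netloc
            if per_domain[dom] < cap:            # cap URLs per domain
                per_domain[dom] += 1
                kept.append(url)
    return kept

# Invented feeds: 20 URLs on one domain, plus an overlapping second feed.
feed_a = [f"http://a.example/page{i}" for i in range(20)]
feed_b = ["http://a.example/page0", "http://b.example/login"]
merged = merge_with_cap([feed_a, feed_b], cap=12)
```

With the cap at 12, only a dozen of the twenty `a.example` URLs survive, the duplicate is dropped, and the lone `b.example` URL is kept, yielding a flatter domain distribution like the one reported for the capped Alexa crawl.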

TABLE III: Spam and legitimate URL datasets: sources and availability

Legitimate sources (columns): DMOZ (pub.) | Alexa (pub.) | Misc.^a (pub.) | Author^b (pvt.) | N/A

- SpamScatter (pub.): DMOZ: [43]; Alexa: [59]
- SpamAssassin (pub.): DMOZ: [34]; Alexa: [34]; Misc.: [38], [72]; Author: [38], [41]
- Misc.^c (pvt.): DMOZ: [21]; Alexa: [22], [41]; Misc.: [72], [77]

a Gmail directory [34]
b Spam URLs from UAB Spam DataMine email messages [38]
c Private SpamTrap system [72], collected using blacklists and crowd-sourcing [41], [77]

TABLE IV: Malware and legitimate URL datasets: sources and availability

Legitimate sources (columns): DMOZ (pub.) | Alexa (pub.) | Yahoo (pub.) | Author (pvt.)

- MalwarePatrol (pub.): DMOZ: [39], [40]; Alexa: [36]
- DNS-BlackHole (pub.): DMOZ: [35]; Alexa: [35], [59]
- Misc.^a (pub.): DMOZ: [35]; Alexa: [35], [48]
- Misc.^b (pvt.): Author: [78]

a MalwareDomainslist [35] and malwareurl [48]
b Malware private sources [78]

unbalanced, or almost balanced,^9 data. Systems in [43], [59], [73] used more malicious URLs than legitimate ones.

9 If one class is x times the size of the other class, where 1 ≤ x < 2.

Table A7 (in the Appendix) provides a detailed distribution of the dataset sizes used in the literature according to their types (phishing, spam, and malware). Some research papers, e.g., [22], [28], [39], [85], and some others (refer to Table A7), have used millions of URLs (both benign and malicious) for the evaluation of their systems. But these are usually static datasets, i.e., data collected over time and stored for usage. Some researchers go a step further: they deployed and tested their classifiers in a real-time environment [72], [77], [86]. Moreover, malicious links can be shared via chat sessions and Twitter feeds; therefore, including the papers that test such sources (under the category Feed) is essential.

Irrespective of size, the datasets may not be diverse. In the following subsection, we evaluate the diversity of some popular, publicly-available datasets.

3) Dataset Diversity: We collected phishing URLs from three different sources, PhishTank (22,018 URLs), APWG (9,757 URLs) and OpenPhish (5,484 URLs), in Nov 2017, and used Alexa as the basis for our legitimate dataset. The collected phishing URLs are those that were live at the time of collection. We use the list of top domains from Alexa as a seed for our crawler^10 to generate a more realistic dataset. To increase the diversity of the legitimate websites, we retrieved the top 40 websites from each category on Alexa^11, excluding the three categories Adult, Regional, and World. We stopped the crawler after collecting 91,610 URLs.

10 We limited the crawler to search up to three levels.
11 https://www.alexa.com/topsites/category lists 17 domain categories.

Calculating the frequency of domains in each dataset (Figure 8) revealed that the top 50 domains constitute 24%, 12.6% and 18% of the URLs extracted from OpenPhish, PhishTank, and APWG respectively. From these percentages, PhishTank seems to be a more diverse dataset than the other two. For the Alexa dataset, the top 50 domains comprise 75% of the URLs. However, this can be mitigated by limiting the number of URLs crawled from each domain in the list. Thus, we changed our crawler's configuration to store at most 12 URLs for each domain. Following this modification, the percentage of URLs with the top 50 domains in the Alexa dataset reduced to 2.3% (out of 21,788 total URLs). When we limit the number of URLs for each domain, the top N domains variable becomes a linear function of dataset size. Thus, limiting the number of URLs collected from each domain can help build a more diverse legitimate dataset.

In the above experiment, we studied all datasets independently. We did not compare the phishing and legitimate datasets with each other, since this would go into the realm of developing detection techniques, which is beyond the scope of this paper.

The time period of datasets is also important, since a classifier can overfit on some attributes common to instances from a specific time period. Such a classifier will fail miserably on newer kinds of attacks. So, we study the collection time periods of datasets in the next section.

4) Recency of data collection: Recency of data collection is an important aspect of the evaluation datasets. Since attackers tend to launch their attacks for short time windows, malicious links can become dead quickly. So, URL datasets need to be collected close to attack launch time. We specify recency

in Table VI. It shows the relative age of the dataset, which is n − m, where m is the year during which the URLs were collected from the respective sources by the researchers and n is the year in which the paper was published. Some researchers (e.g., [29], [68], [77]) explicitly consider the active attacker and choose to train and retrain their systems on datasets collected across various time frames. This ensures better resiliency of the system to newer and probably more evolved attacks. In Table VI, we mark such papers in bold. However, a large fraction (approx. 55%) of the papers surveyed does not report the collection time of datasets.

Fig. 8: Cumulative distribution function of the top 50 domains in the phishing and legitimate dataset (Section VI-D3)

TABLE V: Size ranges of phishing/legitimate URL datasets (rows: phishing set size; columns: legitimate set size, in the order 100s, 1,000s, 10,000s, 100,000s, ≥1M, Feed, N/A; entries give paper counts in reading order)
- Phishing 100s: 4, 1
- Phishing 1,000s: 6, 1, 1
- Phishing 10,000s: 2, 14, 7, 4
- Phishing 100,000s: 3, 1
- Phishing ≥1M: 8, 1
- Phishing Feed: 2
- Phishing N/A: 7

TABLE VI: Recency of URL datasets used in evaluation. Papers in bold use data collected from multiple periods.
Recency in year(s) | N | Literature
- Real-time | 5 | [29], [62], [68], [72], [77], [86]
- <1 | 9 | [38]–[40], [43], [47], [69], [79], [82], [87]
- 1 | 17 | [21], [30], [50], [51], [56], [57], [68], [88]
- 2-3 | 7 | [22], [29], [67], [70], [77], [78], [85]
- ≥4 | 5 | [21], [41], [49], [68], [89]

We next describe the different evaluation metrics used in phishing URL detection papers.

E. Evaluation Metrics

The choice of metric(s) is very important for sound system evaluation. In Table VII, we report papers that use one (along the diagonal) or a combination of two types of metrics. Papers that report more than two metrics are discussed but are not counted in this table. However, 10 out of 61 papers do not report the evaluation metrics used.

Accuracy and Confusion Matrix are the most popular metrics. We group papers that use True Positive, False Positive, True Negative and False Negative values as metrics (both raw numbers as well as rates) into the Confusion Matrix (CMx) category. Next in popularity are: Precision (P), Recall (R), and F1-score (abbr. F; the harmonic mean of precision and recall), along with Error Rate (ErR). Some papers report additional metrics, e.g., Receiver Operating Characteristics (ROC) [49], Area Under Curve (AUC) [35], [43], and classification costs [80]. A few papers consider combinations of three or more metrics: Acc, CMx, Other^12 in [80]; Acc, PRF, CMx in [69]; Acc, PRF, AUC in [33], [35]; Acc, PRF, CMx, ErR in [89]; and Acc, PRF, CMx, AUC in [50], [51]. More details on the metrics used, along with relevant citations, are reported in Appendix Table A1.

12 Classification Cost

TABLE VII: Distribution of evaluation metrics in phishing URL detection (rows and columns in the order Acc, PRF, ErR, CMx, Other; entries give paper counts in reading order)
- Acc: 5, 6, 1, 5, 2
- PRF: 5, 1
- ErR: 9, 1
- CMx: 4, 2
- Other: 1
Other: Balanced Success Rate, Root Mean Square Error, effective rules/patterns, Cost/Sum of classification, False Alarm rate

Another important facet to consider is the use of appropriate evaluation metrics when considering unbalanced datasets. Accuracy is not always appropriate, for multiple reasons: the base-rate fallacy (Section IV), unbalanced datasets, and asymmetric costs. The same holds for the error rate metric. ROC, AUC, and confusion matrix values can be better metrics in such scenarios. Researchers should also use metrics specifically proposed for unbalanced experiments [26], e.g., geometric mean, Matthews Correlation Coefficient (MCC), Balanced Accuracy, Balanced Detection Rate [90], etc.

However, of the papers that use highly unbalanced URL datasets: [28], [49] use error rate; [45] uses false positive and false negative rates, which is better, since one can calculate other metrics (if the dataset size and the distribution of classes are also reported); [67] uses malicious missing rate and detection rate; and [41] makes use of popular metrics: accuracy, precision, recall, F1-score and ROC. Thus, we observe mostly inappropriate unbalanced-dataset metrics in the studied literature.

F. Selected Phishing URL Detection Papers

We now discuss phishing URL detection papers that addressed at least a few security challenges. The selection is based on the number of challenges (Section V-A) addressed.

Zhao et al. [80] proposed a scalable cost-sensitive online active learning (CSOAL) framework that tackled multiple challenges listed in Section IV. To ensure efficiency and scalability of the proposed system in a web-based environment, the authors used online active learners for building the detection model. This also handled phishing attacks launched within short time windows. Moreover, an online learning algorithm ensured faster system retraining. To deal with the issue of data availability, the system used a fraction of the provided dataset during training. The model was trained using only 0.5% of the data and was evaluated on a large (1M) and highly imbalanced (99 legitimate to 1 malicious) URL dataset. Thus, their method also considered class imbalance. The authors proposed two novel cost-sensitive measures, weighted cost and weighted sum of sensitivity and specificity, for better system evaluation, but these were not compared to other metrics suitable for unbalanced data (see Section VI-E). Other experiments include system scalability and a study of training and testing times. However, the authors did not address the active attacker issue.

In [35], the authors proposed a lightweight phishing detection system using lexical features for link analysis. Such a system ensures separation between the victim and the attacker, and a lightweight scheme ensures speed and scalability. The authors also study system performance by conducting "generalization experiments" across multiple evaluation datasets with varying ratios between the benign and malicious classes (ratios considered were 3:1, 1:1, and 1:3). The researchers argue that their work is secure against an active attacker, but do not perform any real-time experiments to validate this claim. Moreover, the metrics used for evaluation of the proposed system are inappropriate for unbalanced datasets.

Blacklist-dependent phishing URL detection techniques can be defeated by phishers by simply launching new websites after short windows of time. Ma et al. [43] addressed this issue by proposing a URL classification technique that makes use of a wide variety of features: lexical, host-based, blacklist-based and WHOIS properties. Although the authors used a relatively small dataset, they addressed the data diversity and generalization issue by creating multiple datasets using different publicly available sources. Other important attributes that stand out are: feature selection and comparison, classifier comparison, and error analysis. However, the small and relatively balanced dataset is a serious limitation. Moreover, the active attacker issue was not considered directly.

Novel distance-based features for phishing URLs based on Kullback-Leibler divergence (described in [91]; used to measure the distance between the distributions of normalized character frequencies in URLs and Standard English) and Kolmogorov-Smirnov distance (measured using the two-sample Kolmogorov-Smirnov test statistic [92] calculated from the distribution of character frequencies of the URLs versus that of Standard English) were used in [21], along with modifications of some old features to make them more robust; e.g., the ratio of URL domain to URL length was used instead of just URL length. Phishing detection at the URL level, along with fast data structures used by the researchers, ensures a fast and lightweight approach capable of handling attacks launched within short time windows. The authors used four different datasets collected from different real-world sources^13 for evaluation, thus ensuring a diverse set of URLs. They also performed experiments to explicitly verify dataset diversity, along with a cross-dataset experiment (training on one dataset and testing on another). The system performance using five supervised models was evaluated using accuracy, false positive rate, and classifier training/testing times. Thus, the system evaluation was set up to address some of the issues of earlier work. However, the proposed system was not evaluated using unbalanced datasets. The authors did not comment on system retraining for handling newer attacks.

Lee et al. [29] evaluated malicious URLs collected from Twitter, a popular platform for sharing poisoned links. The authors partially handled the active attacker challenge by proposing a system that can identify correlations among URL redirect chains. The authors trained and evaluated the proposed system on a large dataset of malicious tweets using multiple metrics: accuracy, AUC, and false positive and negative rates. The system was also evaluated in a real-time environment using a sliding window to vary the span of the training data feed. The authors also studied feature importance using F1-scores. However, while the classification time per URL is reported, the training time is not specified.

We conclude this section with lessons learned and best practices based on our review of the literature.

G. Lessons learned and best practices

We observed a number of useful practices in the literature as well as possibilities for improving future research. Use of techniques capable of handling data availability issues while providing reliable performance [80] is a desirable practice. Also, the use of cost-sensitive metrics [29], [80], which take into account the class-imbalance issue, is an important criterion for evaluation. Generalization experiments, or cross-domain testing of systems, are another recommended way of verifying the performance of a model on various types of attacks. We observed generalization experiments in a few papers, e.g., [22], [35]. While a number of papers evaluated on unbalanced datasets [35], [41], [45], [80], we did not find any paper with the appropriate metrics (suggested in [26]) to measure performance.

Feature selection can make a system faster and more robust. Popular techniques for feature selection include information gain, F1-scores, etc. Phishing URL detection papers that used information gain include [30], [35], [67], whereas [29], [60] used F1-scores to determine feature importance. Other papers that analyze feature importance include [41], [59], [77], [79].

VII. PHISHING WEBSITE DETECTION

For phishing website detection, we review 67 papers based on the selection criteria in Section II. We begin with the structure of a website.

A. Website Structure

A website refers to a collection of formatted documents that can be rendered and viewed using a web browser (e.g., Chrome, Firefox, Microsoft Edge). Websites consist of three elements: (1) URL pattern, (2) layout, and (3) link structure. Figure 9 shows the structure and elements of a typical website.

13 Two datasets from researchers in China and two from publicly available sources - DMOZ, Alexa for legitimate and PhishTank for phishing

As described in Section VI, each website needs an address (URL) for access. The syntax of a URL and its different

components were described in Section VI. Besides coming up with a URL, the domain must be registered with the DNS. When a user inputs the URL into a browser, the browser needs the IP address of that website, which it gets from the DNS server, to send HTTP (or HTTPS) requests. Figure 9 shows the communications from the user's browser to the DNS server and the web server.

Fig. 9: Structure of a typical website along with its network components

After the HTTP request, the browser receives the HTML content of the website. The layout refers to the syntactic structure of the web page, i.e., the distribution of HTML elements. Website designers generally use the same layout for web pages with similar functionalities. Thus, users can easily identify the purpose of a web page by looking at its design. This can be a good source of features for detecting similarity between websites, since phishing websites try to mimic well-known target websites. Cascading Style Sheets (CSS) and JavaScript are the most common ways of specifying the layout. CSS is used to describe the position and other appearance properties of objects in the websites. JavaScript enables the content of a web page to change dynamically on the client side.

Each web page usually links to other web pages for easy navigation. These links may or may not be in the same domain as the original web page (Section VI). Popular websites have lots of incoming links from other websites, whereas phishing websites do not, since they are short-lived. Web page rankings (e.g., PageRank) by search engines can be used to measure the importance of web pages. These rankings are based on page attributes, including the link structure.

The following components of a website can be used as features to characterize it:

• Network: We discussed above the syntactic features that are directly extracted from the URL. Here, we include other features, which are indirectly related to the URL (e.g., DNS information, domain registration, and Secure Sockets Layer (SSL) certificate attributes).
• HTML content: This includes features that are extracted from the HTML content of the website. Examples are: number of links, webpage content, tags, and number of hidden elements.
• JavaScript: Although JavaScript features can be viewed as HTML features, we put them in a separate group because of their dynamic nature.

In the next section, we discuss the features used in the reviewed literature, with the above taxonomy.

B. Feature Extraction and Analysis

Table VIII shows the features used in the studied phishing website detection literature.^14 These features can improve detection performance. However, they also increase the training and classification times, since most of them need moderate or high extraction and processing times. Feature size for most of the website features is small, except for Term Frequency, which depends on the size of the vocabulary. We emphasize that training of models requires large-scale feature extraction, but for some of the website features this may not be possible. For example, retrieving the WHOIS information involves sending requests to a WHOIS server, which may start rejecting requests due to a rate-limiting policy.

14 To avoid repeating URL features, we have added them to Table I in the previous section.

There are several hard-to-evade website features. Visual similarity of websites is one such feature, because if a website is not visually similar to the target website, then the victim will not fall for it. Another example is login-form identity. The phisher can manipulate this feature by removing the login fields, but this would defeat the main purpose of a phishing attack, which is to steal credentials. In Table A11 in the Appendix, we list the features along with citations to the papers that use them.

In the next section, we discuss the different phishing detection methods that use the aforementioned features.

C. Detection Methods

Figure 10 shows the different methods used in the phishing website detection literature as well as the number of papers that use each method. Supervised techniques – Support Vector Machines (SVM), logistic regression, decision tree, and Naïve Bayes – are the most popular phishing website detection methods. Only five papers use unsupervised and rule-based methods. In contrast to phishing URL detection methods, we see that online learning and deep learning techniques are missing in phishing website detection. More detailed information about the detection methods used in each paper is in the Appendix (Table A5).

While supervised learning is the preferred choice, such algorithms need a sizable ground truth or labeled data for training. Hence, we next describe the different datasets used in the phishing website detection literature.

D. Dataset Properties

Although the detection method is an important part of a proposed system, the datasets that authors use to train and test their models have a significant effect on the trustworthiness of their models.

1) Dataset sources and availability: Phishing website detection techniques have used almost the same datasets as phishing URL detection techniques, thus also exhibiting the lack-of-diversity issue. Although the papers are about phishing website detection, occasionally they use malware and spam websites to test/train their methods. Such papers are grouped together in the Malicious group (the spam dataset refers to URLs

TABLE VIII: Features used in phishing website detection. For uncommon features, we cite the paper that best describes it. (FS = feature size: Small/Medium/Large; PT = processing time: Small/Medium/High.)

HTML features (FS, PT):
# of various tag types (M, M); HTML tag attributes (M, M); Term Frequency (L, S); # of elements out of place (S, M); # of small/hidden elements (S, M); # of suspicious elements (S, M); # of internal/external links (S, M); NULL links on site and footer (S, M); more than one head tag/document (S, M); invisible frames [95] (S, M); # of specific file type (S, M); Visual [96] (M, H); DOM-tree [97] (S, H); Server Form Handler (S, M); Login-form identity (S, M); External term frequency [98] (L, H)

Network features (FS, PT):
Domain registration info (S, H); Registrar ID (S, M); # of Nameservers (S, M); DNS record (S, M); # of DNS queries (S, H); HTTP header fields (M, M); Alexa rank (S, M); Gmail reputation [94] (S, M); # of bytes/packets transferred (S, H); Fake HTTP protocol (S, M); # of IP/port upon complete download (S, H); SSL Cert. attributes (S, M)

Javascript features (FS, PT):
keywords to words ratio (S, H); # of suspicious strings [93] (S, M); # of long strings (>40, >51) (S, M); decoding routines (S, H); shellcode detection (S, H); # of iframe strings (S, M); # of DOM modifying functions (S, H); # of event attachment (S, H); # of suspicious objects [93] (S, M); # of scripts (S, M); # of functions (eval, setInterval, OnMouseOver, ...) (S, M); # of iframes (S, M); ActiveX function (S, M); right click disabled (S, M)

extracted from the body of spam emails). Tables IX and X list the papers that use phishing and malicious sources, along with the legitimate ones. Papers in which researchers used their own data sources are listed under the Author's column. If they reveal the details of their sources publicly, they are listed as Author's public; otherwise, as Author's private.

Figures 11 and 12 show the distribution of phishing and legitimate sources, respectively. They show trends similar to those we saw earlier for phishing URL detection (Figures 6 and 7). For the phishing dataset, 69% of the studies used PhishTank as their source of data, 9% used Millersmiles, and 6% used APWG. For legitimate datasets, the sources are more diverse than the phishing ones: Alexa (28%) is the most widely used dataset, but DMOZ (12%) and Yahoo Directory (20%) are also used in several studies (unfortunately, both are now deprecated).

2) Dataset Size: Table XI shows the categorized sizes of the legitimate, phishing, or malicious website datasets used by each reviewed phishing website detection paper. We observe that more than 60% of the research papers report experiments on balanced datasets (red cells along the diagonal of Table XI). These systems do not consider the base-rate fallacy in their evaluation; hence their methods may perform quite differently in the real world. Also, only about one-third of the reviewed papers used at least 10,000 legitimate, phishing, and/or malicious websites for evaluation. The rest suffer from the small-dataset concern (eight papers do not mention their dataset sizes). More details on the sizes of the datasets used by each paper can be found in the Appendix (Table A8).

3) Dataset Diversity: As in the URL section, we analyze the diversity of PhishTank, OpenPhish, APWG, and Alexa, since these are also the most widely used sources for websites. However, instead of checking the diversity of URLs, we analyze the diversity found in the websites' content. We extracted all the text from the HTML body using the Python library Beautiful Soup15 to compare the websites' content.

15 https://www.crummy.com/software/BeautifulSoup/bs4/doc/
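The content-diversity measurement described above (strip the page text, drop stop words, build TFIDF vectors, compare them with cosine similarity) can be sketched with the standard library alone. The snippet below is an illustrative reimplementation under assumptions of ours: a toy stop-word list and three invented page texts stand in for the survey's actual pipeline, which ran Beautiful Soup over crawled HTML.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a full one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for"}

def tokenize(text):
    """Lowercase, split on non-letter runs, drop stop words."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t and t not in STOPWORDS]

def tfidf_vectors(docs):
    """Return one sparse {term: tf*idf} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical page texts: two similar phishing lures, one unrelated page.
pages = [
    "Sign in to your PayPal account to verify recent activity",
    "Verify your PayPal account sign in details now",
    "Weather forecast for the weekend hiking trip",
]
vecs = tfidf_vectors(pages)
print(round(cosine(vecs[0], vecs[1]), 3))  # mimicking lures overlap
print(round(cosine(vecs[0], vecs[2]), 3))  # unrelated content scores 0
```

Terms shared by every document get an IDF of log(1) = 0, so boilerplate common to all pages drops out of the comparison automatically, which is the property that makes the TFIDF weighting preferable to raw term counts here.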

Fig. 10: Algorithms used in phishing website detection. The number of papers using the method is in parentheses.
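The supervised techniques counted in Figure 10 share one pipeline: represent a website as a feature vector, fit on labeled examples, then classify new pages. As a minimal, self-contained sketch of that pipeline, and not any surveyed system's actual model, here is a Bernoulli Naïve Bayes over two hypothetical boolean features (`login_form`, `young_domain`) with invented labels:

```python
import math
from collections import defaultdict

def train_nb(samples):
    """Fit Bernoulli Naive Bayes counts.
    samples: list of (feature_dict, label) with boolean feature values."""
    counts = defaultdict(lambda: defaultdict(int))  # label -> feature -> #True
    totals = defaultdict(int)                       # label -> #samples
    features = set()
    for feats, label in samples:
        totals[label] += 1
        features.update(feats)
        for f, value in feats.items():
            if value:
                counts[label][f] += 1
    return counts, totals, features

def predict_nb(model, feats):
    """Pick the label with the highest log-posterior."""
    counts, totals, features = model
    n = sum(totals.values())
    best, best_lp = None, float("-inf")
    for label in totals:
        lp = math.log(totals[label] / n)  # class prior
        for f in features:
            # Laplace smoothing avoids zero probabilities for unseen values.
            p_true = (counts[label][f] + 1) / (totals[label] + 2)
            lp += math.log(p_true if feats.get(f) else 1 - p_true)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented training set: phishing pages tend to pair login forms
# with recently registered domains.
data = [
    ({"login_form": True,  "young_domain": True},  "phish"),
    ({"login_form": True,  "young_domain": True},  "phish"),
    ({"login_form": True,  "young_domain": False}, "legit"),
    ({"login_form": False, "young_domain": False}, "legit"),
    ({"login_form": False, "young_domain": False}, "legit"),
]
model = train_nb(data)
print(predict_nb(model, {"login_form": True, "young_domain": True}))   # phish
print(predict_nb(model, {"login_form": False, "young_domain": False})) # legit
```

A production system would swap in the far richer feature groups of Table VIII; the pipeline shape (extract, fit, predict) stays the same across the classifiers tallied in Figure 10.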

TABLE IX: Legitimate and phishing website datasets: sources and availability

Legit. DMOZ Alexa Yahoo Dir. Twitter Author’s Misc.a N/A Phish (dep) (pub) (pub) (pub) public private (pub) [105]–[120] [99], [106], [107] [99], [121] PhishTank (pub) [88], [99]–[105] [129] [113], [130] [131]–[138] [139]–[144] [99], [103] [100], [121]–[128] [97], [98] APWGb(pub) [104] [129] [140], [141] UAB Phishing (pvt) [145]–[147] Millersmiles (pub) [99], [114] [99], [125]–[127] [130] [99] Twitter (pub) [129], [148] Author’s (pvt) [94], [149]–[152] Miscellaneousc(pub) [111], [119] [153] [154], [155] a Intelsecurity.com (defunct) [98], UCI repository [154], [155], TrendMicro [121], Google Safe Browsing [99], URoulette [97] b Anti-Phishing Working Group c OpenPhish [153], UCI repository [154]

TABLE X: Legitimate and malicious website datasets: sources and availability

Legit. DMOZ Alexa Yahoo Twitter Author Malicious (dep) (pub) (pub) (pub) (pvt) DNS-BlackHole [100] [110] [100] (pub) [104] Author’s (pub) [156] [129] [157] Author’s (pvt) [93] [96] [158] Misc.a(pub) [100] [112] [100], [122] [95] a Wepawet [93], MalwareURL [112], host-files.net [112], stopbadware.org [95], jwSpamSpy [100] and webspam-uk07 [122]

Fig. 12: Distribution of papers based on phishing website source
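The base-rate fallacy raised throughout this survey can be made concrete with a few lines of arithmetic. The sketch below evaluates one hypothetical detector (95% true-positive rate, 5% false-positive rate) under a balanced 1:1 split and under a synthetic 1:199 phishing-to-legitimate ratio; every count is invented for illustration. Accuracy barely moves while precision collapses, and the Matthews Correlation Coefficient (MCC) exposes the difference:

```python
import math

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and MCC from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, prec, rec, mcc

# Same detector behaviour (TPR 95%, FPR 5%), two different base rates.
balanced = metrics(tp=950, fp=50, fn=50, tn=950)    # 1:1 phishing:legitimate
realistic = metrics(tp=95, fp=995, fn=5, tn=18905)  # 1:199 phishing:legitimate

print("balanced  acc=%.3f prec=%.3f mcc=%.3f" % (balanced[0], balanced[1], balanced[3]))
print("realistic acc=%.3f prec=%.3f mcc=%.3f" % (realistic[0], realistic[1], realistic[3]))
```

Under the realistic base rate, accuracy is still 0.95, but fewer than one in ten alarms is a real phishing page, which is why balanced-dataset evaluations such as those on the diagonal of Table XI can overstate real-world performance.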

TABLE XI: Size ranges of legitimate/phishing website datasets. (Rows: phishing dataset size; columns: legitimate dataset size.)

Phishing \ Legit. | 100s | 1,000s | 10,000s | 100,000s | >1M | N/A
100s              |  6   |   5    |    –    |    –     |  –  |  –
1,000s            |  4   |  13    |    2    |    2     |  –  |  4
10,000s           |  –   |   –    |    7    |    1     |  1  |  3
100,000s          |  –   |   –    |    –    |    1     |  1  |  –
N/A               |  1   |   –    |    –    |    –     |  –  | 13

Fig. 11: Distribution of papers based on legitimate website source

We also added a filter to remove all stop words. Another preprocessing step was the removal of all dead and non-English links. We then converted the extracted text into vectors using the TFIDF representation. After converting each website into a vector, we use Cosine Similarity to measure the similarity between websites.

After preprocessing, we ended up with 5,751, 1,665, 1,861, and 63,670 URLs for PhishTank, OpenPhish, APWG, and Alexa, respectively. Table XII shows the percentage of each dataset within a specific range of similarities. For example, column "0-10" shows the fraction of each dataset that has cosine similarity values in the 0% to 10% range. The results show that all four datasets except APWG are diverse from a website-content point of view (1.3% of websites had more than 50% similarity for PhishTank and Alexa, and 6.2% for OpenPhish, compared to 21.6% for APWG).

4) Recency of data collection: Table XIII describes the recency of the dataset collection time with respect to the paper's publication date. Unfortunately, only a few papers (25 out of 67) mention the time period during which their datasets were collected. The importance of features may change over time: some features that were not important during the past year may now be useful, or the other way around. The use of newer data for evaluation also captures the capacity of a system to detect the latest attacks. Among the papers that mention recency, a majority of the work was published less than a year after data collection.

There are only a handful of research methods that propose real-time website-based detection systems [118], [129], [148]. Feature extraction time is a major obstacle for real-time website-based detection systems, since they must extract the content from the website.

E. Evaluation Metrics

As mentioned in Section VI-E, evaluation metrics should be chosen carefully when the dataset is unbalanced. We only found two papers that use a metric specifically suited to unbalanced datasets (MCC) [136], [155]; the rest reported

commonly used metrics, which are more suitable for balanced datasets. Among these two papers, only [136] actually used an unbalanced dataset.

TABLE XII: Percentage of websites with different ranges of similarities in each dataset. Columns specify the ranges of the Cosine Similarity in percentage.

Dataset   | [0-10] | (10-20] | (20-30] | (30-40] | (40-50] | >50
APWG      | 72.78  |  3.0    |  0.68   |  0.64   |  0.33   | 21.59
OpenPhish | 77.27  | 11.28   |  1.32   |  0.29   |  0.46   |  6.26
PhishTank | 81.59  | 13.55   |  2.21   |  0.81   |  0.5    |  1.31
Alexa     | 86.06  | 10.34   |  1.55   |  0.48   |  0.21   |  1.32

TABLE XIII: Recency of website datasets used in evaluation. Papers in bold use data collected from multiple periods.

Recency in year(s) | N  | Literature
Real time          | 3  | [118], [129], [148]
1                  | 17 | [94], [98], [99], [102], [105], [106], [121], [132], [145]–[149], [151], [155]–[157]
2-3                | 2  | [121], [135]
3-4                | 4  | [104], [123], [140], [142]

Table XIV lists the metrics used for evaluating the proposed methods in the literature. It only includes the papers that used up to two metrics (seven of the 67 papers do not mention metrics). For the complete list of papers and their metrics, refer to Table A2 in the Appendix.

TABLE XIV: Distribution of evaluation metrics in phishing website detection.

      | Acc | PRF | ErR | CMx | AUC | Othera
Acc   |  4  |  3  |  3  |  9  |  –  |  –
PRF   |  –  |  5  |  1  |  2  |  1  |  –
ErR   |  –  |  –  |  9  |  2  |  –  |  –
CMx   |  –  |  –  |  –  | 11  |  –  |  –
AUC   |  –  |  –  |  –  |  –  |  1  |  –
Other |  –  |  –  |  –  |  –  |  –  |  2
a MCC score and Kappa statistic

As shown in Table XIV, the Confusion Matrix is the most used metric in the literature. The Error Rate and PRF (Precision, Recall, and F1-score) are also popular among researchers, even though they are not appropriate metrics when the dataset is unbalanced, since they depend upon which class is considered positive. Among the reported metrics, AUC and the Confusion Matrix (from which we can calculate all other metrics) are better choices. Of the papers that use an unbalanced dataset, several did not use the appropriate metrics [93], [104], [118], [123], [125], [136], [148], [157]. The following nine reported the Confusion Matrix: [88], [93], [94], [119], [121], [122], [130], [132], [156]. One reported AUC [104], and three reported both AUC and the Confusion Matrix [98], [107], [113].

F. Selected Phishing Website Detection Literature

Next, we present a detailed analysis of the reviewed literature that addressed one or more of the challenges described in Section IV.

Using an unbalanced dataset is important because of the base-rate fallacy challenge. While many papers [78], [98], [102], [121], [130], [156] used unbalanced datasets, papers [96], [129] addressed this issue differently. Researchers in [96] evaluated their algorithm by changing the ratio of malicious-to-benign examples (1:2, 1:5, 1:10, 1:15, and 1:20). The results show that both precision (97.6% to 94.2%) and recall (96.6% to 81.6%) decrease with an increase in the number of legitimate websites, i.e., performance drops as the degree of imbalance increases. The malicious links in the dataset were collected only from spam emails, making the dataset less diverse. They also did not mention the time period of the dataset they collected to train and test their method. Researchers in [129] observed similar results by changing the non-spam to spam ratios (1:1, 1:4, and 1:10).

A robust detection system should be able to thwart zero-day attacks (an attack that occurs on the same day a vulnerability is discovered); thus, testing a system in a real-time environment is indispensable. The authors in [129] proposed 'Monarch' as a live phishing website detection system that uses a Logistic Regression classifier trained offline using blacklists and annotation schemes as the ground truth. The system's training URLs were aggregated from two streams: links in emails from a mail service provider, as well as links shared on Twitter. These two sources capture widely different sets of features in terms of the lexical properties of URLs, hosting infrastructure, and page content (HTML and links). The training setup used varying ratios (1:1, 1:4, 1:10) of non-spam to spam (links caught using spam traps and/or blacklists were marked as spam; the rest were labeled non-spam). The system was tested in real time on links extracted from the same two sources, processing 15 million URLs/day with classifier retraining every 4 days. However, the authors do not clearly discuss the ratio of spam and non-spam in their testing data stream. They note the deficiencies in their ground-truth data but do not provide any ideas for improvement.

The active attacker poses a critical threat to phishing website detection systems. The framework proposed in [156] analyzed the possible interaction between attacker (evasion) and defender (counter-evasion). They demonstrated that an active attacker can evade machine-learning-based detection models (a J48 classifier trained on 124 application/network-layer features) by manipulating the features of the malicious website. Their results validated that the proposed active method can increase the false negative rate to 89.1% by changing only 5% of the features on average. They also showed that proactive training can reduce the success rate of the attacker. The authors also claimed that their method can make the decision for a given website within a few seconds, but they did not report the classification time specifically.

Periodic system retraining is another way to identify newer types of attacks (the active attacker challenge). Researchers in [94] used a model trained (with daily retraining) on Google's phishing blacklist to update the blacklist automatically. The authors addressed multiple goals in their paper: decreasing false positives while increasing recall, handling noise in the training data (inaccurate labels), and decreasing the detection latency (to smoothly process millions of web pages every day). They achieved the low-latency goal by creating a pipeline system that tries to drop obvious non-phishing websites during the early stages. Their pipeline starts with extracting lightweight URL features, followed by the content and hosting features. They use a novel machine learning algorithm similar to Random Forest, together with online learning algorithms, for their system. Their model achieved 90% precision with only a 0.1% false positive rate on testing data of 75 million URLs collected over two weeks. However, the authors do not share details about the steps they took to overcome the dataset's noise.

Marchal et al. [98] addressed three security challenges in their method: the base-rate fallacy, lack of data availability, and active attackers. They proposed a phishing detection system that uses relatively small training data (1,036 phishing and 4,531 legitimate websites), tested on 101,553 websites (100,000 legitimate and 1,553 phishing). The system uses features that cannot be fully manipulated by the attacker (reinforcement against active attackers), divided into four categories based on whether the attacker has full or partial control over the feature. For example, the distribution of the terms in the external links of a phishing website is completely out of the control of the attacker. Their method had a 99% AUC and less than a 1% false positive rate using more than 200 features. They further grouped the features into five different categories (URL, term usage, starting and landing main-level domain, registered domain name usage, and webpage content) and tested the effectiveness of each of them separately. Based on their results, URL features contributed the most (0.88 F1-score) compared to the others. Additionally, their method is language independent and was tested on a dataset of six different languages (German, English, French, Portuguese, Italian, and Spanish) to show that the performance is similar for websites in different languages. However, they did not report the training and detection times of their system.

Applying machine learning techniques to features extracted from different sections of the website (URLs, HTML content, domains, etc.) is not enough to distinguish legitimate websites from their phishing counterparts. Using users' interactions with websites, SpiderWeb [157] builds a graph from the redirect chains of websites and distinguishes legitimate and malicious websites by defining a threshold. Since the redirect chains are collected from a diverse set of web users, the attacker needs to make the redirect chain of the malicious website similar to those of legitimate websites to evade detection. This can only be done by exposing the malicious website to a larger audience, which increases the chance of getting caught. The main limitation of this work lies in the complexity of the redirection chains: it requires a minimum number of different redirect chains (visits from different users) leading to the same website to make a decision. They did an experiment with 10,914 users and found that their system detected the phishing websites after 723 users were compromised (93% successfully blocked).

Among all the systems reviewed here, none addressed all of the security challenges. The closest system [98] addressed three of the challenges mentioned in Section IV. A more thorough system is required to fill this gap. As a starting point for a better system, we should extract the best practices from existing works. Before moving to phishing email detection techniques, we summarize the lessons learned and best practices of phishing website detection techniques based on our review of the literature.

G. Lessons learned and best practices

Among the systems that used a large and unbalanced dataset in their training and testing phases, we found that ensemble methods (bagging or boosting) are the best fit. They resulted in detection rates as high as 99% and error rates of less than 1%. DMOZ and PhishTank are the main datasets used in these systems, along with author-collected datasets. This can be seen as evidence that researchers should focus more on ensemble methods [159].

Using a relatively diverse dataset can make the model more robust and general. There are several works that used Alexa as their source of legitimate websites. Although Alexa's list of top websites is diverse in terms of having different categories of websites, it only contains domain names, so there is a clear distinction between legitimate and phishing websites. One way to remove this bias is to use Alexa's list as a basis for crawling [93] and use the crawled websites as a dataset.

VIII. PHISHING/SPEAR PHISHING EMAIL DETECTION

Email is a popular vector for phishing and spear phishing attacks. Thus, detection of these attacks is critical. In this section, we review 37 phishing and eight spear phishing email detection papers.

A. Email Structure

An email consists of two main parts: Header fields (abbr. Header) and a Body, which is optional.16 Simple Mail Transfer Protocol (SMTP)17 is the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol used for sending and receiving emails. Originally, emails could only handle plain-text messages. Multi-Purpose Internet Mail Extensions (MIME)18 added support for new file types like JPEG and HTML.

The Header is a sequence of lines, each composed of a field name, followed by a colon (":"), then some information, and ending with a CRLF (Example VIII.1). Though mostly hidden from the reader, the header is very important, since it contains the sender's and recipients' information, an optional Message-ID field, and the route (hops and SMTP relays) taken by the email. Authentication-Results is an important header field for spam and phishing detection; it displays the results of authentication protocols such as the Sender Policy Framework (SPF). SPF is an open standard specifying a method to prevent sender address forgery.19 Another widely used protocol is DomainKeys Identified Mail (DKIM), for email integrity.20

The email body is optional and usually contains greetings, the actual message, and a possible signature field. It can also include images and other types of attachments. The MIME extension enabled the body to be encoded with non-ASCII elements like HTML or Base64. An email body formatted with MIME content has a MIME-Version field in the header.

The header and body are the main parts for extracting features from an email, so we refer to them as Header and Body Features, respectively. We describe the different types of features used in email detection research in the next section. We present the other types of features (External and Attachment related), and the papers that use them, in Appendix Table A12.

Example VIII.1: Example of a phishing email header

From: [email protected] Fri Nov 23 11:01:49 2007
Return-Path:
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: from mail1.monkey.org (spammy.monkey.org [152.160.49.220]) by funky.monkey.org (Postfix) with ESMTP id 1577F469B2 for ; Fri, 23 Nov 2007 11:01:49 -0500 (EST)
Received: from mail2.monkey.org (mail2.monkey.org [204.181.64.8]) by mail1.monkey.org (Postfix) with ESMTP id 06788131FF92 for ; Fri, 23 Nov 2007 11:01:49 -0500 (EST)
Received: from 80.216.6.214 by ; Fri, 23 Nov 2007 10:59:28 -0600
Message-ID: <[email protected]>
Date: Fri, 23 Nov 2007 11:01:47 -0500 (EST)
From: [email protected]
To: undisclosed-recipients:;
Content-Length: 0

16 https://tools.ietf.org/html/rfc2822
17 https://tools.ietf.org/html/rfc5321
18 https://tools.ietf.org/html/rfc2049
19 http://www.openspf.org/Introduction
20 http://www.dkim.org/

B. Email Feature Extraction and Analysis

Header features: The research literature focuses on sender-related fields (e.g., From, Sender, Mail From, Reply-to, etc.), especially the Fully Qualified Domain Name (FQDN) in those entries, verifying their consistency throughout the header. The FQDN is the complete domain name for a specific host on the internet.21 It is made up of the hostname and the domain name. In the example "mymail.somecollege.edu", the hostname is mymail and the domain is somecollege.edu. To illustrate, an email supposedly from Amazon should not have a Reply-to entry with an Outlook domain name. However, we caution that the sender can spoof all the header fields pertaining to the sender domain and email address. The Subject is commonly analyzed (Table A12), it being an important lure used by phishers; researchers apply different types of lexical analysis relying on Natural Language Processing (NLP) techniques to extract features and search for blacklisted words. Another interesting field is the Received field. It allows emails to be tracked down, because every hop the email passes through registers its own entry (specifying the domains of the sending and receiving hosts, a timestamp, etc.) [160]. An email received from a coworker in the same building, using the company's mailing server, should not be coming from a host in another country.

Body features: Researchers use lexical analysis to detect common phrasing patterns or words, e.g., Click here, Urgent, Warning, Account closure notice, etc. NLP techniques [161] were used to design: 1) semantic features, e.g., detecting language that elicits urgency or threat [162]; and 2) stylometric features, e.g., detecting writing habits and profiling users [163], [164]. Researchers also extract important features from the URLs included in the email body (as covered in Section VI).

In Table XV, we list the various features that appeared in phishing email detection research papers, as well as their size and required processing time. URL features have longer extraction times (as seen in Section VI-B) compared to "shallow" lexical features (e.g., word frequencies) extracted from the body. The size of most email features is relatively small, except for features that rely on the vocabulary of the whole dataset, like TFIDF [27]. Weighting the features with algorithms like Information Gain (IG) makes it possible to obtain subsets of features that give better classification results [165]–[171].

21 https://kb.iu.edu/d/aiuv

C. Detection Methods

Figure 13 shows the classification and clustering methods used in the phishing email detection literature and the number of papers that employed them. It shows a preference for Support Vector Machines (SVM) and Random Forest (RF). The number of research papers using supervised algorithms is higher than the number using unsupervised algorithms. More details about the methods used by each paper can be found in Table A6 in the Appendix. As previously mentioned in Section VII-C, the lack of diverse and new datasets gives a false impression or interpretation of supervised classifiers' results. Some researchers have tackled the lack of labeled datasets by using unsupervised algorithms to cluster or create profiles of users' writing habits [163], [164].

D. Dataset properties

1) Dataset sources and availability: We describe the different dataset sources used in the phishing email detection literature in Table XVI. Figures 14 and 15 show the distribution of these sources. It is clear that the most popular datasets for phishing and legitimate emails are Nazario22 and SpamAssassin23, respectively. We also include sources for malware, spam, and spear phishing datasets in Table XVI.

Researchers need to be aware that publicly available datasets can be sanitized. For example, the Enron dataset24 and the SpamAssassin dataset have some obfuscated addresses and domains. Moreover, many of the emails in the Enron dataset only have partial headers (missing a number of fields) instead of full headers.

An increasing number of authors used their own private datasets or worked with companies to get their email logs and archives. This has the benefit of training and testing models on realistic and relatively new emails. However, it raises the issue of reproducibility and comparison of systems. Although the objective is to release a new system with a high detection rate, it would also help the field of research if authors shared new datasets. That being said, we are aware of issues related to sensitive information and realize that sharing can be difficult.

22 https://monkey.org/~jose/phishing/
23 http://www.csmining.org/index.php/spam-assassin-datasets.html
24 https://www.cs.cmu.edu/~enron/

TABLE XV: Features used in phishing email detection. For uncommon features, we cite the paper that best describes it. (FS = feature size: Small/Medium/Large; PT = processing time: Small/Medium/High.)

Body features (FS, PT):
Lexical features (# of words, # of characters, function words, tokens,a regular expressions, etc.) (M, M); style metrics (number of paragraphs in the email, Yule metric, etc.) (S, S); topic in the bodyb (S, S); Latent Semantic Indexingc (S, S); readability indexesd (S, S); NLP (Part-of-Speech tags, Named Entities, WordNet properties, etc.) (M, S); Semantic Network Analysise (S, H); vocabulary richnessf (S, M); urgency, reward, threat language in the body (S, S); greeting, signature, farewell in the message (S, S); presence of both "From:" & "To:" in email body (S, S); # of linked-to domains (S, S); URL features (M, M); HTML features (M, S); presence and/or # of forms in email body (S, S); blacklisted words in the message (M, M); scripts/JavaScript features in email body (M, S); # of onClick events in email body (S, S); features from images/logos in the message (S, S); mention of the sender (S, S); # of links & image links in email body (S, S); # of tables in email body (S, S); recipient's email address in email body (S, S); phishing terms weighth (L, M); TFIDF (L, S); # of email body parts (S, S); JavaScript pop-up windows (S, S); link displayed ≠ link destination (S, M); img links ≠ spoofed target address (S, M); hidden text in the email, salting techniquesj (S, S); # of internal/external links (S, S)

Header features (FS, PT):
"Message-Id" (S, S); "Received" fields (S, S); "From" (S, S); "Mail from" (S, S); "Sender" (S, S); "Mail-to" (S, S); "Delivered To" (S, S); Authentication-Results (SPF, DKIM, etc.) (S, S); "Subject" features (length of subject, # of words, # of characters, vocabulary richness, etc.) (S, S); blacklisted words in "Subject" (S, S); # of words and/or characters in the "Send" field (S, S); "Sender" domain ≠ "Reply-to" domain (S, S); "Sender" domain ≠ "Message-Id" domain (S, S); "Sender"/"From" ≠ email's modal domains (S, S); timestamp, "Sent date" (S, S); source IP, Autonomous System Number (S, M); "Subject": Fwd, Reply (S, S); interaction habitsg (S, M); "Cc", "BCc" fields (S, S); "X-Mailer" (S, S); "X-Originating-IP" (S, S); "X-Originating-hostname" (S, S); "X-spam-flag" (S, S); "X-virus-scanned" (S, S); email size (S, S); MIME Version or Content-Typei (S, S)

a Semantic relations can also be used to add tokens.
b Words that tend to appear together in emails [172].
c Statistical technique for the analysis of two-mode and co-occurrence data.
d Quantitative description of the text writing and organisation style [173].
e Semantic Network Analysis is used to extract and create the feature vector, as explained in [174].
f The ratio of the number of words to the number of characters in the document [170].
g The user's history of conversations and list of contacts [163].
h Terms that have the highest term frequency in the phishing dataset [171].
i Multipart/Alternative, text/plain, text/html.
j Adding or distorting content not perceivable by the reader [175].
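One of the body features in Table XV, the link displayed to the reader differing from the actual link destination, can be sketched with the standard library: collect each anchor's visible text and its href, and flag anchors whose text looks like a URL but points at a different host. The HTML snippet and host names below are invented for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAuditor(HTMLParser):
    """Collect (href, anchor-text) pairs and flag display/destination mismatches."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>
        self.mismatches = []   # (shown_host, real_host) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            # Only audit anchors whose visible text itself looks like a URL.
            if shown.startswith(("http://", "https://", "www.")):
                shown_host = urlparse(shown if "://" in shown else "http://" + shown).hostname
                real_host = urlparse(self._href).hostname
                if shown_host and real_host and shown_host != real_host:
                    self.mismatches.append((shown_host, real_host))
            self._href = None

body = '<p>Update now: <a href="http://evil.example.net/login">https://www.bank.com/secure</a></p>'
auditor = LinkAuditor()
auditor.feed(body)
print(auditor.mismatches)  # [('www.bank.com', 'evil.example.net')]
```

The feature's small size and medium processing time in Table XV match this shape: the output is a single count or flag, but producing it requires parsing the whole HTML body rather than reading one header field.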

Fig. 13: Algorithms used in phishing email detection. The number of papers using the method is in parentheses.

TABLE XVI: Legitimate and phishing email datasets: sources and availability Legit. SpamAssassin Enron Author’s Misca N/A Phish (pub) (pub) (pvt) Nazario (pub) [165]–[172], [176]–[181] [161] [162], [168] APWG (pub) [174] Author’s (pvt) [168] [163] [163], [175], [182]–[188] Miscellaneousb [189], [190] [191] [192]–[194] Malware/Spam Author’s (pvt) [163] [163], [175], [188] Miscellaneousc [191] Spear Phishing Author’s (pvt) [163] [163], [164], [173], [195] UCI repo [196], [197] N/A [198] a Authors gathering emails from various sources like companies, Phishery (dead source) TREC [191] b Major Australian Bank [192]–[194], PhishingCorpus [189], [190], SPAM Archive [190]

of the most used datasets: SpamAssassin, Enron, and Nazario. With this aim, we extract all the text from the email body then we remove all the HTML tags, CSS, and header of the emails. We also filter out all stop words to remove uninformative words. Then we convert the extracted text into vectors using TFIDF representation. The SpamAssassin dataset includes both spam and ham emails. The Inverse Document Frequency was computed using all the emails. After getting the vectors, we use Cosine Similarity (Equation (9)) to measure the similarities. The Dataset sizes were 10,745 for SpamAssassin, 156,070 for Enron (Sent folder), and 8,551 for Nazario. Table XVIII shows the percentage of datasets Fig. 14: Distribution of papers based on legitimate email with a specific range of similarities It shows that the datasets source have diverse email body content: 90.91% of emails in Nazario dataset and more than 97% in SpamAssassin and Enron datasets have less than 10% similarity.

TABLE XVII: Size ranges of phishing/legitimate email datasets. Type Legitimate Phishing 100s 1,000s 10,000s 100,000s N/A 100s 5 5 1,000s 9 4 7 10,000s 1 2 N/A 2 1 1

Fig. 15: Distribution of papers based on phishing email source.

2) Dataset Size: As mentioned above, the ratio of benign to malicious samples in phishing detection is very important. Therefore, we report the number of papers that used different combinations of dataset sizes in Table XVII. It also shows the sizes of the phishing and legitimate datasets that are commonly used together. We notice from this table that most authors use relatively balanced ratios (e.g., 4 to 6 or 4.5 to 5.5). Examples of research in the literature that used unbalanced datasets with different ratios are [161], [165], [167], [169], [175].

3) Dataset Diversity: Another concern is dataset diversity. Therefore, we analyze the email body content diversity (Table XVIII).

TABLE XVIII: Percentage of emails with different ranges of similarities in each dataset. Columns specify the ranges of the Cosine Similarity in percentage.

Dataset   [0-10]  (10-20]  (20-30]  (30-40]  (40-50]  >50
SA^a       98.15     1.45     0.24    0.061     0.02  0.04
Nazario    90.91     6.60     1.32     0.53     0.26  0.33
Enron      97.16     2.56     0.18     0.04     0.01  0.02
^a SA - Spam Assassin

4) Recency of data collection: The Nazario, SpamAssassin, Enron, and UCI datasets are made of old emails (early 2000s), except for Nazario, which released new emails annually from 2015-17. This could explain why some authors prefer to rely on private sources, e.g., personal emails or companies' datasets. Table XIX describes the recency of the datasets in relation to the papers' publication dates. From the table, we conclude that almost 70% of the papers used old datasets. Thus, we recommend using recent sources of emails if possible. For example, the Wikileaks website holds archives with thousands of recent legitimate emails from varied sources, e.g., the Clinton emails, Hacking Team, Sony, etc. An interesting source of phishing emails is Phishbowls, which are frequently updated collections of emails with full or partial headers maintained by universities, e.g., the Cornell IT Phishbowl.25 We refer to [90], where the authors built a dataset with emails from Wikileaks archives and Phishbowls from different universities.

25 https://it.cornell.edu/phish-bowl

TABLE XIX: Recency of email datasets used in evaluation. Papers in bold use data collected from multiple periods.

Recency in year(s) | N  | Literature
Real time          | 3  | [164], [199], [195]
1                  | 5  | [168], [176], [174], [182], [193]
2-3                | 5  | [172], [175], [191], [181], [186]
3-4                | 7  | [169], [178], [194], [173], [170], [181], [186]
≥ 4                | 26 | [167], [177], [179], [180], [162], [163], [165], [166], [171], [184], [161], [188]–[190], [192], [196], [197], [168]–[170], [173], [178], [186], [195], [176], [181]

E. Evaluation Metrics

We list in Table XX the metrics and the number of papers that used up to two of these metrics in their evaluations. Information about papers that cover more than two metrics can be found in Table A3 of the Appendix. We note that 7 papers out of 37 did not mention the metrics they used. The most common evaluation metrics used in the phishing email detection literature are Accuracy (Acc), Precision, Recall and F1-score (PRF), and the confusion matrix (CMx). As mentioned before, Precision, Recall, F1-score and Accuracy are unsuitable for evaluating unbalanced datasets. Hence we recommend the use of metrics like AUC and the confusion matrix values instead. In the surveyed phishing email detection literature, only one paper used G-mean [188], 13 reported the confusion matrix, and seven reported AUC.

TABLE XX: Distribution of evaluation metrics in phishing email detection.

         Acc  PRF  ErR  CMx  AUC  Other^a
Acc       14    4    1    3    3    2
PRF            14    1    8    4    1
ErR                  2    1
CMx                      14    3    1
AUC                            7    1
Other^a                             4
^a One-Error, Coverage and Average Precision [193], G-Mean [188], Rand Index (RI) [192], Root Mean Square Error (RMSE) [180]

F. Selected Email Detection Literature

We now take a deeper look at the phishing email detection literature that addressed at least some of the security challenges mentioned in Section V-A. Then we summarize the lessons learned and recommend some good practices.

In a realistic phishing scenario, the number of phishing emails is considerably lower than the number of legitimate ones. Therefore, the base-rate fallacy is a critical issue that needs to be considered. We notice that researchers have stopped using a 1:1 ratio. However, the ratios used are still not realistic. We cite for example [167], where the authors used a dataset made of 4,150 legitimate and 2,687 phishing emails from SpamAssassin and Nazario, respectively. Masoumeh et al. [166] relied on the same sources but used different ratios: 1,700 ham and 1,000 phishing emails. An instance of a realistic ratio is [175], where the authors collected 36,364 ham emails and 3,636 phishing emails from private sources.

The lack of new datasets is an important security challenge. As we concluded previously from Table XIX, researchers still employ old data. If we consider the evolution of phishing attacks, then using emails from old sources to train and test classifiers is a problem. Authors who work with companies use private email logs collected from recent months or years for their research. An example of such papers is [193], where the authors gathered 2,048 phishing emails from an Australian bank in the span of 5 months. Another example is [168], where the authors collected 2,000 legitimate and 300 phishing emails from their own private server in the same year the paper was published. We also cite [182], where the authors collected 1,492 emails (986 legitimate, 506 phishing) in the span of 3 months, almost a year before the paper was published. However, the use of private datasets exacerbates the problem of comparing systems.

Testing the robustness of the phishing email detection system is a good prevention mechanism against the active attacker. Authors test for robustness in different ways. One way is to train the phishing email detector on one subset of the dataset but test it on another subset collected in a later time span, e.g., [175]. The overall performance was a bit lower, but they still achieved an F1-score of 98.66% compared to the original F1-score of 99.89%. The same authors also added 20,000 emails to their dataset, considered spam and phishing as a single unwanted class, and then tested an unchanged model. They achieved an F1-score of 99.48%. Another way to test for robustness, and how the classifier would perform against zero-day attacks, is to train the model on one dataset and test it on emails from a different dataset. This was done, for example, in [168]. The authors trained their classifiers on the Nazario and SpamAssassin datasets, then tested on emails collected from their private servers. The performance only dropped by almost 1%, from 99% to 98%. Legitimate datasets were varied in [161]. They trained only on a subset of Enron Inbox emails and then tested on a different subset of Enron Inbox emails as well as a subset of Enron Sent emails. A few papers also leverage semantics to increase the robustness of the features and the performance of classifiers. The following papers [161], [162], [171], [190], [192] reported the use of WordNet26 to enrich the textual features. WordNet is a large lexical database for English and is used to identify the semantic relationships of the tokens used as features.

26 https://wordnet.princeton.edu/

An issue that we encountered in the reviewed literature was the difficulty of performing a comparative analysis of systems. Indeed, few researchers report the training and testing times of their models and/or the details of the machines used to run those models. Exceptions include [168], [175], [180].

In the next subsection, we focus on the research done in spear phishing detection. The readers will notice that it is shorter compared to email detection. The reason is the paucity of research on this topic.

G. Spear phishing

Spear phishing emails are malicious emails that are targeted towards an individual, a specific group of people, or a specific organization. The attack is called targeted because it contains information sensitive to the intended victim. Hackers use spear phishing attacks not only to steal money or personal information but also to install malware or obtain the credentials of an account with higher privileges.27

27 https://www.csoonline.com/article/3334617/what-is-spear-phishing-why-targeted-email-attacks-are-so-difficult-to-stop.html

Research covering spear phishing emails is still small compared to research on phishing emails, URLs, and websites. Based on the attributes mentioned in Section V-A, we analyzed eight papers. Here, we highlight the spear phishing email literature that addressed at least some of the security challenges mentioned in Section IV.

The base-rate fallacy is even more of an issue for spear phishing emails since they are scarce compared to even normal phishing emails. An example of research that used an imbalanced dataset is Han et al. [173]. The emails were collected from Symantec's enterprise email scanning services and contained 1,467 spear phishing emails from eight different known campaigns and 14,043 benign emails collected between 2011 and 2013. Another example is Stringhini et al. [163], where they used the Enron Corpus and three email datasets from a large security company (that were not detected by spam-filtering software) to collect and work on 43,274 spam, 17,473 phishing, and 546 spear phishing emails. The last example is Trang Ho et al. [195]. The authors worked on a dataset from Lawrence Berkeley National Lab (LBNL), which consisted of 372 million emails and event logs gathered since 2013. The security team of LBNL and the method developed by the researchers only discovered 21 spear phishing emails combined. This is an example of a realistic ratio of benign to spear phishing emails.

The time-scale of spear phishing attacks is short, hence the need for real-time detection systems. Trang Ho et al. [195] exploited domain and sender reputation features and created a new anomaly detection technique. Their method scores a click-in-email event by generating the feature vector and comparing it with previous events using a time window (e.g., three or six months). Their method was able to detect 17 out of 19 spear phishing campaigns, in addition to two others that were previously undetected by LBNL. Another live detection system was built by Stringhini et al. [163]. Their system is based on senders' writing habits. User profiles were built using writing habits (occurrence of characters, functional and special words, style metrics, etc.), composition and sending habits, and interaction habits (the social network of the user). After the system is trained using SVM with SMO (Sequential Minimal Optimization), it launches an identity-verification process when a received email is considered suspicious. Han et al. [173] attempt to improve upon the time-consuming task of manually analyzing datasets of emails to detect which are spear phishing. They used a graph-based semi-supervised learning framework to differentiate between benign, known, and unknown spear phishing campaigns. The dataset contained sets of labeled spear phishing campaigns and unlabeled emails. They achieved a 0.9 F1-score with a 0.01 false positive rate in detecting spear phishing emails that belong to previously known campaigns. However, F1-score may not be the best metric to use for imbalanced datasets. We encourage authors to evaluate their systems using the metrics suggested in Section VIII-E.

We summarize next the lessons learned from reviewing the phishing and spear phishing email detection literature.

H. Lessons learned and best practices

There is still much to be done to enhance research practices in the phishing and spear phishing email detection fields. Researchers should give more attention to the reproducibility of their work by specifying the architecture of the machines that run the training and testing models and the training and testing times, and by sharing the datasets used or possible ways to collect them (if not already available). Research using balanced datasets is decreasing. However, the ratios of legitimate to phishing emails used in the literature are still not realistic. Researchers can start by using data with skewed ratios between the legitimate and malicious classes and evaluate their systems using metrics made for unbalanced datasets. To detect zero-day attacks and reinforce detection systems against active attackers, more research is needed on periodic retraining on new datasets and on real-time detection mechanisms.

IX. OPPORTUNITIES FOR FUTURE RESEARCH

A careful review of the literature on phishing URL, website and email detection from the security challenges perspective reveals multiple opportunities for future contribution, which are discussed below.

• Dataset Issues: We observe a lack of good dataset sources and a dearth of diversity in the data used for evaluation of detection systems. PhishTank and APWG are popular sources of phishing URLs used in URL and website detection; DMOZ (now deprecated) and Alexa's list of top websites are common sources of legitimate links. However, Alexa only provides legitimate domain names, which do not capture the true nature and variety of complete URLs that a user can come across in a real-world scenario. Only a few authors go the extra mile to include a detailed description of how the data was collected [200].
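The metric recommendations above (MCC, balanced rates, and confusion-matrix values rather than raw accuracy) can be made concrete with a small sketch. The counts below are invented for illustration, assuming a realistic skew of roughly one phishing email per ten legitimate ones:

```python
from math import sqrt

def metrics(tp, fp, fn, tn):
    """Accuracy, recall on the phishing (positive) class, and Matthews
    Correlation Coefficient, all computed from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, recall, mcc

# Invented counts: 1,000 emails, 100 phishing, 900 legitimate.
# A degenerate classifier that labels every email "legitimate":
acc, recall, mcc = metrics(tp=0, fp=0, fn=100, tn=900)
print(acc, recall, mcc)  # 90% accuracy, yet it catches no phishing (MCC = 0)

# A useful classifier that finds 80 of the 100 phishing emails
# at the cost of 30 false alarms:
acc, recall, mcc = metrics(tp=80, fp=30, fn=20, tn=870)
print(acc, recall, mcc)
```

The degenerate classifier illustrates the base-rate fallacy: its accuracy looks strong only because the legitimate class dominates, while its MCC and phishing-class recall immediately expose it as useless.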

Along with its diversity, the recency of the data must also be taken into account during evaluation. Newer types of phishing URLs are submitted to crowd-sourced sites like APWG and PhishTank, but the use of deprecated sources such as DMOZ, Yahoo Directory, etc., renders the proposed system useless against newer attacks. In email detection, we notice a clear affinity towards old data sources like Nazario, SpamAssassin, and/or Enron. To ensure diversity, authors also use private emails or emails collected from companies. Although a better option (more recent and realistic), using private sources does not help with the clear shortage of publicly available datasets.
Another major issue is the availability of quality data. Publicly available phishing URLs (PhishTank) may have dead links, duplicates, incomplete links, etc. Organizations and individuals may encounter more sophisticated attacks in the form of emails or websites, which could evade detection by humans and by classifiers. A number of papers also use phishing as well as spear phishing emails collected by the authors themselves, which are not publicly available. Such URLs or emails are usually not submitted to crowd-sourced sites and could represent harder-to-detect attacks.

• Evaluation metrics: Use of accuracy as an evaluation metric is not desirable. It does not capture the true performance of the classifier in the case of unbalanced data. Moreover, it gives a false sense of optimism (base-rate fallacy). A better evaluation would include metrics of performance on both classes or cost-weighted classification scores like the Matthews Correlation Coefficient (MCC) [26], balanced detection rate [90], etc. Using appropriate metrics like MCC, balanced detection rate, precision, recall, etc., for evaluating performance on unbalanced datasets can also tackle the base-rate fallacy issue.

• Retraining the systems: System retraining is important for adapting machine-learning-based detection models against an active attacker. However, much of the relevant literature does not mention the retraining of classification models trained on features extracted from older websites, URLs, and/or emails (malicious and benign). This is a major concern since pre-trained detection models are most likely to fail when phishers change the type and nature of the malicious websites/URLs and the URLs embedded in emails, thus changing the attributes that the classifier depends upon for its judgment. We refer readers to papers, e.g., [28], [94], [139], which describe techniques and periodicity for retraining.

• Generalization experiment: Common techniques for training and testing involve five-fold or ten-fold cross-validation or, in some cases, splitting the original dataset into training (usually 60%-70%) and testing (usually 30%-40%) data. But in such cases, the source of the data remains the same, and the test set may have attributes in common with the training set – thus ensuring better performance of the classifier. This, however, may not be the case in a real-world scenario. Therefore, a proposed methodology should be cross-evaluated (trained on one dataset and tested on a different dataset) on attack types from a varied range of data sources. Deploying a system in a real-world setup (as part of a mail server or web browser) also tests the system's performance against newer attacks.

• Supervised Machine Learning: We observed an important gap in the detection techniques employed for malicious URLs, websites and emails: a majority of the implementations use supervised learning methods. The major necessity for training and evaluating the performance of such supervised systems is a considerable amount of labeled data. However, this brings us back to a major security challenge – data availability.

• Evaluation times: Phishing attacks are usually launched for a short window of time (a few days or weeks). Therefore, for the detection of malicious websites, the mechanism used should be fast as well as capable of zero-day attack detection. However, only a few papers (e.g., [22]) report run times for training the classifiers. In our opinion, training and detection times should be emphasized while discussing the performance of a classifier.

X. THE WEAKEST LINK - USER STUDIES

Regardless of detectors' performance, users play an important role in preventing attacks, since determined attackers can find ways to bypass detection techniques. To date, users are considered the most vulnerable link in the phishing ecosystem. According to a study by Intel, 97% of people are not able to detect phishing emails.28 There are several studies on users' susceptibility to phishing attacks, e.g., [201], [202], and on how to improve their knowledge about them, e.g., [203], [204].

28 https://www.marketwatch.com/press-release/97-of-people-globally-unable-to-correctly-identify-phishing-emails-2015-05-12

In this section, we start with an overview of different deception techniques used by attackers. We then systematically review the following attributes of 63 selected studies:

• Participants and Environment: The number of participants recruited in the study and whether they were recruited from a specific population (e.g., university students). Running the experiment in the lab usually forces the researchers to reduce the number of participants, so we also consider the environment of the study as an attribute (Lab/Real). Researchers can ask participants to come to the lab to do the experiment (Lab) or send them the emails (or a link to an online survey) so they can do the experiment wherever they want (Real).

• Evaluation: The variables that were measured in the study: detection rate, background knowledge, personality, and cognitive process. Personality is used to distinguish qualities or characteristics of individuals. It consists of five major traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [205]. Cognitive process refers to four variables related to humans' decision-making process: 1) neural activity: to measure individuals' effort in detecting phishing attacks, 2) eye movement: to find parts of the email/website that people

focus on, 3) cognitive impulsivity [206]: the tendency to act without much thought, and 4) heuristic/systematic processing [207]: explains how people receive and process a message, either by using their background knowledge or by investigating and analyzing the message. Besides these quantitative measures, qualitative analysis of participants' reasoning can be used to understand their thought processes and performance.

• Context: Whether the study is about phishing webpages or phishing emails.

• Experiment Design: Important aspects of the design of the experiment. These include whether the experiment involves deception (i.e., participants are not informed that the experiment is about phishing) and whether participants' behavior changes with time (long-term study). The design of the questionnaire is also important: some studies only ask participants about their vulnerabilities (Q & A), while more complex studies show the participants phishing/legitimate emails or webpages (experimental).

• Goal of the study: We found two different goals for user studies on phishing attacks. 1) Detection: studying the quantitative and qualitative performance of participants in detecting different types of attacks. 2) Training: studying the effectiveness of a training method. Although most training studies are followed by a detection study, we make a distinction between the pure detection studies and the training studies.

Categorizing the previous research into different groups based on the aforementioned attributes helps us identify the areas that are well studied and also the gaps in existing research. After reviewing the studies based on these attributes, we point out their strengths and weaknesses and conclude with a discussion of the gaps in existing research and some possible future directions.

A. Participants and Environment

Table XXI shows the number and type of participants along with the environment of each study. We categorized the participants into three groups: students, employees of a company, and unrestricted population (i.e., participants sampled from a general population). Empty cells in the table show the areas that remain unexplored. Although there are many arguments in the literature about lab versus real-world experiments [208]–[210], based on Table XXI it is clear that both types of experiments are popular among researchers. No study on employees has been conducted in the lab environment. Although the real-world experiment is more general in comparison to the lab experiment, the lab experiment helps the researcher analyze human behavior in more detail and understand the factors that affect participants' responses.

Apart from the type and number of participants, the variables being studied play an important role in the quality of the experiment. While on one hand studying several variables helps to look at different aspects at the same time, on the other hand it makes the experiment more complicated and probably requires more participants. In the next section, we discuss the variables that have been studied in the reviewed literature.

B. Experimental Attributes

Table XXII groups the literature based on the context, evaluation, and whether or not deception was involved in the studies. Based on the table, it is obvious that there is more research on emails than on webpages. Another observation is the lack of long-term studies. People's behavior can change over time, so it is important to redo an experiment after a long period of time to see how behavior changes. We also list the variables that each work studied in Table XXII. Although a few works study several attributes together, most of the existing works study the effect of a single variable (e.g., personality) on participants' phishing detection ability.

Multiple Comparisons Problem: We noticed that several papers, e.g., [61], [217], [222], [224]–[226], [232]–[234], [238], [241], [242], [246], [247], [249], [263], drew their statistically significant conclusions by running multiple tests on a single dependent variable. The multiple comparisons problem causes incorrect rejection of the null hypothesis if not accompanied by a p-value adjustment (it increases type 1 error) [267]. Bonferroni [268] is the simplest way to adjust the p-value, but it is too conservative in rejecting the null hypothesis (it increases type 2 error). More complex methods like Holm-Bonferroni and Benjamini-Hochberg [269] can be used to lower type 1 and type 2 errors simultaneously (compared to Bonferroni).

In the next section, we give a brief summary of the works selected based on how thoroughly they covered the aforementioned attributes, involving emails or websites (some with deception and some without).

C. Selected User Study Literature

Studies on users' behaviour are not completely aligned with the challenges we described in Section IV. Therefore, instead of selecting the papers based on those challenges, we evaluate the literature based on the following: 1) how many different variables were studied; 2) context (email or website); 3) environment; and 4) whether or not the study includes deception.

Researchers in [202] studied the effect of psychological traits on people's ability to detect email phishing attacks and on their online information sharing. They sent a "prize scam" email that requested an immediate response to 100 students from a psychology class. They also asked about the participants' privacy settings on Facebook, the frequency of posting, the type of data they posted, etc. The results show that for women, certain personality traits (neuroticism and openness) are more likely to be associated with vulnerability to phishing attacks as well as with online information sharing on a social network.

The effect of priming people on the detection of phishing emails has been studied in [219]. The authors divided a total of 117 participants into two groups. One group was not informed that they were participants in a phishing study, while the other group was informed. As expected, their results indicate that informed participants perform better than uninformed participants. They also found that background computer knowledge improves participants' phishing detection rate.

A few studies investigated the relationship of brain activity with phishing and other security-related tasks. Researchers

TABLE XXI: Type of participants and the environment of phishing user studies (columns: <50, ≥50 and <200, ≥200 participants).

Lab:
  Student: <50: [61], [204], [211]–[218]; ≥50 and <200: [153], [219]–[225], [226],a [227]a; ≥200: [228]–[233], [234]a
  Employee: [235]a
  Unrestricted: <50: [203], [236], [237]a; ≥50 and <200: [238], [239], [240],a [241]a; ≥200: [242], [243], [244]a
Real:
  Student: <50: [245]; ≥50 and <200: [202], [246], [247]; ≥200: [201], [248]–[251]
  Employee: <50: [252]; ≥50 and <200: [253]–[255]; ≥200: [10], [256], [257]
  Unrestricted: [217], [258]–[260]; [213], [261]–[266]

a Studies that only asked participants about their vulnerability (Q/A) instead of showing them the email/website
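The p-value adjustments discussed under the multiple comparisons problem above can be sketched in a few lines. The p-values below are made up for illustration, and this is a minimal implementation of the Bonferroni and Holm-Bonferroni procedures, not code from any surveyed study:

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i iff p_i <= alpha / m (simple but conservative)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm_bonferroni(pvals, alpha=0.05):
    """Step-down Holm procedure: sort p-values ascending, compare the k-th
    smallest against alpha / (m - k), and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject

# Made-up p-values from five hypothetical tests on one dependent variable.
pvals = [0.004, 0.011, 0.039, 0.012, 0.27]
print(bonferroni(pvals))       # only p <= 0.05/5 = 0.01 survives
print(holm_bonferroni(pvals))  # less conservative: rejects more hypotheses
```

On these values Bonferroni rejects a single null hypothesis while Holm-Bonferroni rejects three, illustrating why the step-down procedure reduces type 2 error at no cost in type 1 error.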

TABLE XXII: Parameters of phishing user studies: long-term or not, context, deception, number of papers (N), and variables used. DR - Detection Rate, BK - Background Knowledge, CP - Cognitive Process, PR - Personality. Variable check marks (X) are listed per row as in the original table.

Long-term: No, Context: Email, Deception: Yes:
  N=13: [220], [221], [228], [246], [251], [259], [10], [201], [223], [231], [248], [255], [257] (X)
  N=3: [202], [256], [262] (X X)
  N=2: [222], [254] (X X)
  N=1: [266] (X X)
  N=4: [225], [229], [247], [265] (X X X)
  N=2: [233], [252] (X X X)
  N=1: [219] (X X X X)
Long-term: No, Context: Email, Deception: No:
  N=6: [10], [230], [238], [249], [250], [258] (X)
  N=3: [242], [260], [263] (X X)
Long-term: No, Context: Webpage, Deception: Yes:
  N=5: [204], [239], [245], [253], [261] (X)
  N=4: [203], [212], [236], [264] (X)
Long-term: No, Context: Webpage, Deception: No:
  N=3: [211], [213], [216] (X X)
  N=3: [61], [217], [218] (X X X)
  N=1: [214] (X X X)
Long-term: Yes, Context: Email, Deception: Yes:
  N=1: [232] (X X)

in [211] and [214] used Functional Magnetic Resonance Imaging (fMRI) and Electroencephalogram (EEG) to measure the brain's electrical activity, and an eye-tracker to gain a better understanding of users' decision-making processes. They used both existing phishing websites (by downloading and hosting them on their own network) and manually created ones. The EEG results show that, at a subconscious level, there might be hidden differences in how users detect real and fake websites. The fMRI results show an increase in brain activity (in regions related to decision-making, attention and problem solving) during the detection task. Although participants tried to process the available information to differentiate between real and fake webpages, their performance was still low. This suggests that people may not have enough knowledge about the cues that really need their attention. The eye-gaze pattern captured by the eye-tracker revealed that participants did not pay enough attention to the cues that show whether a webpage is real or fake ("less time on URL; more time on login field or website logo").

Training is one of the techniques to help people avoid phishing attacks. Attackers may choose a domain name very similar to the domain name of the target company, e.g., bank0famerica.com (using '0' instead of the letter 'o'). So, can we say that paying attention to the URL will help detect some of the phishing websites? To answer this question, researchers did a study on the effectiveness of highlighting the domain name in helping people detect phishing webpages [204]. A group of 22 people were shown 16 webpages (8 legitimate and 8 fraudulent) and asked how safe these websites were. Then they asked participants to re-evaluate the same 16 webpages by focusing on the address bar. They categorized the participants into three types based on which clues they observed: A) those who pay attention to the content of the webpage, B) those who pay attention to the information found in the address bar, and AB) those who rely primarily on the content and sometimes use the address bar information. Their results show that domain name highlighting is not effective for type A and AB participants. So, domain name highlighting cannot be used as the sole method for detecting phishing webpages. A similar study was done on a larger scale (320 participants) and reached a similar conclusion [213].

So far, we have only discussed user studies on regular phishing attacks. Training people to detect spear phishing attacks is also important, since these attacks are harder to detect automatically due to their targeted nature; thus we review them in the next section.

D. Spear phishing

Despite the importance of spear phishing attacks, we found only two user studies on this type of attack [10], [252]. In both works, spear phishing emails were sent to the employees of a company. The goal in [10] is to study the effect of training on employees (1,369 employees), while the goal of [252] is to study employees' (40 employees) vulnerability to spear phishing attacks. It has been shown in [10] that immediate feedback is not enough to increase people's awareness. They sent a series of carefully crafted spear phishing emails to the employees and gave them immediate training whenever they fell for an attack. The study in [252] found that there is a correlation between users' conscientiousness and their responses to the spear phishing emails. Another interesting finding is the negative correlation between participants' subjective estimate of their own vulnerability and the likelihood of being phished.

We conclude this section by reviewing the unexplored areas in the existing research and some suggestions for future work.

E. Opportunities for Future Research

Based on Table XXI, the important missing studies are: a deep analysis of participants' reasoning in (1) real-world studies with unrestricted participants and (2) in-lab experiments with employees. There is typically an inverse relationship between the depth of behavioral analysis and the number of participants: the fewer the participants, the better researchers can analyze the responses of each participant individually. An employee-only study is also important because many of the spear phishing studies use employees of a company as their target.

Another missing study is a comparison between conducting the same experiment in the lab and in a real-world situation. There is only one study in which the experiment was conducted in both real and lab environments [217]. The researchers first ran the experiment in the lab to have more control, and then, to increase confidence and statistical power (by increasing the number of participants), they ran the experiment in a real scenario. Although they showed that results from the real and lab environments are similar, their goal was not to compare these two environments. Also, there is only one study that considers long-term effects (Table XXII) in its experiment design [232]. They re-tested the participants (primary school students) two and four weeks after conducting phishing training (an interactive presentation). Surprisingly, their results showed that after a gap of four weeks, the performance of the participants diminished to the level before the training had occurred. The main limitation of their work is that they used a paper-based questionnaire to record participants' performance instead of showing them the email/webpage on a computer. More studies with diverse populations are needed to draw a general conclusion about the temporal effect on people's performance.

As we mentioned, only two groups have studied users' behaviour under spear phishing attacks. More studies are required to understand the different aspects of spear phishing attacks as well as how people respond to them. Training techniques can then be developed based on the results of these studies to increase people's awareness of the risk of spear phishing attacks.

XI. RELATED WORK

Phishing has been the topic of several surveys as well as books. We organize the related work along the major themes: emails, websites, URLs, user studies, and general surveys. The last category includes those surveys that examine more than one dimension of phishing detection. We conclude this section by mentioning surveys on two related areas: spam detection and web page classification.

Emails. There is one survey that focuses primarily on the phishing email literature [270]. In this 2013 survey, the researchers review mostly machine-learning techniques for email filtering and their relative advantages and disadvantages. They also classify the approaches proposed in the literature according to different stages of the attack flow: network-level protection, authentication, client-side tools, user education, and server-side filters and classifiers. We discuss other email surveys under General surveys.

URLs. Recently, a survey on phishing URL detection has become available [31]. In this survey, features for phishing URL detection are covered, and different kinds of machine learning techniques are surveyed: batch, on-line, representation, and "other." Also, [271], [272] survey a few papers on phishing URL detection, and [9], [11], [273] discuss some of the issues and literature associated with phishing URL detection.

Websites. Three teams of researchers have surveyed phishing website detection [32], [274], [275]. In [274], phishing website detection papers were studied, for the first time, along five dimensions: blacklist/whitelist, instantaneous protection, decision support tools, community rating tools such as Web of Trust, and intelligent heuristics. Subsequently, Varshney et al. [275] added search-based, visual-similarity, DNS-based, and proactive phishing URL based techniques. They also discussed the pros and cons of each type of detection technique and some selected papers, for which they summarized results and dataset sizes. In [32], three criteria were used to select papers for discussion: novelty, attention (measured by number of citations), and completeness. In a more recent study of phishing techniques [276], the authors give a detailed description of how content-based methods have been used for phishing detection in previous literature. The researchers provide a brief history of phishing along with an analysis of automatic detection methods and how they help fight phishing website attacks.

User studies. To the best of our knowledge, there is no comprehensive survey of user studies of phishing or spear phishing, although some surveys, e.g., [9], [11], [273], [277], discuss them as part of their taxonomy, training, or study of selected solutions. In [278], the authors survey a few papers on user studies.

General surveys. In a 2012 survey, the author surveyed a range of countermeasures for phishing, ranging from email detection to training and legal studies [277]. In a 2013 survey [278], researchers examined 20 proposed solutions in detail. These included both software detection techniques as well as some training approaches for human awareness and vulnerability. In [9], researchers have given a nice taxonomy of phishing attacks and defenses, identified features for phishing email detection, and compared 15 phishing email detection techniques and 15 phishing website detection techniques over the period 2000-2016. In [11], some dataset sources are listed, taxonomies are discussed with more examples, features for phishing email detection are listed, and 18 solutions are discussed over the period 2000-2016.

In a 2017 phishing survey [273], researchers have proposed a new, multi-dimensional phishing taxonomy and classified phishing countermeasures into five categories: machine learning, text mining, human users, profile matching, and others, with the last category further subdivided into: search engines, ontology, client-server authentication, and honeypot counter-

TABLE XXIII: Comparing earlier phishing surveys with the present survey. Year range - publication years of papers reviewed Year Detection Techniques User Aspects Security Survey Range Studies Dataset Detection Evaluation Challenges URL Web Email Features Diversity Methods Metrics Gupta et. al. [9] 2005-2016 X X X Gupta et. al. [11] 2004-2016 X X X Chiew et. al. [19] 2004-2017 X X X X X Dou et. al. [32] 2005-2016 X Almomani et. al. [270] 2004-2012 X X X X X Satane and Dasgupta [271] 2007-2014 X X X Sharma and Parveen [272] 2012-2015 X X X X Aleroud and Zhou [273] 2005-2015 X X X X Mohammad et. al. [274] 2004-2012 X X X Varshney et. al. [275] 2004-2016 X X X Purkait [277] 2004-2011 X X X Khonji et. al. [278] 2006-2011 X X X X X Qabajeh et. al. [276] 2005-2017 X X This Survey 2004-2018 X X X X X X X X X measures. They compare anti-phishing tools, both commercial Other gaps include: the lack of proper datasets29 and metrics and research prototypes, and identify gaps in literature in terms for evaluation, and lack of diverse population in user studies of unstudied or little-studied attack vectors and communication and long-term experiments. Our systematization reviews and channels. A 2018 survey [19] on phishing reviews popular outlines the common security challenges prevalent in cyber- techniques and vectors or channels for phishing operations. It security, which can be utilized for the evaluation of future presents in detail several techniques and propagation methods research on these topics and may lead to improved and more for social engineering attacks as well as evaluate how such ex- practical schemes. isting methods can be combined to launch more sophisticated attacks in future. ACKNOWLEDGMENTS Two related areas worth mentioning are spam detection and web page classification. Although the goals of spam We thank Omprakash Gnawali, Luis F.T. 
De Moraes and and phishing are different, and phishing detection is more Sima Shafaei for their helpful comments and suggestions challenging because of deception, there are similarities in the on this paper. This research was supported in part by NSF approaches, algorithms, and features that have been employed. grants CNS 1319212, DGE 1433817, DUE 1241772, and DUE We refer the reader to the survey of spam detection for more 1356705. This material is also based upon work supported details [279]. By comparing the work on spam detection by, or in part by, the U. S. Army Research Laboratory and and on phishing detection, we observe that phishing work the U. S. Army Research Office under contract/grant number has not utilized text mining and natural language processing W911NF-16-1-0422. techniques as much as spam detection. Web page classification also exploits text classification techniques. We refer to [280] REFERENCES for an excellent survey on web page classification. We summarize our observations about the phishing surveys [1] A. M. Ferreira and P. M. V. Marques, “Phishing through time: A ten in Table XXIII. No previous work, to our knowledge, has year story based on abstracts,” in Proceedings of the 4th International Conference on Information Systems Security and Privacy, ICISSP 2018, attempted to systematize and evaluate the phishing literature Funchal, Madeira - Portugal, January 22-24, 2018., 2018, pp. 225–232. from the perspective of security challenges or dataset diversity, [2] R. Verma, M. Kantarcioglu, D. Marchette, E. Leiss, and T. Solorio, and we did not find any previous survey of spear phishing “Security analytics: Essential data analytics knowledge for cyberse- curity professionals and students,” IEEE Security & Privacy, vol. 13, research literature. no. 6, pp. 60–65, 2015. [3] Neil Daswani, Christoph Kern, and Anita Kesavan, Foundations of XII.CONCLUSIONS Security - What Every Programmer Needs to Know. Apress, 2007. [4] E. 
Diehl, Ten Laws for Security. Springer, 2016. In this paper we have covered phishing and spear phishing [5] J. Crowe. (2016) Phishing by the numbers: Must-know phishing detection techniques and user studies from the perspective of statistics 2016. [Online]. Available: https://blog.barkly.com/phishing- security challenges. To our knowledge, this is the first system- statistics-2016 [6] E. Bursztein, B. Benko, D. Margolis, T. Pietraszek, A. Archer, atic survey of these important topics from this viewpoint, and A. Aquino, A. Pitsillidis, and S. Savage, “Handcrafted fraud and the only comprehensive survey of spear phishing research and extortion: Manual account hijacking in the wild,” in Proceedings of user studies of phishing/spear phishing. We identified several the 2014 Conference on Internet Measurement Conference. ACM, 2014, pp. 347–358. opportunities for future research. Specifically, we found that [7] R. Sommer and V. Paxson, “Outside the closed world: On using deep learning techniques have not been exploited much in machine learning for network intrusion detection,” in 31st IEEE phishing detection. A large number of papers use a wide Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA, 2010, pp. 305–316. variety of features, but do not include any information about feature importance or feature selection – this is important 29We recently created email datasets as part of the IWSPA-AP Shared task since using many features can slowdown system performance. [281], which are available by request to the authors. REEXAMINING PHISHING RESEARCH 28

[8] Z. Dou, I. Khalil, A. Khreishah, A. Al-Fuqaha, and M. Guizani, "Systematization of knowledge (SoK): A systematic review of software-based web phishing detection," IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2797–2819, 2017.
[9] B. B. Gupta, A. Tewari, A. K. Jain, and D. P. Agrawal, "Fighting against phishing attacks: State of the art and future challenges," Neural Computing and Applications, vol. 28, no. 12, pp. 3629–3654, Dec 2017.
[10] D. D. Caputo, S. L. Pfleeger, J. D. Freeman, and M. E. Johnson, "Going spear phishing: Exploring embedded training and awareness," IEEE Security & Privacy, vol. 12, no. 1, pp. 28–38, 2014.
[11] B. B. Gupta, N. A. Arachchilage, and K. E. Psannis, "Defending against phishing attacks: Taxonomy of methods, current issues and future directions," Telecommunication Systems, vol. 67, no. 2, pp. 247–267, Feb 2018.
[12] Anti-Phishing Working Group, "Phishing Activity Trends Report - H1 2011," APWG Phishing Trends Reports, 2011.
[13] C. Abad, "The economy of phishing: A survey of the operations of the phishing market," First Monday, vol. 10, no. 9, 2005.
[14] C. Herley and D. A. F. Florêncio, "A profitless endeavor: Phishing as tragedy of the commons," in Proceedings of the 2008 New Security Paradigms Workshop, 2008, pp. 59–70.
[15] D. K. McGrath and M. Gupta, "Behind phishing: An examination of phisher modi operandi," in LEET, 2008.
[16] M. Cova, C. Kruegel, and G. Vigna, "There is no free phish: An analysis of 'free' and live phishing kits," in Proc. of USENIX WOOT, 2008, pp. 4:1–4:8.
[17] H. McCalley, B. Wardman, and G. Warner, "Analysis of back-doored phishing kits," in IFIP International Conference on Digital Forensics. Springer, 2011.
[18] X. Han, N. Kheir, and D. Balzarotti, "Phisheye: Live monitoring of sandboxed phishing kits," in Proc. ACM CCS, 2016, pp. 1402–1413.
[19] K. L. Chiew, K. S. C. Yong, and C. L. Tan, "A survey of phishing attacks: Their types, vectors and technical approaches," Expert Systems with Applications, vol. 106, pp. 1–20, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417418302070
[20] D. Sahoo, C. Liu, and S. C. Hoi, "Malicious URL detection using machine learning: A survey," arXiv preprint arXiv:1701.07179, 2017.
[21] R. Verma and K. Dyer, "On the character of phishing URLs: Accurate and robust statistical learning classifiers," in Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, 2015, pp. 111–122.
[22] R. Verma and A. Das, "What's in a URL: Fast feature extraction and malicious URL detection," in Proceedings of the 3rd ACM International Workshop on Security and Privacy Analytics. ACM, 2017, pp. 55–63.
[23] K. Parsons, A. McCormac, M. Pattinson, M. Butavicius, and C. Jerram, "The design of phishing studies: Challenges for researchers," Computers & Security, vol. 52, pp. 194–206, 2015.
[24] R. Abel. (2018) Study shows which phishing attacks most successful. [Online]. Available: https://www.scmagazine.com/study-shows-most-clicked-phishing-attempts/article/743513/
[25] I. Stein, "Security research at Chevron," 2017, presentation at University of Houston.
[26] M. Bekkar, H. K. Djemaa, and T. A. Alitouche, "Evaluation measures for models assessment over imbalanced datasets," J Inf Eng Appl, vol. 3, no. 10, 2013.
[27] G. Salton and M. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, Inc., 1986.
[28] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Learning to detect malicious URLs," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 30, 2011.
[29] S. Lee and J. Kim, "Warningbird: A near real-time detection system for suspicious URLs in Twitter stream," IEEE Transactions on Dependable and Secure Computing, vol. 10, no. 3, pp. 183–195, May 2013.
[30] N. Gupta, A. Aggarwal, and P. Kumaraguru, "bit.ly/malicious: Deep dive into short URL based e-crime detection," in 2014 APWG Symposium on Electronic Crime Research (eCrime), Sep. 2014, pp. 14–24.
[31] D. Sahoo, C. Liu, and S. C. H. Hoi, "Malicious URL detection using machine learning: A survey," CoRR, vol. abs/1701.07179, 2017.
[32] Z. Dou, I. Khalil, A. Khreishah, A. Al-Fuqaha, and M. Guizani, "SoK: A systematic review of software based web phishing detection," IEEE Communications Surveys & Tutorials, 2017.
[33] A. C. Bahnsen, E. C. Bohorquez, S. Villegas, J. Vargas, and F. A. González, "Classifying phishing URLs using recurrent neural networks," in Electronic Crime Research (eCrime), 2017 APWG Symposium on. IEEE, 2017, pp. 1–8.
[34] S. B. Rathod and T. M. Pattewar, "A comparative performance evaluation of content based spam and malicious URL detection in e-mail," in 2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS). IEEE, 2015, pp. 49–54.
[35] M. Darling, G. Heileman, G. Gressel, A. Ashok, and P. Poornachandran, "A lexical approach for classifying malicious URLs," in 2015 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2015, pp. 195–202.
[36] S. Egan and B. Irwin, "An evaluation of lightweight classification methods for identifying malicious URLs," in Information Security South Africa (ISSA), 2011. IEEE, 2011, pp. 1–6.
[37] S. Zhang, W. Wang, Z. Chen, H. Gu, J. Liu, and C. Wang, "A web page malicious script detection system," in 2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Systems (CCIS). IEEE, 2014, pp. 394–399.
[38] B. Gyawali, T. Solorio, M. Montes-y Gómez, B. Wardman, and G. Warner, "Evaluating a semisupervised approach to phishing URL identification in a realistic scenario," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference. ACM, 2011, pp. 176–183.
[39] C. Xiong, P. Li, P. Zhang, Q. Liu, and J. Tan, "MIRD: Trigram-based malicious URL detection implanted with random domain name recognition," in International Conference on Applications and Techniques in Information Security. Springer, 2015, pp. 303–314.
[40] A. Le, A. Markopoulou, and M. Faloutsos, "Phishdef: URL names say it all," in 2011 Proceedings IEEE INFOCOM. IEEE, 2011, pp. 191–195.
[41] P. Dewan and P. Kumaraguru, "Facebook inspector (FbI): Towards automatic real-time detection of malicious content on Facebook," Social Network Analysis and Mining, vol. 7, no. 1, p. 15, 2017.
[42] L. A. T. Nguyen, B. L. To, H. K. Nguyen, and M. H. Nguyen, "A novel approach for phishing detection using URL-based heuristic," in Computing, Management and Telecommunications (ComManTel), 2014 International Conference on. IEEE, 2014, pp. 298–303.
[43] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: Learning to detect malicious web sites from suspicious URLs," in Proc. of the 15th ACM SIGKDD, 2009, pp. 1245–1254.
[44] M. N. Feroz and S. Mengel, "Examination of data, rule generation and detection of phishing URLs using online logistic regression," in 2014 IEEE International Conference on Big Data (Big Data). IEEE, 2014, pp. 241–250.
[45] H. Sha, Z. Zhou, Q. Liu, T. Liu, and C. Zheng, "Limited dictionary builder: An approach to select representative tokens for malicious URLs detection," in 2015 IEEE International Conference on Communications (ICC), June 2015, pp. 7077–7082.
[46] A. Astorino, A. Chiarello, M. Gaudioso, and A. Piccolo, "Malicious URL detection via spherical classification," Neural Computing and Applications, pp. 1–7, 2016.
[47] R. B. Basnet and A. H. Sung, "Mining web to detect phishing URLs," in 2012 11th International Conference on Machine Learning and Applications (ICMLA), vol. 1. IEEE, 2012, pp. 568–573.
[48] D. R. Patil and J. Patil, "Malicious web pages detection using static analysis of URLs," International Journal of Information Security and Cybercrime, vol. 5, p. 57, 2016.
[49] R. B. Basnet and T. Doleck, "Towards developing a tool to detect phishing URLs: A machine learning approach," in 2015 IEEE International Conference on Computational Intelligence Communication Technology, Feb 2015, pp. 220–223.
[50] S. Marchal, J. François, R. State, and T. Engel, "Phishscore: Hacking phishers' minds," in 10th International Conference on Network and Service Management (CNSM) and Workshop, Nov 2014, pp. 46–54.
[51] S. Marchal, J. François, R. State, and T. Engel, "Phishstorm: Detecting phishing with streaming analytics," IEEE Transactions on Network and Service Management, vol. 11, no. 4, pp. 458–471, Dec 2014.
[52] K. Pradeepthi and A. Kannan, "Performance study of classification techniques for phishing URL detection," in 2014 Sixth International Conference on Advanced Computing (ICoAC), Dec 2014, pp. 135–139.
[53] M. N. Feroz and S. Mengel, "Phishing URL detection using URL ranking," in 2015 IEEE International Congress on Big Data, June 2015, pp. 635–638.
[54] S. C. Jeeva and E. B. Rajsingh, "Intelligent phishing URL detection using association rule mining," Human-centric Computing and Information Sciences, vol. 6, no. 1, pp. 1–19, 2016.
[55] D. Huang, K. Xu, and J. Pei, "Malicious URL detection by dynamically mining patterns without pre-defined elements," World Wide Web, vol. 17, no. 6, pp. 1375–1394, 2014.

[56] M. Khonji, A. Jones, and Y. Iraqi, "A novel phishing classification based on URL features," in 2011 IEEE GCC Conference and Exhibition (GCC), Feb 2011, pp. 221–224.
[57] M. Khonji, Y. Iraqi, and A. Jones, "Lexical URL analysis for discriminating phishing and legitimate websites," in CEAS, 2011, pp. 109–115.
[58] B. E. Sananse and T. K. Sarode, "Phishing URL detection: A machine learning and web mining-based approach," International Journal of Computer Applications, vol. 123, no. 13, 2015.
[59] M. S. I. Mamun, M. A. Rathore, A. H. Lashkari, N. Stakhanova, and A. A. Ghorbani, "Detecting malicious URLs using lexical analysis," in International Conference on Network and System Security. Springer, 2016, pp. 467–482.
[60] G. Sandra, K. Shanavas, and A. Saheeda, "Coordinator based suspicious URL detection and prevention of malicious URL attacks in Twitter's," International Journal of Scientific & Engineering Research, vol. 5, no. 9, 2014.
[61] M. Alsharnouby, F. Alaca, and S. Chiasson, "Why phishing still works: User strategies for combating phishing attacks," International Journal of Human-Computer Studies, vol. 82, pp. 69–82, 2015.
[62] P. Burnap, A. Javed, O. F. Rana, and M. S. Awan, "Real-time classification of malicious URLs on Twitter using machine activity data," in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, ser. ASONAM '15. New York, NY, USA: ACM, 2015, pp. 970–977. [Online]. Available: http://doi.acm.org/10.1145/2808797.2809281
[63] R. K. Nepali and Y. Wang, "You look suspicious!!: Leveraging visible attributes to classify malicious short URLs on Twitter," in HICSS, 2016. IEEE, 2016, pp. 2648–2655.
[64] J. Cao, Q. Li, Y. Ji, Y. He, and D. Guo, "Detection of forwarding-based malicious URLs in online social networks," International Journal of Parallel Programming, vol. 44, no. 1, pp. 163–180, 2016.
[65] A. Blum, B. Wardman, T. Solorio, and G. Warner, "Lexical feature based phishing URL detection using online learning," in Proceedings of the 3rd ACM Workshop on Artificial Intelligence and Security. ACM, 2010, pp. 54–60.
[66] W. Chu, B. B. Zhu, F. Xue, X. Guan, and Z. Cai, "Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs," in 2013 IEEE International Conference on Communications (ICC), June 2013, pp. 1990–1994.
[67] K.-W. Su, K.-P. Wu, H.-M. Lee, and T.-E. Wei, "Suspicious URL filtering based on logistic regression with multi-view analysis," in 2013 Eighth Asia Joint Conference on Information Security (Asia JCIS). IEEE, 2013, pp. 77–84.
[68] O. Catakoglu, M. Balduzzi, and D. Balzarotti, "Automatic extraction of indicators of compromise for web applications," in Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 333–343.
[69] A. S. Popescu, D. B. Prelipcean, and D. T. Gavrilut, "A study on techniques for proactively identifying malicious URLs," in 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Sep. 2015, pp. 204–211.
[70] L.-H. Lee, K.-C. Lee, Y.-C. Juan, H.-H. Chen, and Y.-H. Tseng, "Users' behavioral prediction for phishing detection," in Proceedings of the 23rd International Conference on World Wide Web. ACM, 2014, pp. 337–338.
[71] J. Rathod and D. Nandy, "Anti-phishing technique to detect URL obfuscation," International Journal of Engineering Research and Applications, vol. 4, no. 5, pp. 172–179, 2014.
[72] I. Jeun, Y. Lee, and D. Won, "Collecting and filtering out phishing suspicious URLs using spamtrap system," in International Conference on Grid and Pervasive Computing. Springer, 2013, pp. 796–802.
[73] A. D. Gabriel, D. T. Gavrilut, B. I. Alexandru, and P. A. Stefan, "Detecting malicious URLs: A semi-supervised machine learning system approach," in 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Sep. 2016, pp. 233–239.
[74] G. Canova, M. Volkamer, C. Bergmann, R. Borza, B. Reinheimer, S. Stockhardt, and R. Tenberg, "Learn to spot phishing URLs with the Android NoPhish app," in IFIP World Conference on Information Security Education. Springer, 2015, pp. 87–100.
[75] Z. Yuan, B. Yang, X. Ren, and Y. Xue, "TFD: A multi-pattern matching algorithm for large-scale URL filtering," in 2013 International Conference on Computing, Networking and Communications (ICNC), Jan 2013, pp. 359–363.
[76] D. Guan, C.-M. Chen, and J.-B. Lin, "Anomaly based malicious URL detection in instant messaging," in Proc. JWIS, 2009.
[77] P. Dewan and P. Kumaraguru, "Towards automatic real time identification of malicious posts on Facebook," in 2015 13th Annual Conference on Privacy, Security and Trust (PST), July 2015, pp. 85–92.
[78] A.-Ş. Popescu, D.-T. Gavriluţ, and D.-I. Irimia, "A practical approach for clustering large data flows of malicious URLs," Journal of Computer Virology and Hacking Techniques, vol. 12, no. 1, pp. 37–47, 2016.
[79] B. Eshete and V. Venkatakrishnan, "Webwinnow: Leveraging exploit kit workflows to detect malicious URLs," in Proceedings of the 4th ACM Conference on Data and Application Security and Privacy. ACM, 2014, pp. 305–312.
[80] P. Zhao and S. C. Hoi, "Cost-sensitive online active learning with application to malicious URL detection," in Proceedings of the 19th ACM SIGKDD, 2013, pp. 919–927.
[81] S. Garera, N. Provos, M. Chew, and A. Rubin, "A framework for detection and measurement of phishing attacks," in Proc. 2007 ACM Workshop on Recurring Malcode, 2007, pp. 1–8.
[82] L. Vu, P. Nguyen, and D. Turaga, "Firstfilter: A cost-sensitive approach to malicious URL detection in large-scale enterprise networks," IBM Journal of Research and Development, vol. 60, no. 4, pp. 4–1, 2016.
[83] S. Sharma, S. Parveen, and R. JMIT, "Efficient malicious URL based on feature classification," International Journal of Engineering Research and General Science, 2015.
[84] J. Rathod and D. Nandy, "Anti-phishing technique to detect URL obfuscation," International Journal of Engineering Research and Applications, vol. 4, no. 5, pp. 172–179, 2014.
[85] M. Lin, C. Chiu, Y. Lee, and H. Pao, "Malicious URL filtering - a big data application," in 2013 IEEE International Conference on Big Data, Oct 2013, pp. 589–596.
[86] Y. Alshboul, R. K. Nepali, and Y. Wang, "Detecting malicious short URLs on Twitter," in AMCIS, 2015.
[87] S. Chhabra, A. Aggarwal, F. Benevenuto, and P. Kumaraguru, "Phi.sh/social: The phishing landscape through short URLs," in Proc. 8th CEAS. ACM, 2011, pp. 92–101.
[88] L. A. T. Nguyen, B. L. To, H. K. Nguyen, and M. H. Nguyen, "Detecting phishing web sites: A heuristic URL-based approach," in 2013 International Conference on Advanced Technologies for Communications (ATC 2013), Oct 2013, pp. 597–602.
[89] J. Zhang, Y. Pan, Z. Wang, and B. Liu, "URL based gateway side phishing detection method," in 2016 IEEE Trustcom/BigDataSE/ISPA. IEEE, 2016, pp. 268–275.
[90] A. E. Aassal, L. Moraes, S. Baki, A. Das, and R. Verma, "Anti-phishing pilot at ACM IWSPA 2018: Evaluating performance with new metrics for unbalanced datasets," in Proc. of IWSPA-AP: Anti-Phishing Shared Task Pilot at the 4th ACM IWSPA, 2018, pp. 2–10. [Online]. Available: http://ceur-ws.org/Vol-2124/#anti-phishing-pilot
[91] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[92] N. Smirnov et al., "Table for estimating the goodness of fit of empirical distributions," Annals of Mathematical Statistics, vol. 19, no. 2, pp. 279–281, 1948.
[93] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: A fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web, ser. WWW '11. New York, NY, USA: ACM, 2011, pp. 197–206. [Online]. Available: http://doi.acm.org/10.1145/1963405.1963436
[94] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," Proc. of 17th NDSS, 2010.
[95] B. Liang, J. Huang, F. Liu, D. Wang, D. Dong, and Z. Liang, "Malicious web pages detection based on abnormal visibility recognition," in 2009 International Conference on E-Business and Information System Security, May 2009, pp. 1–5.
[96] S. N. Bannur, L. K. Saul, and S. Savage, "Judging a site by its content: Learning the textual, structural, and visual features of malicious web pages," in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, ser. AISec '11. New York, NY, USA: ACM, 2011, pp. 1–10. [Online]. Available: http://doi.acm.org/10.1145/2046684.2046686
[97] A. P. Rosiello, E. Kirda, F. Ferrandi et al., "A layout-similarity-based approach for detecting phishing pages," in Security and Privacy in Communications Networks and the Workshops, 2007. SecureComm 2007. Third International Conference on. IEEE, 2007, pp. 454–463.
[98] S. Marchal, K. Saari, N. Singh, and N. Asokan, "Know your phish: Novel techniques for detecting phishing sites and their targets," in 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2016, pp. 323–333.

[99] G. Xiang, B. A. Pendleton, J. Hong, and C. P. Rose, "A hierarchical adaptive probabilistic approach for zero hour phish detection," in European Symposium on Research in Computer Security. Springer, 2010, pp. 268–285.
[100] H. Choi, B. B. Zhu, and H. Lee, "Detecting malicious web links and identifying their attack types," in Proceedings of the 2nd USENIX Conference on Web Application Development, ser. WebApps'11. Berkeley, CA, USA: USENIX Association, 2011, pp. 11–11. [Online]. Available: http://dl.acm.org/citation.cfm?id=2002168.2002179
[101] M. N. Feroz, "Examination of data, and detection of phishing URLs using URL ranking," Ph.D. dissertation, Texas Tech University, 2015.
[102] J.-L. Lee and D.-H. Park, "Phishing detection using web site heuristics," International Information Institute (Tokyo). Information, vol. 19, no. 2, p. 523, 2016.
[103] K. L. Chiew, E. H. Chang, W. K. Tiong et al., "Utilisation of website logo for phishing detection," Computers & Security, vol. 54, pp. 16–26, 2015.
[104] G.-G. Geng, X.-D. Lee, and Y.-M. Zhang, "Combating phishing attacks via brand identity and authorization features," Security and Communication Networks, vol. 8, no. 6, pp. 888–898, 2015.
[105] T. Thakur and R. Verma, "Catching classical and hijack-based phishing attacks," in International Conference on Information Systems Security. Springer, 2014, pp. 318–337.
[106] D. Miyamoto, H. Hazeyama, and Y. Kadobayashi, "An evaluation of machine learning-based methods for detection of phishing sites," in International Conference on Neural Information Processing. Springer, 2008, pp. 539–546.
[107] G. Xiang, J. Hong, C. P. Rose, and L. Cranor, "Cantina+: A feature-rich machine learning framework for detecting phishing web sites," ACM Trans. Inf. Syst. Secur., vol. 14, pp. 21:1–21:28, September 2011.
[108] C. Huang, S. Ma, W. Yeh, C. Lin, and C. Liu, "Mitigate web phishing using site signatures," in TENCON 2010 - 2010 IEEE Region 10 Conference, Nov 2010, pp. 803–808.
[109] Z. Xu, H. Wang, and S. Jajodia, "Gemini: An emergency line of defense against phishing attacks," in 2014 IEEE 33rd International Symposium on Reliable Distributed Systems. IEEE, 2014, pp. 11–20.
[110] S. Marchal, J. François, T. Engel et al., "Proactive discovery of phishing related domain names," in International Workshop on Recent Advances in Intrusion Detection. Springer, 2012, pp. 190–209.
[111] C. L. Tan, K. L. Chiew, K. Wong et al., "Phishwho: Phishing webpage detection via identity keywords extraction and target domain name finder," Decision Support Systems, vol. 88, pp. 18–27, 2016.
[112] A. S. Manek, V. Sumithra, P. D. Shenoy, M. C. Mohan, K. Venugopal, and L. Patnaik, "DeMalFier: Detection of Malicious web pages using an effective classifier," in 2014 International Conference on Data Science Engineering (ICDSE). IEEE, 2014, pp. 83–88.
[113] R. Gowtham and I. Krishnamurthi, "A comprehensive and efficacious architecture for detecting phishing webpages," Computers & Security, vol. 40, pp. 23–37, 2014.
[114] M. He, S.-J. Horng, P. Fan, M. K. Khan, R.-S. Run, J.-L. Lai, R.-J. Chen, and A. Sutanto, "An efficient phishing webpage detector," Expert Systems with Applications, vol. 38, no. 10, pp. 12,018–12,027, 2011.
[115] G. Varshney, M. Misra, and P. K. Atrey, "A phish detector using lightweight search features," Computers & Security, vol. 62, pp. 213–228, 2016.
[116] M.-E. Maurer and D. Herzner, "Using visual website similarity for phishing detection and reporting," in CHI '12 Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA '12. New York, NY, USA: ACM, 2012, pp. 1625–1630. [Online]. Available: http://doi.acm.org/10.1145/2212776.2223683
[117] G. Ramesh, I. Krishnamurthi, and K. S. S. Kumar, "An efficacious method for detecting phishing webpages through target domain identification," Decision Support Systems, vol. 61, pp. 12–22, 2014.
[118] Z. Dong, A. Kapadia, J. Blythe, and L. J. Camp, "Beyond the lock icon: Real-time detection of phishing websites using public key certificates," in 2015 APWG Symposium on Electronic Crime Research (eCrime). IEEE, 2015, pp. 1–12.
[119] W. Zhang, Q. Jiang, L. Chen, and C. Li, "Two-stage ELM for phishing web pages detection using hybrid features," World Wide Web, vol. 20, no. 4, pp. 797–813, 2017.
[120] R. S. Rao and A. R. Pais, "Detection of phishing websites using an efficient feature-based machine learning framework," Neural Computing and Applications, Jan 2018. [Online]. Available: https://doi.org/10.1007/s00521-017-3305-0
[121] H.-K. Pao, Y.-L. Chou, and Y.-J. Lee, "Malicious URL detection based on Kolmogorov complexity estimation," in Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01. IEEE Computer Society, 2012, pp. 380–387.
[122] T. Yue, J. Sun, and H. Chen, "Fine-grained mining and classification of malicious web pages," in 2013 Fourth International Conference on Digital Manufacturing and Automation (ICDMA). IEEE, 2013, pp. 616–619.
[123] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, 2016.
[124] N. Abdelhamid, A. Ayesh, and F. Thabtah, "Phishing detection based associative classification data mining," Expert Systems with Applications, vol. 41, no. 13, pp. 5948–5959, 2014.
[125] R. M. Mohammad, F. Thabtah, and L. McCluskey, "Intelligent rule-based phishing websites classification," IET Information Security, vol. 8, no. 3, pp. 153–160, 2014.
[126] K. D. Rajab, "New hybrid features selection method: A case study on websites phishing," Security and Communication Networks, vol. 2017, 2017.
[127] R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites based on self-structuring neural network," Neural Computing and Applications, 2014.
[128] G. Liu, B. Qiu, and L. Wenyin, "Automatic detection of phishing target from phishing webpage," in 20th International Conference on Pattern Recognition (ICPR). IEEE, 2010, pp. 4153–4156.
[129] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song, "Design and evaluation of a real-time URL spam filtering service," in 2011 IEEE Symp. on Security and Privacy, 2011, pp. 447–462.
[130] X. Dong, J. A. Clark, and J. L. Jacob, "Defending the weakest link: Phishing websites detection by analysing user behaviours," Telecommunication Systems, vol. 45, no. 2-3, pp. 215–226, 2010.
[131] C. L. Tan, K. L. Chiew, and S. N. Sze, "Phishing website detection using URL-assisted brand name weighting system," in 2014 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Dec 2014, pp. 054–059.
[132] R. Gowtham and I. Krishnamurthi, "Phishtackle - a web services architecture for anti-phishing," Cluster Computing, 2014.
[133] J. Mao, W. Tian, P. Li, T. Wei, and Z. Liang, "Phishing-alarm: Robust and efficient phishing detection via page component similarity," IEEE Access, vol. 5, pp. 17,020–17,030, 2017.
[134] L. Fang, W. Bailing, H. Junheng, S. Yushan, and W. Yuliang, "A proactive discovery and filtering solution on phishing websites," in 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015, pp. 2348–2355.
[135] I. Corona, B. Biggio, M. Contini, L. Piras, R. Corda, M. Mereu, G. Mureddu, D. Ariu, and F. Roli, "Deltaphish: Detecting phishing webpages in compromised websites," in European Symposium on Research in Computer Security. Springer, 2017, pp. 370–388.
[136] H. Zhang, G. Liu, T. W. S. Chow, and W. Liu, "Textual and visual content-based anti-phishing: A Bayesian approach," IEEE Transactions on Neural Networks, Oct 2011.
[137] L. Wenyin, G. Liu, B. Qiu, and X. Quan, "Antiphishing through phishing target discovery," IEEE Internet Computing, vol. 16, no. 2, pp. 52–61, 2012.
[138] P. Barraclough, M. A. Hossain, M. Tahir, G. Sexton, and N. Aslam, "Intelligent phishing detection and protection scheme for online transactions," Expert Systems with Applications, vol. 40, no. 11, pp. 4697–4706, 2013.
[139] B. Liang, M. Su, W. You, W. Shi, and G. Yang, "Cracking classifiers for evasion: A case study on the Google's phishing pages filter," in Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 345–356.
[140] M. Aburrous, M. A. Hossain, K. Dahal, and F. Thabtah, "Intelligent phishing detection system for e-banking using fuzzy data mining," Expert Systems with Applications, vol. 37, no. 12, pp. 7913–7921, 2010.
[141] A. Abbasi, H. Zahedi, D. Zeng, Y. Chen, H. Chen, and J. F. Nunamaker Jr, "Enhancing predictive analytics for anti-phishing by exploiting website genre information," Journal of Management Information Systems, vol. 31, no. 4, pp. 109–157, 2015.
[142] H. Shahriar and M. Zulkernine, "Trustworthiness testing of phishing websites: A behavior model-based approach," Future Generation Computer Systems, vol. 28, no. 8, pp. 1258–1271, 2012.
[143] J. Mao, W. Tian, P. Li, T. Wei, and Z. Liang, "Phishing-alarm: Robust and efficient phishing detection via page component similarity," IEEE Access, vol. 5, 2017.
[144] J. Zhang and X. Li, "Phishing detection method based on borderline-smote deep belief network," in International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage. Springer, 2017, pp. 45–53.

[145] J. Britt, B. Wardman, A. Sprague, and G. Warner, “Clustering potential phishing websites using DeepMD5,” in 5th USENIX Workshop on Large-Scale Exploits and Emergent Threats, LEET ’12, San Jose, CA, USA, April 24, 2012.
[146] B. Wardman, G. Shukla, and G. Warner, “Identifying vulnerable websites by analysis of common strings in phishing URLs,” in eCrime Researchers Summit, eCRIME ’09. IEEE, 2009, pp. 1–13.
[147] B. Wardman, T. Stallings, G. Warner, and A. Skjellum, “High-performance content-based phishing attack detection,” in eCrime Researchers Summit (eCrime), 2011. IEEE, 2011, pp. 1–9.
[148] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, “Phishari: automatic realtime phishing detection on Twitter,” in eCrime Researchers Summit (eCrime), 2012. IEEE, 2012, pp. 1–12.
[149] W. Zhuang, Q. Jiang, and T. Xiong, “An intelligent anti-phishing strategy model for phishing website detection,” in 2012 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE, 2012, pp. 51–56.
[150] D. Zhang, Z. Yan, H. Jiang, and T. Kim, “A domain-feature enhanced classification model for the detection of Chinese phishing e-business websites,” Information & Management, vol. 51, no. 7, pp. 845–853, 2014.
[151] J. Vargas, A. C. Bahnsen, S. Villegas, and D. Ingevaldson, “Knowing your enemies: leveraging data analysis to expose phishing patterns against a major US financial institution,” in 2016 APWG Symposium on Electronic Crime Research (eCrime), June 2016, pp. 1–10.
[152] A. E. Kosba, A. Mohaisen, A. West, T. Tonn, and H. K. Kim, “ADAM: automated detection and attribution of malicious webpages,” in International Workshop on Information Security Applications. Springer, 2014, pp. 3–16.
[153] G. Ramesh, K. Selvakumar, and A. Venugopal, “Intelligent explanation generation system for phishing webpages by employing an inference system,” Behaviour & Information Technology, vol. 36, no. 12, pp. 1244–1260, 2017.
[154] F. Thabtah, R. M. Mohammad, and L. McCluskey, “A dynamic self-structuring neural network model to combat phishing,” in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 4221–4226.
[155] E.-S. M. El-Alfy, “Detection of phishing websites based on probabilistic neural networks and K-medoids clustering,” The Computer Journal, vol. 60, no. 12, pp. 1745–1759, 2017.
[156] L. Xu, Z. Zhan, S. Xu, and K. Ye, “An evasion and counter-evasion study in malicious websites detection,” in 2014 IEEE Conference on Communications and Network Security (CNS). IEEE, 2014, pp. 265–273.
[157] G. Stringhini, C. Kruegel, and G. Vigna, “Shady paths: Leveraging surfing crowds to detect malicious web pages,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 2013, pp. 133–144.
[158] A. Mohaisen, “Towards automatic and lightweight detection and classification of malicious web contents,” in 2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). IEEE, 2015, pp. 67–72.
[159] X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2009.
[160] H. Guo, B. Jin, and W. Qian, “Analysis of email header for forensics purpose,” in International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2013.
[161] R. M. Verma and N. Hossain, “Semantic feature selection for text with application to phishing email detection,” in Information Security and Cryptology - ICISC 2013 - 16th International Conference, Seoul, Korea, November 27-29, 2013, Revised Selected Papers, 2013, pp. 455–468.
[162] R. Verma, N. Shashidhar, and N. Hossain, “Detecting phishing emails the natural language way,” in Computer Security – ESORICS 2012, S. Foresti, M. Yung, and F. Martinelli, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 824–841.
[163] G. Stringhini and O. Thonnard, “That Ain’t You: Blocking Spearphishing Through Behavioral Modelling,” in Detection of Intrusions and Malware, and Vulnerability Assessment, M. Almgren, V. Gulisano, and F. Maggi, Eds. Springer International Publishing, 2015, pp. 78–97.
[164] S. Duman, K. Kalkan-Cakmakci, M. Egele, W. Robertson, and E. Kirda, “EmailProfiler: Spearphishing filtering with header and stylometric features of emails,” in 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), vol. 1, June 2016, pp. 408–416.
[165] A. A. Akinyelu and A. O. Adewumi, “Classification of phishing email using random forest machine learning technique,” Journal of Applied Mathematics, 2014.
[166] M. Zareapoor and S. K. R, “Feature extraction or feature selection for text classification: A case study on phishing email detections,” in Engg. and Electronic Business, 2015.
[167] I. A. Hamid and J. H. Abawajy, “Profiling phishing email based on clustering approach,” in Int’l Conf. on Trust, Security and Privacy in Computing and Communication. IEEE, 2013.
[168] D. Almomani, T.-C. Wan, A. Manasrah, A. Taha, M. Baklizi, and S. Ramadass, “An enhanced online phishing e-mail detection framework based on evolving connectionist system,” International Journal of Innovative Computing Information and Control (IJICIC), 2012.
[169] I. R. A. Hamid and J. Abawajy, “Phishing email feature selection approach,” in Int’l Conf. on Trust, Security and Privacy in Computing and Communications. IEEE, 2011.
[170] F. Toolan and J. Carthy, “Feature selection for spam and phishing detection,” in eCrime, 2010, pp. 1–12.
[171] A. Yasin and A. Abuhasan, “An intelligent classification model for phishing email detection,” CoRR, vol. abs/1608.02196, 2016. [Online]. Available: http://arxiv.org/abs/1608.02196
[172] A. Bergholz, J. Chang, G. Paaß, F. Reichartz, and S. Strobel, “Improved phishing detection using model-based features,” in Proc. CEAS, 2008.
[173] Y. Han and Y. Shen, “Accurate spear phishing campaign attribution and early detection,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ser. SAC ’16. ACM, 2016.
[174] D. Kim and J. H. Kim, “Understanding persuasive elements in phishing e-mails: A categorical content and semantic network analysis,” Online Information Review, vol. 37, no. 6, pp. 835–850, 2013.
[175] A. Bergholz, J. De Beer, S. Glahn, M.-F. Moens, G. Paass, and S. Strobel, “New filtering approaches for phishing email,” Journal of Computer Security, vol. 18, pp. 7–35, 2010.
[176] R. M. Verma and N. Rai, “Phish-IDetector: Message-id based automatic phishing detection,” in SECRYPT 2015 - Proceedings of the 12th International Conference on Security and Cryptography, Colmar, Alsace, France, 20-22 July 2015, 2015, pp. 427–434.
[177] I. Fette, N. Sadeh, and A. Tomasic, “Learning to detect phishing emails,” in Proc. 16th WWW. ACM, 2007, pp. 649–656.
[178] M. Khonji, Y. Iraqi, and A. Jones, “Lexical URL analysis for discriminating phishing and legitimate e-mail messages,” in 6th International Conference for Internet Technology and Secured Transactions, Abu Dhabi, UAE, December 11-14, 2011.
[179] N. Figueroa, G. L’Huillier, and R. Weber, “Adversarial classification using signaling games with an application to phishing detection,” Data Mining and Knowledge Discovery, vol. 31, no. 1, pp. 92–133, Jan 2017. [Online]. Available: https://doi.org/10.1007/s10618-016-0459-9
[180] A. Almomani, B. B. Gupta, T.-C. Wan, A. Altaher, E. Almomani, and M. Anbar, “Evolving fuzzy neural network for phishing emails detection,” Journal of Computer Science, vol. 8, pp. 1099–1107, 2012.
[181] F. Toolan and J. Carthy, “Phishing detection using classifier ensembles,” in 2009 eCrime Researchers Summit, 2009, pp. 1–9.
[182] M. U. Chowdhury, J. H. Abawajy, A. V. Kelarev, and T. Hochin, “Multilayer hybrid strategy for phishing email zero-day filtering,” Concurrency and Computation: Practice and Experience, vol. 29, no. 23, p. e3929, 2017.
[183] H. Orman, “Towards a semantics of phish,” in 2012 IEEE Symposium on Security and Privacy Workshops, 2012, pp. 91–96.
[184] C. K. Olivo, A. O. Santin, and L. S. Oliveira, “Obtaining the threat model for e-mail phishing,” Appl. Soft Comput., vol. 13, no. 12, 2013.
[185] M. Chandrasekaran, K. Narayanan, and S. Upadhyaya, “Phishing email detection based on structural properties,” in NYS CyberSecurity Conf., 2006.
[186] C. Huang, S. Hao, L. Invernizzi, J. Liu, Y. Fang, C. Kruegel, and G. Vigna, “Gossip: Automatically identifying malicious domains from mailing list discussions,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 494–505.
[187] R. Dazeley, J. L. Yearwood, B. H. Kang, and A. V. Kelarev, “Consensus clustering and supervised classification for profiling phishing emails in internet commerce security,” IEEE Communications Surveys & Tutorials, pp. 235–246, 2010.
[188] H.-S. Chiang, D.-H. Shih, M.-H. Shih, and J. Morris Chang, “APN model construction for malicious email detection,” Expert Syst. Appl., vol. 42, no. 13, Aug. 2015.
[189] R. Islam and J. Abawajy, “A multi-tier phishing detection and filtering approach,” J. Netw. Comput. Appl., vol. 36, no. 1, 2013.

[190] V. Ramanathan and H. Wechsler, “Phishing detection and impersonated entity discovery using conditional random field and latent Dirichlet allocation,” Computers & Security, vol. 34, 2013.
[191] W. N. Gansterer and D. Pölz, “E-mail classification for phishing defense,” in Proceedings of the 31st European Conference on IR Research, ECIR 2009, Toulouse, France, April 6-9, 2009, pp. 449–460.
[192] S. Seifollahi, A. Bagirov, R. Layton, and I. Gondal, “Optimization based clustering algorithms for authorship analysis of phishing emails,” Neural Processing Letters, vol. 46, no. 2, pp. 411–425, Oct 2017.
[193] J. Yearwood, M. Mammadov, and A. Banerjee, “Profiling phishing emails based on hyperlink information,” in 2010 International Conference on Advances in Social Networks Analysis and Mining, 2010.
[194] J. Yearwood, D. Webb, L. Ma, P. Vamplew, B. Ofoghi, and A. V. Kelarev, “Applying clustering and ensemble clustering approaches to phishing profiling,” in 8th Australasian Data Mining Conf., 2009, pp. 25–34.
[195] G. Ho, A. Sharma, M. Javed, V. Paxson, and D. A. Wagner, “Detecting credential spearphishing attacks in enterprise settings,” in USENIX Security Symposium, 2017.
[196] A. Laszka, J. Lou, and Y. Vorobeychik, “Multi-defender strategic filtering against spear-phishing attacks,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[197] A. Laszka, Y. Vorobeychik, and X. Koutsoukos, “Optimal personalized filtering against spear-phishing attacks,” in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. [Online]. Available: https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9917
[198] M. Zhao, B. An, and C. Kiekintveld, “Optimizing personalized email filtering thresholds to mitigate sequential spear phishing attacks,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, ser. AAAI’16. AAAI Press, 2016, pp. 658–664. [Online]. Available: http://dl.acm.org/citation.cfm?id=3015812.3015910
[199] M. Husak and J. Cegan, “PhiGARo: Automatic phishing detection and incident response framework,” in 2014 Ninth International Conference on Availability, Reliability and Security, 2014.
[200] C. Amrutkar, Y. S. Kim, and P. Traynor, “Detecting mobile malicious webpages in real time,” IEEE Transactions on Mobile Computing, vol. 16, no. 8, pp. 2184–2197, 2016.
[201] S. Goel, K. Williams, and E. Dincelli, “Got phished? Internet security and human vulnerability,” Journal of the Association for Information Systems, vol. 18, no. 1, p. 22, 2017.
[202] T. Halevi, J. Lewis, and N. Memon, “A pilot study of cyber security and privacy related behavior and personality traits,” in Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013, pp. 737–744.
[203] P. Kumaraguru, S. Sheng, A. Acquisti, L. F. Cranor, and J. Hong, “Teaching Johnny not to fall for phish,” ACM Transactions on Internet Technology (TOIT), vol. 10, no. 2, p. 7, 2010.
[204] E. Lin, S. Greenberg, E. Trotter, D. Ma, and J. Aycock, “Does domain highlighting help people identify phishing sites?” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011, pp. 2075–2084.
[205] R. R. McCrae and P. T. Costa Jr, “A five-factor theory of personality,” Handbook of Personality: Theory and Research, vol. 2, no. 1999, pp. 139–153, 1999.
[206] E. S. Barratt, J. Monahan, and H. Steadman, “Impulsiveness and aggression,” Violence and Mental Disorder: Developments in Risk Assessment, vol. 10, pp. 61–79, 1994.
[207] S. Chaiken, “Heuristic versus systematic information processing and the use of source versus message cues in persuasion,” Journal of Personality and Social Psychology, vol. 39, no. 5, p. 752, 1980.
[208] M. B. Brewer and W. D. Crano, “Research design and issues of validity,” Handbook of Research Methods in Social and Personality Psychology, pp. 3–16, 2000.
[209] R. L. Winkler and A. H. Murphy, “Experiments in the laboratory and the real world,” Organizational Behavior and Human Performance, vol. 10, no. 2, pp. 252–270, 1973.
[210] M. Jakobsson and J. Ratkiewicz, “Designing ethical phishing experiments: a study of (ROT13) rOnl query features,” in Proceedings of the 15th International Conference on World Wide Web. ACM, 2006, pp. 513–522.
[211] A. Neupane, M. L. Rahman, N. Saxena, and L. Hirshfield, “A multi-modal neuro-physiological study of phishing detection and malware warnings,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 479–491.
[212] N. A. G. Arachchilage, S. Love, and K. Beznosov, “Phishing threat avoidance behaviour: An empirical investigation,” Computers in Human Behavior, vol. 60, pp. 185–197, 2016.
[213] A. Xiong, R. W. Proctor, W. Yang, and N. Li, “Is domain highlighting actually helpful in identifying phishing web pages?” Human Factors, vol. 59, no. 4, pp. 640–660, 2017, PMID: 28060529. [Online]. Available: https://doi.org/10.1177/0018720816684064
[214] A. Neupane, N. Saxena, J. O. Maximo, and R. Kana, “Neural markers of cybersecurity: An fMRI study of phishing and malware warnings,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 9, pp. 1970–1983, 2016.
[215] M. Baslyman and S. Chiasson, ““Smells Phishy?”: An educational game about online phishing scams,” in Electronic Crime Research (eCrime), 2016 APWG Symposium on. IEEE, 2016, pp. 1–11.
[216] D. Miyamoto, G. Blanc, and Y. Kadobayashi, “Eye can tell: On the correlation between eye movement and phishing identification,” in International Conference on Neural Information Processing. Springer, 2015, pp. 223–232.
[217] M. M. Moreno-Fernández, F. Blanco, P. Garaizar, and H. Matute, “Fishing for phishers. Improving internet users’ sensitivity to visual deception cues to prevent electronic fraud,” Computers in Human Behavior, vol. 69, pp. 421–436, 2017.
[218] I. Kirlappos and M. A. Sasse, “Security education against phishing: A modest proposal for a major rethink,” IEEE Security & Privacy, vol. 10, no. 2, pp. 24–32, 2012.
[219] M. Pattinson, C. Jerram, K. Parsons, A. McCormac, and M. Butavicius, “Why do some people manage phishing e-mails better than others?” Information Management & Computer Security, vol. 20, no. 1, pp. 18–28, 2012.
[220] K. Parsons, M. Butavicius, M. Pattinson, D. Calic, A. McCormac, and C. Jerram, “Do users focus on the correct cues to differentiate between phishing and genuine emails?” arXiv preprint arXiv:1605.04717, 2016.
[221] K. Parsons, A. McCormac, M. Pattinson, M. Butavicius, and C. Jerram, “Phishing for the truth: A scenario-based experiment of users’ behavioural response to emails,” in IFIP International Information Security Conference. Springer, 2013, pp. 366–378.
[222] M. Jensen, A. Durcikova, and R. Wright, “Combating phishing attacks: A knowledge management approach,” in Proceedings of the 50th Hawaii International Conference on System Sciences, 2017.
[223] B. Harrison, A. Vishwanath, Y. J. Ng, and R. Rao, “Examining the impact of presence on individual phishing victimization,” in 48th Hawaii International Conference on System Sciences (HICSS). IEEE, 2015, pp. 3483–3489.
[224] J. C.-Y. Sun, C.-Y. Kuo, H.-T. Hou, and L. Yu-Yan, “Exploring learners’ sequential behavioral patterns, flow experience, and learning performance in an anti-phishing educational game,” Journal of Educational Technology & Society, vol. 20, no. 1, p. 45, 2017.
[225] A. Vishwanath, “Examining the distinct antecedents of e-mail habits and its influence on the outcomes of a phishing attack,” Journal of Computer-Mediated Communication, vol. 20, no. 5, pp. 570–584, 2015.
[226] J. C.-Y. Sun and L. Kuan-Hsien, “Which teaching strategy is better for enhancing anti-phishing learning motivation and achievement? The concept maps on tablet PCs or worksheets?” Journal of Educational Technology & Society, vol. 19, no. 4, p. 87, 2016.
[227] J. C.-Y. Sun and K. P.-C. Yeh, “The effects of attention monitoring with EEG biofeedback on university students’ attention and self-efficacy: The case of anti-phishing instructional materials,” Computers & Education, vol. 106, pp. 73–82, 2017.
[228] J. Wang, T. Herath, R. Chen, A. Vishwanath, and H. R. Rao, “Research article phishing susceptibility: An investigation into the processing of a targeted spear phishing email,” IEEE Transactions on Professional Communication, vol. 55, no. 4, pp. 345–362, 2012.
[229] I. M. Alseadoon, M. F. I. Othman, E. Foo, and T. Chan, “Typology of phishing email victims based on their behavioural response,” in Proceedings of the 19th Americas Conference on Information Systems (AMCIS 2013), vol. 5. AIS Electronic Library (AISeL), 2013, pp. 3716–3724.
[230] M. Blythe, H. Petrie, and J. A. Clark, “F for fake: four studies on how we fall for phish,” in Proceedings of the International Conference on Human Factors in Computing Systems, CHI 2011, Vancouver, BC, Canada, May 7-12, 2011. ACM, 2011, pp. 3469–3478.
[231] K. Jansson and R. von Solms, “Phishing for phishing awareness,” Behaviour & Information Technology, vol. 32, no. 6, pp. 584–593, 2013.
[232] E. Lastdrager, I. C. Gallardo, P. Hartel, and M. Junger, “How effective is anti-phishing training for children?” in Symposium on Usable Privacy and Security (SOUPS), 2017.
[233] G. Moody, D. Galletta, J. Walker, and B. Dunn, “Which phish get caught? An exploratory study of individual susceptibility to phishing,” in Proceedings of the International Conference on Information Systems, ICIS 2011, Shanghai, China, December 4-7, 2011.

[234] J. C.-Y. Sun, S.-J. Yu, S. S. Lin, and S.-S. Tseng, “The mediating effect of anti-phishing self-efficacy between college students’ internet self-efficacy and anti-phishing behavior and gender difference,” Computers in Human Behavior, vol. 59, pp. 249–257, 2016.
[235] P. Balozian, D. Leidner, and M. Warkentin, “Managers’ and employees’ differing responses to security approaches,” Journal of Computer Information Systems, pp. 1–14, 2017.
[236] M. Wu, R. C. Miller, and G. Little, “Web wallet: preventing phishing attacks by revealing user intentions,” in Proceedings of the Second Symposium on Usable Privacy and Security. ACM, 2006, pp. 102–113.
[237] D. Conway, R. Taib, M. Harris, K. Yu, S. Berkovsky, and F. Chen, “A qualitative investigation of bank employee experiences of information security and phishing,” in Thirteenth Symposium on Usable Privacy and Security, SOUPS 2017, Santa Clara, CA, USA, July 12-14, 2017, pp. 115–129.
[238] T. Martin, “Phishing for answers: Factors influencing a participant’s ability to categorize email,” Comput. Changing World, Portland, OR, 2009.
[239] R. Zhao, S. John, S. Karas, C. Bussell, J. Roberts, D. Six, B. Gavett, and C. Yue, “Design and evaluation of the highly insidious extreme phishing attacks,” Computers & Security, vol. 70, pp. 634–647, 2017.
[240] N. A. G. Arachchilage and S. Love, “Security awareness of computer users: A phishing threat avoidance perspective,” Computers in Human Behavior, vol. 38, pp. 304–312, 2014.
[241] N. A. G. Arachchilage and S. Love, “A game design framework for avoiding phishing attacks,” Computers in Human Behavior, vol. 29, no. 3, pp. 706–714, 2013.
[242] J. Wang, Y. Li, and H. R. Rao, “Overconfidence in phishing email detection,” Journal of the Association for Information Systems, vol. 17, no. 11, p. 759, 2016.
[243] M. Junger, L. Montoya, and F.-J. Overink, “Priming and warnings are not effective to prevent social engineering attacks,” Computers in Human Behavior, vol. 66, pp. 75–87, 2017.
[244] Z. Alqarni, A. Algarni, and Y. Xu, “Toward predicting susceptibility to phishing victimization on Facebook,” in Services Computing (SCC), 2016 IEEE International Conference on. IEEE, 2016, pp. 419–426.
[245] W. Yang, J. Chen, A. Xiong, R. W. Proctor, and N. Li, “Effectiveness of a phishing warning in field settings,” in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. ACM, 2015, p. 14.
[246] B. Harrison, E. Svetieva, and A. Vishwanath, “Individual processing of phishing emails: How attention and elaboration protect against phishing,” Online Information Review, vol. 40, no. 2, pp. 265–281, 2016.
[247] B. Harrison, A. Vishwanath, and R. Rao, “A user-centered approach to phishing susceptibility: The role of a suspicious personality in protecting against phishing,” in 49th Hawaii International Conference on System Sciences, HICSS 2016, Koloa, HI, USA, January 5-8, 2016, pp. 5628–5634.
[248] R. C. Dodge and A. J. Ferguson, “Using phishing for user email security awareness,” in Security and Privacy in Dynamic Environments, Proceedings of the 21st International Information Security Conference (SEC 2006), 22-24 May 2006, Karlstad, Sweden. Springer, 2006, pp. 454–459.
[249] A. Vishwanath, T. Herath, R. Chen, J. Wang, and H. R. Rao, “Why do people get phished? Testing individual differences in phishing vulnerability within an integrated, information processing model,” Decision Support Systems, vol. 51, no. 3, pp. 576–586, 2011.
[250] E. K. Perrault, “Using an interactive online quiz to recalibrate college students’ attitudes and behavioral intentions about phishing,” Journal of Educational Computing Research, vol. 55, no. 8, pp. 1154–1167, 2018.
[251] Z. Benenson, F. Gassmann, and R. Landwirth, “Unpacking spear phishing susceptibility,” Targeted Attacks (TA’17), 2017.
[252] T. Halevi, N. Memon, and O. Nov, “Spear-phishing in the wild: A real-world study of personality, phishing self-efficacy and vulnerability to spear-phishing attacks,” Phishing Self-Efficacy and Vulnerability to Spear-Phishing Attacks (January 2, 2015), 2015.
[253] M. Silic and A. Back, “The dark side of social networking sites: Understanding phishing risks,” Computers in Human Behavior, vol. 60, pp. 35–43, 2016.
[254] X. R. Luo, W. Zhang, S. Burd, and A. Seazzu, “Investigating phishing victimization with the heuristic-systematic model: A theoretical framework and an exploration,” Comput. Secur., vol. 38, Oct. 2013.
[255] H. Holm, W. R. Flores, M. Nohlberg, and M. Ekstedt, “An empirical investigation of the effect of target-related information in phishing attacks,” in Enterprise Distributed Object Computing Conference Workshops and Demonstrations (EDOCW), 2014 IEEE 18th International. IEEE, 2014, pp. 357–363.
[256] K. Coronges, R. Dodge, C. Mukina, Z. Radwick, J. Shevchik, and E. Rovira, “The influences of social networks on phishing vulnerability,” in System Science (HICSS), 2012 45th Hawaii International Conference on. IEEE, 2012, pp. 2366–2373.
[257] W. D. Kearney and H. A. Kruger, “Phishing and organisational learning,” in IFIP International Information Security Conference. Springer, 2013, pp. 379–390.
[258] G. Park, L. M. Stuart, J. M. Taylor, and V. Raskin, “Comparing machine and human ability to detect phishing emails,” in 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2014, pp. 2322–2327.
[259] D. Oliveira, H. Rocha, H. Yang, D. Ellis, S. Dommaraju, M. Muradoglu, D. Weir, A. Soliman, T. Lin, and N. Ebner, “Dissecting spear phishing emails for older vs young adults: On the interplay of weapons of influence and life domains in predicting susceptibility to phishing,” in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2017, pp. 6412–6424.
[260] C. I. Canfield, B. Fischhoff, and A. Davis, “Quantifying phishing susceptibility for detection and behavior decisions,” Human Factors, vol. 58, no. 8, pp. 1158–1172, 2016.
[261] T. Vidas, E. Owusu, S. Wang, C. Zeng, L. F. Cranor, and N. Christin, “QRishing: The susceptibility of smartphone users to QR code phishing attacks,” in International Conference on Financial Cryptography and Data Security. Springer, 2013, pp. 52–69.
[262] S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, and J. Downs, “Who falls for phish?: A demographic analysis of phishing susceptibility and effectiveness of interventions,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2010, pp. 373–382.
[263] J. Nicholson, L. Coventry, and P. Briggs, “Can we fight social engineering attacks by social means? Assessing social salience as a means to improve phish detection,” in Thirteenth Symposium on Usable Privacy and Security, 2017, pp. 285–298.
[264] G. Liu, G. Xiang, B. A. Pendleton, J. I. Hong, and W. Liu, “Smartening the crowds: computational techniques for improving human verification to fight phishing scams,” in Proceedings of the Seventh Symposium on Usable Privacy and Security. ACM, 2011, p. 8.
[265] R. T. Wright and K. Marett, “The influence of experiential and dispositional factors in phishing: An empirical investigation of the deceived,” Journal of Management Information Systems, vol. 27, no. 1, pp. 273–303, 2010.
[266] P. Rajivan and C. Gonzalez, “Creative persuasion: A study on adversarial behaviors and strategies in phishing attacks,” Frontiers in Psychology, vol. 9, p. 135, 2018.
[267] W. T. Coombs, J. Algina, and D. O. Oltman, “Univariate and multivariate omnibus hypothesis tests selected to control Type I error rates when population variances are not necessarily equal,” Review of Educational Research, vol. 66, no. 2, pp. 137–179, 1996.
[268] R. A. Armstrong, “When to use the Bonferroni correction,” Ophthalmic and Physiological Optics, vol. 34, no. 5, pp. 502–508, 2014.
[269] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 289–300, 1995.
[270] A. Almomani, B. B. Gupta, S. Atawneh, A. Meulenberg, and E. Almomani, “A survey of phishing email filtering techniques,” IEEE Comm. Surveys and Tutorials, vol. 15, no. 4, pp. 2070–2090, 2013.
[271] V. V. Satane and A. Dasgupta, “Survey paper on phishing detection: Identification of malicious URL using Bayesian classification on social network sites,” International Journal of Science and Research, 2013.
[272] S. Sharma and S. Parveen, “Survey on malicious URL hitches, propagation mechanisms and analysis of classification algorithms,” International Journal of Engineering Trends and Technology (IJETT), vol. 22, pp. 183–187, 2015.
[273] A. Aleroud and L. Zhou, “Phishing environments, techniques, and countermeasures: A survey,” Computers & Security, vol. 68, pp. 160–196, 2017.
[274] R. M. Mohammad, F. Thabtah, and L. McCluskey, “Tutorial and critical analysis of phishing websites methods,” Computer Sci. Review, vol. 17, no. Supplement C, pp. 1–24, 2015.
[275] G. Varshney, M. Misra, and P. K. Atrey, “A survey and classification of web phishing detection schemes,” Security and Comm. Networks, vol. 9, no. 18, pp. 6266–6284, 2016.

[276] I. Qabajeh, F. Thabtah, and F. Chiclana, “A recent review of conventional vs. automated cybersecurity anti-phishing techniques,” Computer Science Review, vol. 29, pp. 44–55, 2018.
[277] S. Purkait, “Phishing counter measures and their effectiveness – literature review,” Information Management & Computer Security, vol. 20, no. 5, pp. 382–420, 2012.
[278] M. Khonji, Y. Iraqi, and A. Jones, “Phishing detection: a literature survey,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2091–2121, 2013.
[279] E. Blanzieri and A. Bryl, “A survey of learning-based techniques of email spam filtering,” Artificial Intelligence Review, vol. 29, no. 1, pp. 63–92, 2008.
[280] X. Qi and B. D. Davison, “Web page classification: Features and algorithms,” ACM Computing Surveys (CSUR), vol. 41, no. 2, p. 12, 2009.
[281] Proceedings of the 1st Anti-phishing Shared Pilot at 4th ACM IWSPA (IWSPA-AP). CEUR, 2018, http://ceur-ws.org/Vol-2124/.
[282] S. Pasupatheeswaran, “Email ’Message-IDs’ helpful for forensic analysis?” in Australian Digital Forensics Conference. School of Computer and Information Science, Edith Cowan University, Perth, 2008.

APPENDIX A

In the Appendix, we give tables with bibliographic numbers of the papers, so that interested readers can easily find the original sources for each entry in our tables in the main part of the paper.

TABLE A3: Distribution of metrics used for evaluating phishing email detection methods

Eval. Metrics | N | Literature
Accuracy | 14 | [163]–[166], [169], [172], [179], [180], [183], [185], [187], [189], [191], [193]
Precision, Recall, F-score | 12 | [168], [171], [172], [175], [179], [181], [182], [185], [188]–[190], [198]
Confusion Matrix(a) | 14 | [161], [163], [168], [169], [171], [175], [179], [185], [186], [188], [190], [196]–[198]
Area Under Curve | 7 | [165], [166], [171], [184], [188]–[190]
Error Rate | 2 | [167], [179]
Other(b) | 4 | [180], [188], [192], [193]

(a) Consists of the raw True Positive, False Positive, True Negative, False Negative values as well as rates like TPR, FPR, TNR, FNR, specificity and sensitivity calculated from the confusion matrix.
(b) Miscellaneous metrics used by different authors: One-Error, Coverage and Average Precision [193], G-Mean [188], Rand Index (RI) [192], Root Mean Square Error (RMSE) [180].

TABLE A1: Distribution of metrics used for evaluating phishing URL detection methods

Eval. Metrics | N | Literature
Accuracy | 28 | [21], [22], [29], [30], [33]–[36], [42], [44], [48], [50]–[53], [55], [56], [63], [64], [66], [69], [77], [79]–[81], [86], [88], [89]
Precision/Recall/F-score | 20 | [22], [29], [30], [33]–[35], [47], [50]–[52], [58]–[60], [62]–[64], [69], [70], [86], [89]
Error Rate | 13 | [21], [28], [34], [38]–[40], [43], [49], [65], [67], [73], [85], [89]
Confusion Matrix^a | 20 | [29], [35], [43]–[46], [48], [50], [51], [57], [58], [66], [69], [76], [78]–[80], [82], [88], [89]
Area Under Curve | 4 | [33], [35], [50], [51]
Others^b | 6 | [42], [54], [57], [78], [80], [81]

^a Consists of the raw True Positive, False Positive, True Negative and False Negative values as well as rates like TPR, FPR, TNR, FNR, specificity and sensitivity calculated from the confusion matrix.
^b Balanced Success Rate [38], Root Mean Square Error [42], effective rules/patterns [54], Cost/Sum of classification [80], False Alarm rate [78].

TABLE A2: Distribution of metrics used for evaluating phishing website detection methods

Eval. Metrics | N | Literature
Accuracy | 26 | [95], [98], [100]–[102], [111], [112], [115], [117], [119], [120], [122]–[124], [126], [128], [129], [131], [137], [141], [147], [148], [150], [154]–[156]
Precision/Recall/F-score | 27 | [94], [96], [98], [101], [102], [104]–[107], [112], [113], [118], [120], [123], [133], [136], [141], [143], [144], [148]–[150], [152], [154], [155], [157], [158]
Error Rate | 21 | [93], [95]–[97], [106], [120], [121], [123], [125], [128], [129], [136]–[138], [140], [146], [147], [152], [155], [157], [158]
Confusion Matrix | 35 | [94], [96], [98]–[100], [102]–[109], [111], [113]–[115], [117], [119]–[123], [129]–[132], [134], [142], [146], [153]–[158]
Area Under Curve | 8 | [98], [101], [102], [104], [106], [107], [113], [135]
Others^a | 7 | [94], [110], [118], [123], [136], [151], [155]

^a Matthews Correlation Coefficient (MCC) [136], [155], Precision-recall curve [94], Kappa statistic [118], [123], MCscore [110] and Text Similarity [151].

TABLE A4: Methods used in phishing URL detection: Types, Algorithms and Literature

Types | Algorithms | N | Literature
Supervised | Decision Tree | 15 | [21], [30], [34], [35], [41], [44], [49]–[52], [59], [62], [64], [77], [79]
Supervised | Bayesian Classifier | 1 | [34]
Supervised | Logistic Regression | 11 | [21], [28], [35], [43], [44], [47], [49], [63], [67], [79], [81]
Supervised | Naïve Bayes | 11 | [21], [30], [35], [41], [43], [44], [49], [52], [62], [63], [77]
Supervised | Support Vector Machines | 12 | [21], [28], [29], [40], [41], [43], [49]–[51], [63], [66], [77]
Supervised | Random Forest | 14 | [21], [30], [41], [44], [49]–[52], [58], [59], [63], [64], [77], [79]
Supervised | BayesNet | 4 | [44], [62], [64], [79]
Supervised | RandomTree | 4 | [50]–[52], [79]
Supervised | AdaBoost | 2 | [21], [77]
Supervised | Stacking | 1 | [21]
Supervised | RNN-LSTM | 1 | [33]
Supervised | LMT | 3 | [50]–[52]
Supervised | k-Nearest Neighbors | 3 | [35], [52], [59]
Unsupervised | k-Means Clustering | 2 | [53], [68]
Online Learning | Perceptron | 5 | [22], [28], [38], [40], [80]
Online Learning | Multilayer Perceptron | 3 | [49], [52], [62]
Online Learning | Confidence Weighted | 8 | [22], [28], [36], [40], [55], [65], [80], [85]
Online Learning | AROW^a | 2 | [22], [40]
Online Learning | Passive Aggressive | 4 | [22], [28], [80], [85]
Online Learning | Cost sensitive^b | 1 | [80]
Not mentioned | - | 1 | [53]
Rule/Pattern based | PART | 3 | [21], [50], [51]
Rule/Pattern based | JRip | 2 | [50], [51]
Rule/Pattern based | Association Rule Mining | 1 | [54]
Rule/Pattern based | Greedy Pattern Selection | 1 | [55]
Rule/Pattern based | TFD Pattern Matching | 1 | [75]
Others | MD5 Hash Algorithms, HMM, BFTree | 3 | [44], [58], [70]

^a Adaptive Regularization of Weights.
^b Cost-sensitive learners (CS-OAL, CS-Passive Aggressive, Label Efficient Perceptron, etc.) mentioned in [80].
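All of the confusion-matrix-derived metrics tabulated above (accuracy, precision, recall, F-score, error rate, TPR/FPR, specificity) come from the same four raw counts described in the table footnotes. As a quick reference, here is a minimal sketch; the function name and the example counts are ours, purely for illustration:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive the common evaluation metrics from raw confusion-matrix counts."""
    precision = tp / (tp + fp)            # fraction of flagged items that are truly phishing
    recall = tp / (tp + fn)               # true positive rate (TPR), a.k.a. sensitivity
    fpr = fp / (fp + tn)                  # false positive rate
    specificity = tn / (tn + fp)          # true negative rate (TNR)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "specificity": specificity, "accuracy": accuracy,
            "f_score": f_score, "error_rate": 1 - accuracy}

# Hypothetical detector output: 90 phishing URLs caught, 10 missed,
# 10 false alarms among 900 legitimate URLs.
m = confusion_metrics(tp=90, fp=10, tn=890, fn=10)
print(m["precision"], m["recall"], m["accuracy"])  # 0.9 0.9 0.98
```

Note how accuracy (0.98) looks much better than precision or recall (0.9) on this imbalanced sample, which is one reason the surveyed papers report several of these metrics side by side.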

TABLE A5: Methods used in reviewed phishing website detection literature

Method | N | Literature
SVM | 20 | [96], [100], [106], [112]–[114], [120]–[123], [132], [135], [141], [144], [150], [152], [155]–[158]
Naïve Bayes | 12 | [93], [101], [106], [112], [118], [136], [141], [148]–[150], [154], [156]
Logistic Regression | 14 | [93], [96], [101], [106], [112], [118], [120], [127], [129], [141], [150], [154]–[156]
Random Tree | 2 | [93], [151]
Random Forest | 7 | [93], [101], [102], [106], [120], [148], [150]
Decision Tree | 14 | [93], [101], [105], [106], [118], [120], [124], [125], [140], [141], [148], [154]–[156]
CART | 1 | [155]
Bayesian Network | 4 | [93], [101], [120], [136]
Multilayer Perceptron | 1 | [120]
BFTree | 1 | [101]
Gradient Boosting | 1 | [98]
Tree Kernel | 1 | [141]
Neural Network | 6 | [106], [119], [138], [144], [154], [155]
Bayesian Additive Regression Trees (RT) | 1 | [106]
AdaBoost | 3 | [106], [120], [155]
Bagging | 1 | [155]
k-Nearest Neighbor | 4 | [100], [118], [122], [155]
Hierarchical Clustering | 1 | [149]
DBSCAN Clustering | 1 | [128]
Association rule | 3 | [124], [125], [140]

TABLE A6: Methods used in reviewed phishing email detection literature. EM - Expectation Maximization

Method | N | Literature
Bagging | 3 | [166], [182], [191]
Boosting | 6 | [167], [169], [178], [189]–[191]
Bayesian Network | 2 | [169], [182]
Decision Table | 2 | [167], [169]
Decision Tree | 8 | [166], [170], [171], [181], [182], [188], [189], [191]
k-means | 4 | [168], [187], [192], [194]
k-Nearest Neighbor (kNN) | 5 | [181], [182], [187], [191], [195]
Logistic Regression | 1 | [181]
Neural Network | 2 | [180], [195]
Naïve Bayes | 6 | [167], [168], [171], [188], [189], [191]
Random Forest | 9 | [165], [167]–[169], [171], [178], [182], [189], [191]
SVM | 13 | [163], [164], [168], [171], [172], [175], [179], [181], [184], [185], [188], [189], [191]
EM Clustering | 1 | [183]
Multi-layer Perceptron | 2 | [168], [171]
Other^a | 7 | [167], [180], [182], [187], [188], [192], [195]

^a OneR [167], Multiple Linear Regression (MLR) [180], SMO [182], Associative Petri network [188], multinomial Naive Bayes (MNB), DCClust, MS-MGKM, k-committees [187], INCA [192], Kernel Density Estimation (KDE), Gaussian Mixture Model (GMM) [195].
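Naïve Bayes, one of the most frequently used classifiers in the table above, is simple enough to sketch from scratch on bag-of-words email features. The toy token bags and labels below are invented for illustration; the surveyed papers train on real corpora with far richer feature sets:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns class priors, per-class
    token counts, and the vocabulary, for a multinomial Naive Bayes model."""
    priors, counts, vocab = Counter(), {}, set()
    for tokens, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def classify_nb(tokens, priors, counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(token | label),
    with Laplace (add-one) smoothing for unseen tokens."""
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented training set: token bags from "phishing" and "legitimate" emails.
docs = [
    (["verify", "account", "urgent", "click"], "phish"),
    (["password", "suspended", "click", "link"], "phish"),
    (["meeting", "agenda", "monday"], "legit"),
    (["lunch", "friday", "team"], "legit"),
]
priors, counts, vocab = train_nb(docs)
print(classify_nb(["urgent", "click", "password"], priors, counts, vocab))
```

The same training loop could feed most of the other table entries (kNN, logistic regression, SVM) once the emails are reduced to feature vectors; only the decision rule changes.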

TABLE A7: Range of dataset size used by each work for evaluation, separately for legitimate, phishing, spam and malware URLs

Legitimate 100s 1,000s 10,000s 100,000s ≥1M Feed N/A [46], [76] 100s [54] [54], [79] [40], [52], [62] Phishing 1,000s [43] [45] [30], [81] [41], [43], [47] [41], [49], [59] [21], [49]–[51], [70] [35], [41] 10,000s [42], [64], [88] [56], [63], [66] [66], [80] [70], [77] [67], [77], [80] [59], [65], [77] 100,000s [39], [73], [89] [28] [22], [28], [69] ≥1M [39], [85], [87] [75] [29], [33], [70] Feed [65], [72] [34], [36], [37] N/A [44], [58], [74] [53] 1,000s [45] Spam 10,000s [43] [59] [22] Feed [65], [72] [35], [39] 1,000s [40], [82] Malware [45], [85] 10,000s [59]

TABLE A8: Range of dataset size used by each work for evaluation, separately for legitimate, phishing and malicious websites

Legitimate 100s 1,000s 10,000s 100,000s ≥1 M N/A [109], [127], [131] [119], [122], [130], [144] 100s [103], [108], [114] [107] [111], [124], [141], [143] Phishing [113], [125], [132] [115], [150] 1,000s [99], [128], [154], [157] [104], [136] [98], [148] [123] [133], [140] [106], [117], [120], [121], [155] [105], [110], [149] [137], [158] 10,000s [118] [121] [100]–[102], [147] [153] 100,000s [94] [145] [88], [116], [125], [126] [129], [134], [135] N/A [151] [97], [138], [139] [96], [142], [146] 100s [95] Malicious 1,000s [122] [158] 10,000s [157] [152] [93], [156]

TABLE A9: Range of dataset size used by each work for evaluation, separately for legitimate and phishing emails

Legitimate 100s 1,000s 10,000s 100,000s ≥1 M N/A [165], [168], [177], [188] 100s [182]–[185] [180] [162], [168]–[171], [176] [162], [187], [192]–[194] Phishing 1,000s [161], [167], [172], [175] [166], [178], [181] [174], [199] 10,000s [190] [179], [191] 100,000s N/A [187], [192] [186] [163] [189] Spam 100s [188] 1,000s [170] 10,000s [163], [175], [191] Malicious 100s [188] 100s [163] Spear 1000s [173] N/A [197] [164] [195] [196]
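The class ratios in Tables A7–A9 matter because of the base-rate fallacy discussed in the main text: a detector with seemingly good TPR and FPR on a balanced test set can still produce mostly false alarms when phishing is rare in deployment. A small illustration via Bayes' rule (the rates below are made up):

```python
def precision_at_base_rate(tpr, fpr, base_rate):
    """Bayes' rule: P(phish | flagged), given detector TPR/FPR and
    the prevalence (base rate) of phishing in the deployed stream."""
    flagged_phish = tpr * base_rate
    flagged_legit = fpr * (1 - base_rate)
    return flagged_phish / (flagged_phish + flagged_legit)

# A 99% TPR / 1% FPR detector on a balanced (50/50) test set
# versus a 1-in-10,000 phishing prevalence in deployment:
print(precision_at_base_rate(0.99, 0.01, 0.5))     # ~0.99
print(precision_at_base_rate(0.99, 0.01, 0.0001))  # ~0.0098, i.e. <1% of alerts are real
```

This is why evaluations on the roughly balanced datasets common in the tables above can overstate operational precision by orders of magnitude.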

TABLE A10: Description of Features Extracted from URLs, Websites (marked with uW) and Emails (marked with uE). N - Number of papers using each feature

Feature | N | Used in | Literature

Lexical:
URL length | 43 | U,W,E | [21], [28], [33], [35], [36], [38], [40], [43], [44], [49], [53], [54], [58], [59], [65], [72], [85], [89]; uW: [98], [100], [102], [105], [112], [119]–[127], [129], [138], [140], [148], [152], [154]–[156], [158]; uE: [162]
Length of URL params^a | 15 | U,W | [28], [33], [35], [36], [40], [43], [53], [65], [67], [77], [80], [82]; uW: [100], [121], [149]
Token/Word Frequency in URL | 22 | U,W,E | [28], [34]–[36], [38], [43], [50], [51], [53], [54], [57], [58], [65], [66], [80], [82]; uW: [98], [100], [108], [129], [152]; uE: [165]
Black-List Word frequency in URL | 14 | U,W,E | [33], [35], [43], [89]; uW: [107], [109]–[111]; uE: [165], [167], [168], [170], [172], [175], [178]
Digit/Letter Frequencies/Ratio | 7 | U | [21], [35], [36], [59], [76], [80], [85]
Number of dots (.) | 30 | U,W,E | [28], [34], [38], [40], [43], [44], [49], [52], [54], [59]; uW: [107], [113], [114], [120], [121], [123], [125], [138], [144], [148], [150]; uE: [165], [167], [170]–[172], [175], [178], [181], [182], [184]
Character^b frequency in URL path | 25 | U,W,E | [35], [38], [43], [49], [54], [58], [59], [85]; uW: [93], [98], [100], [102], [106], [114], [122]–[124], [127], [150], [152], [154], [158]; uE: [182], [189]
Kolmogorov Complexity | 2 | U,W | [21]; uW: [121]
Character N-grams | 2 | U | [22], [75]
URL Entropy | 1 | U | [33]
Edit distance, KL-Divergence, KS-test | 3 | U,W | [21], [33]; uW: [105]
Frequency of Special Characters | 32 | U,W,E | [33], [38], [44], [49], [52], [54], [59], [81]; uW: [107], [112]–[114], [119]–[121], [124]–[127], [132], [140], [150], [152], [155], [156], [158]; uE: [167], [169], [170], [177], [178], [191]

Obfuscation:
Obfuscation in IP Address | 8 | U,W,E | [36], [43], [69], [73], [76], [80]; uW: [126]; uE: [178]
Encodings in URL | 9 | U,W,E | [38], [44], [54], [59], [81]; uW: [125], [127], [150]; uE: [182]
Shortening in URL | 7 | U,W | [30], [34], [63], [87], [89]; uW: [126], [155]
Spoofing in URL path | 3 | U,W | [59], [81]; uW: [127]
Source/destination URLs mismatch | 5 | U,W,E | [29], [30], [87]; uW: [129]; uE: [187]
IP address instead of domain name | 55 | U,W,E | [21], [33]–[36], [38], [40], [43], [49], [52], [54], [59], [65], [76], [80], [85]; uW: [93], [94], [96], [106], [107], [112], [114], [119], [120], [122]–[127], [132], [138]–[140], [144], [150], [152], [154], [155], [157], [158]; uE: [166]–[172], [177], [181]–[185]
Hostname obfuscation^f | 15 | U,W,E | [28], [34]–[36], [45], [46], [49], [59], [76], [80], [81]; uW: [138], [140]; uE: [171], [185]
Misspelled/Bad domain name | 8 | U | [36], [42], [43], [56], [69], [72], [80], [89]

DNS based:
Top Level Domain features | 22 | U,W,E | [36], [38], [43], [46], [65], [69], [70], [78], [85]; uW: [105], [107], [110], [113], [138], [139], [152], [154], [157]; uE: [163], [170], [182], [194]
TTL^c value of DNS | 9 | U,W,E | [28], [43], [47], [70]; uW: [93], [102], [132], [144]; uE: [186]
Age of Domain | 13 | U,W,E | [52]; uW: [107], [113], [119], [120], [124], [125], [127], [132], [150], [154], [155]; uE: [182]
Ranking^d based features | 30 | U,W,E | [28], [33], [42], [49]–[52], [56], [58], [66], [76], [81], [88]; uW: [94], [100], [102], [105], [107], [111], [113], [114], [120], [122], [126], [131], [132], [154], [155]; uE: [182], [186]

Hostname based:
Frequency of Tokens | 12 | U,W | [28], [35], [36], [38], [45], [46], [52], [65], [66], [80], [81]; uW: [94]
Longest Token | 6 | U | [35], [36], [44], [45], [66], [85]
Frequency of Digits | 3 | U | [45], [46], [85]
Frequency of Special characters | 8 | U,W | [28], [34], [35], [76], [80], [85]; uW: [127], [132]
Port presence | 18 | U,W,E | [36], [38], [40], [49], [58], [67], [70]; uW: [96], [126], [144], [152], [154], [158]; uE: [167], [168], [170], [178], [180]
HTTPS or HTTP | 15 | U,W | [38], [54], [65]; uW: [96], [98], [102], [105], [114], [120], [123], [125], [126], [140], [154], [155]

IP address based:
Characters and digits in IP address | 3 | U | [36], [47], [62]
IP address in octal/hex form | 5 | U,E | [34], [65], [76], [81]; uE: [168]
Blacklisted IP address | 5 | U,W | [28], [43]; uW: [96], [102], [132]
Whitelisted IP address | 3 | U,W | [78], [88]; uW: [132]
IP records and prefix checking | 4 | U,W,E | [43], [80]; uW: [155]; uE: [182]

WHOIS properties:
Registration details | 14 | U,W | [36], [40], [43], [47], [58], [80]; uW: [96], [127], [130], [131], [134], [152], [154], [158]
Creation/Update/Expiration time | 27 | U,W,E | [28], [30], [36], [40], [43], [47], [58], [66], [76], [80]; uW: [93], [102], [106], [112], [119], [122], [124], [126], [127], [132], [148], [150], [152], [158]; uE: [177], [182], [193]
AS number | 8 | U,W,E | [36], [40], [44]; uW: [94], [104], [152], [158]; uE: [182]
Status of WHOIS entry | 7 | U | [28], [36], [40], [43], [47], [58], [80]

Geographic properties:
Location^e of IP address origin | 20 | U,W,E | [28], [40], [43], [46], [49], [79], [80], [82]; uW: [93], [94], [96], [112], [113], [121], [129], [152], [157], [158]; uE: [191], [195]
Nature/speed of connection | 5 | U | [28], [36], [43], [46], [49]

Shortened URL features:
Initial and Landing URL | 6 | U,W | [60], [69]; uW: [129], [138], [140], [157]
Redirects on each page | 5 | U,W | [62], [64], [87]; uW: [129], [155]
URL Redirect Chain length | 16 | U,W | [29], [30], [60], [62], [79]; uW: [100], [105], [112], [124], [125], [127], [129], [148], [154], [156], [157]
Frequency of entry point URL | 4 | U | [29], [30], [60], [87]
Position of entry point URLs | 3 | U | [29], [60], [64]
# of URLs/Landing URLs | 2 | U | [29], [60]
# of domain names and IPs | 6 | U,E | [29], [62], [63], [87]; uE: [167], [182]

^a A URL usually has the following parts: scheme, host, path, query string. Source: https://www.ibm.com/support/knowledgecenter/en/SSGMCP_5.1.0/com.ibm.cics.ts.internet.doc/topics/dfhtl_uricomp.html
^b Characters like period, slash, etc.
^c Time to live
^d Alexa Ranking, PageRank scores, search engines lookup
^e Continent/country/city
^f Hexadecimal encoding
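Many of the lexical and host-based entries in Table A10 reduce to simple string operations on the URL. A minimal sketch of a few of them using Python's standard library (the function and feature names are ours, and real systems vary in exactly how each feature is defined):

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Extract a handful of the lexical/host features listed in Table A10."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                                   # "URL length"
        "num_dots": url.count("."),                               # "Number of dots (.)"
        "digit_ratio": sum(c.isdigit() for c in url) / len(url),  # digit frequency/ratio
        "num_special_chars": len(re.findall(r"[@\-_~%&=?]", url)),
        # "IP address instead of domain name" (dotted-decimal form only):
        "ip_instead_of_domain": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",                   # "HTTPS or HTTP"
    }

feats = url_features("http://192.168.10.1/secure-login?acct=update")
print(feats["ip_instead_of_domain"])  # True
```

DNS-, WHOIS- and ranking-based features in the same table require external lookups (DNS resolvers, WHOIS servers, ranking services) and so cannot be computed from the URL string alone, which is one source of the real-time detection cost discussed in the main text.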

TABLE A11: Description of the features used in phishing website detection. N - Number of papers using each feature

Feature | N | Literature

Network:
Domain registration info. | 8 | [96], [112], [119], [122], [131], [138], [150], [151]
Registrar ID | 3 | [138], [152], [158]
# of Nameservers | 3 | [100], [112], [148]
DNS record | 12 | [93], [99], [102], [104], [117], [119], [124], [125], [127], [138], [150], [155]
# of DNS queries | 1 | [156]
HTTP header (content-type, content-length, Server, X-Powered-By, ...) | 4 | [96], [152], [156], [158]
Alexa rank | 5 | [98], [102], [119], [127], [130]
Gmail Reputation | 1 | [94]
# of bytes/packets transferred, duration | 2 | [100], [156]
Fake HTTP protocol | 1 | [124]
# of IP/port upon complete download | 1 | [156]
SSL Certification attributes | 9 | [96], [113], [114], [118], [124], [127], [132], [140], [154]

HTML:
# of various tag types | 10 | [93], [96], [98], [100], [102], [112], [122], [139], [149], [151]
HTML tag attributes | 10 | [96], [107], [114], [115], [119], [135], [137], [149], [151], [154]
Term Frequency | 13 | [94], [96], [106]–[108], [111], [112], [114], [117], [119], [128], [131], [144]
# of elements out of place | 2 | [107], [112]
# of small/hidden elements | 2 | [93], [122]
# of suspicious elements | 3 | [93], [102], [142]
# of suspicious objects | 2 | [93], [102]
# of internal/external links | 22 | [93], [94], [96], [100], [102], [107], [114], [117], [119], [122], [124], [125], [127], [128], [132], [135], [139], [144], [150], [154]–[156]
NULL links on site and footer | 2 | [120], [144]
More than one head tag/document | 2 | [93], [122]
Invisible frames | 5 | [93], [95], [100], [122], [124]
# of specific file type (image, video, binary, system file, ...) | 4 | [96], [98], [121], [158]
Visual | 9 | [96], [103], [104], [108], [128], [133], [135], [142], [149]
# of iframes | 9 | [93], [98], [100], [102], [122], [126], [154]–[156]
DOM-tree | 3 | [97], [107], [143]

Javascript:
ActiveX function | 1 | [102]
Right click disabled | 8 | [102], [125]–[127], [138], [140], [154], [155]
Server Form Handler | 2 | [127], [155]
Login form detection | 4 | [107], [109], [119], [132]
External term frequency | 1 | [98]
Keywords to words ratio | 2 | [93], [122]
# of suspicious strings, # of long strings (>40, >51) | 3 | [93], [122], [156]
Decoding routines, shellcode detection, # of iframe strings | 1 | [93]
# of DOM modifying functions, # of event attachments | 3 | [93], [102], [122]
# of suspicious objects | 3 | [93], [100], [122]
# of scripts | 2 | [139], [156]
# of func. (eval, setInterval, OnMouseOver, ...) | 9 | [100], [102], [122], [125]–[127], [140], [154], [156]

Others:
Ray Scan Method and Webpage Layout Similarity | 1 | [134]
Markov Chain and DISCO | 1 | [110]

TABLE A12: Description of the features used in phishing email detection

Feature | N | Literature

Header:
“Message-ID” | 3 | [169], [176], [282]
“Received” fields | 1 | [162]
“From”, “Mail from”, “Sender”, “Mail-to”, “Delivered To” | 6 | [164], [166], [170], [173], [174], [195]
Authentication-Results (SPF, DKIM, etc.) | 1 | [162]
“Subject” features (length of the subject, # of words, # of characters, vocabulary richness, etc.) | 8 | [164], [166], [167], [170], [173], [178], [185], [189]
Blacklisted words in “Subject” | 4 | [168]–[170], [178]
# of words and/or characters in the “Send” field | 1 | [167]
“Sender” domain ≠ “Reply-to” domain | 4 | [167], [168], [170], [178]
“Sender” domain ≠ “Message-Id” domain | 1 | [169]
“Sender”/“From” ≠ email’s modal domains | 5 | [165], [167], [170], [178], [180]
Timestamp, “Sent date” | 3 | [163], [164], [173]
Source IP, Autonomous System Number, Origin Country | 1 | [173]
“Subject”: Fwd, Reply | 3 | [167], [168], [170]
Interaction habits | 1 | [163]
“Cc”, “BCc” fields | 1 | [164]
“X-Mailer”, “X-Originating-IP”, “X-Originating-hostname”, “X-spam-flag”, “X-virus-scanned” | 1 | [164]

Body:
Lexical Features (# of words, # of unique words, # of characters, Tokens^a, regular expressions, etc.) | 12 | [161]–[164], [167], [170], [171], [173], [182], [185], [189], [197]
Function words (count, frequency distribution, etc.) | 4 | [163], [167], [173], [185]
Style metrics (number of paragraphs in the email, Yule metric, etc.) | 1 | [163]
Topic in the body, Latent Semantic Indexing, Readability Indexes | 3 | [172], [173], [190]
NLP (Part-of-Speech tags, Named entities, Wordnet properties, etc.) | 5 | [161], [162], [164], [171], [190]
Semantic Network Analysis | 1 | [174]
Vocabulary richness | 3 | [167], [170], [185]
Urgency, reward, threat language in the body | 2 | [174], [185]
Greeting in the message | 4 | [164], [182], [185], [194]
Signature in the message | 3 | [164], [182], [194]
Farewell in the message | 1 | [164]
Presence of both “From:” and “To:” in email body | 1 | [189]
# of linked to domains | 1 | [165]
URL features | 18 | [162], [163], [166], [168], [170]–[172], [175], [178], [180]–[185], [187], [193]–[195]
HTML features | 13 | [165], [166], [168], [170], [171], [175], [181], [182], [184], [187], [189], [193], [194]
Presence and/or # of forms in email body | 6 | [167], [170], [175], [182], [193], [194]
Blacklisted words in the message | 10 | [165]–[168], [170]–[172], [175], [178], [180]
Scripts/JavaScripts features in email body | 16 | [165]–[168], [170], [172], [175], [177], [178], [181], [182], [187], [189], [191], [193], [194]
# of onClick events in email body | 4 | [167], [168], [170], [191]
Features from images/logos in the message | 6 | [174], [175], [182], [187], [189], [193]
Mention of the sender | 1 | [174]
# of links in email body | 10 | [165], [167], [170], [175], [180]–[182], [187], [193], [194]
# of images (links) | 9 | [166], [167], [170]–[172], [175], [180], [189], [194]
# of tables in email body | 4 | [182], [187], [193], [194]
Recipient’s email address in email body | 1 | [173]
Phishing terms weight | 2 | [171], [187]
TFIDF | 4 | [182], [187], [192], [194]
Email size | 3 | [182], [188], [194]
# of email body parts | 2 | [172], [175]
MIME Version or Content-Type (Multipart/Alternative, text/plain, text/html) | 10 | [164], [166]–[168], [175], [177], [180], [188], [189], [193]
JavaScript PopUp Windows | 2 | [166], [168]
Link displayed ≠ Link destination | 10 | [165], [171], [172], [177], [180], [183]–[185], [187], [194]
Img links ≠ spoofed target address | 3 | [168], [175], [184]
Hidden text in the email, Salting techniques | 3 | [175], [182], [189]
# of internal/external links | 6 | [166], [167], [170], [172], [175], [180]

External:
SpamAssassin | 5 | [168], [172], [175], [177], [178]

Attachment:
Size in bytes of attachment | 1 | [173]
Number of attachments | 2 | [185], [187]

^a Semantic relations can also be used to add tokens.
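Several of the header and body features above (e.g., “Sender” domain vs. “Reply-to” domain mismatch, link counts in the body) can be pulled out with Python's standard email module. A rough sketch under simplifying assumptions (plain-text, non-multipart body; one address per header field; function and key names are ours):

```python
import re
from email import message_from_string
from email.utils import parseaddr

def header_body_features(raw_email):
    """Extract a few Table A12-style features: Message-ID presence,
    From/Reply-To domain mismatch, and link count in the body."""
    msg = message_from_string(raw_email)

    def domain(field):
        addr = parseaddr(msg.get(field, ""))[1]
        return addr.rpartition("@")[2].lower()

    body = msg.get_payload()  # assumes a simple non-multipart message
    return {
        "has_message_id": msg.get("Message-ID") is not None,
        "from_replyto_mismatch": bool(msg.get("Reply-To"))
                                 and domain("From") != domain("Reply-To"),
        "num_links": len(re.findall(r"https?://", body)),
    }

# Invented example message with a mismatched Reply-To domain:
raw = (
    "From: support@example-bank.com\r\n"
    "Reply-To: attacker@evil.example\r\n"
    "Subject: Verify your account\r\n"
    "\r\n"
    "Click http://evil.example/login to verify."
)
print(header_body_features(raw))
```

Production systems additionally parse multipart MIME trees, HTML bodies, and attachments, which is where most of the remaining Table A12 features come from.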