A Keyword-Based Combination Approach for Detecting Phishing Webpages

computers & security 84 (2019) 256–275
Available online at www.sciencedirect.com
Journal homepage: www.elsevier.com/locate/cose

Yan Ding a, Nurbol Luktarhan a,∗, Keqin Li b, Wushour Slamu a
a College of Information Science and Engineering, Xinjiang University, Urumqi, China
b Department of Computer Science, State University of New York, New Paltz, New York, USA

Article info

Article history: Received 16 January 2018; Accepted 9 November 2018; Available online 23 March 2019

Keywords: Heuristic rule; Machine learning; Phishing; Search engine; URL obfuscation techniques

Abstract

In this paper, the Search & Heuristic Rule & Logistic Regression (SHLR) combination detection method is proposed for detecting the obfuscation techniques commonly used by phishing websites and improving the filtering efficiency of legitimate webpages. The method is composed of three steps. First, the title tag content of the webpage is input as search keywords to the Baidu search engine, and the webpage is considered legal if the webpage domain matches the domain name of any of the top-10 search results; otherwise, further evaluation is performed. Second, if the webpage cannot be identified as legal, then the webpage is further examined to determine whether it is a phishing page based on the heuristic rules defined by the character features. The first two steps can quickly filter webpages to meet the needs of real-time detection. Finally, a logistic regression classifier is used to assess the remaining pages to enhance the adaptability and accuracy of the detection method. The experimental results show that the SHLR can filter 61.9% of legitimate webpages and identify 22.9% of phishing webpages based on uniform/universal resource locator (URL) lexical information.
The accuracy of the SHLR is 98.9%; thus, its phishing detection performance is high.

∗ Corresponding author.
https://doi.org/10.1016/j.cose.2019.03.018
0167-4048/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Phishing data

In 2017, the “Double 11” shopping carnival experienced a record high number of sales that amounted to hundreds of billions of Chinese e-commerce transactions. The continuous development of e-commerce has played a vital role in promoting the world economy and satisfying the needs of the population. However, during this period, phishing webpages and other online fraud behavior have also shown rapid growth trends. Phishing webpages, a form of cyberattack, seriously threaten the credibility and financial security of Internet users, and the annual economic losses caused by phishing websites amount to several hundred million dollars.

Phishing attacks are cybercrimes that steal user privacy data via social engineering or technical methods. Attackers use e-mail or text messages to send false security warnings or prize information to trick users into clicking on the phishing webpage and submitting critical personal information. Phishing attacks can lead to a series of serious problems for users, such as stolen bank account information, which can cause huge financial losses. Moreover, the use of identity information to substantiate false information will negatively affect users' credibility. Phishing webpages and their uniform/universal resource locators (URLs) are the primary sources used by phishing attackers to steal private data from users. According to the 2016 China Internet Security Report (Qihoo360, 2017) released by the 360 Internet Security Center, 1,199,000 new phishing websites were intercepted in 2016 compared with 2015, representing an increase of 25.5% (1,569,000). Over 225 phishing websites are created on average every hour. The scale of phishing webpages and their URLs shows a continuously growing trend.

With the advent of the big data era, phishing attacks have become more sophisticated. To increase the probability of attack success, attackers collect and analyze various data that users leave on the Internet, such as their name, gender and contact information, and even basic information about the users' family members and social relations. As a result, phishing e-mails appear to be authentic and trustworthy, making users more likely to fall into the trap of phishing when they receive messages that appear relevant. According to Symantec's Internet Security Threat Report (Symantec, 2017), business email compromise (BEC) scams using spear phishing tactics attack more than 400 companies a day and have stolen more than $3 billion over three years. According to the Anti-Phishing Working Group (APWG) statistical report (APWG, 2017), a total of 255,065 phishing attacks against different enterprises or entities were detected globally in 2016, representing an increase of 10% from 2015. The effective detection of phishing webpages is a key measure to maintain Internet security.

1.2. Phishing categories

As shown in Fig. 1, phishing attackers need to perform preparatory work before conducting phishing attacks, including selecting subjects and methods of dissemination and creating fake webpages. A reasonable classification of phishing attacks can aid the adoption of effective identification measures (Aleroud and Zhou, 2017). Therefore, this paper classifies phishing attacks based on key processes used in the attack. First, phishing attacks can be divided into the following three types according to the target of the attack.

Fig. 1 – Phishing attack flow chart.

1. General phishing attacks are attacks that are not targeted to specific subjects. The attacker needs only to send the phishing URL to the user's terminal through various means and does not set a targeted scam plan. Over time, phishing attacks have become easy to conduct because of the evolution of the cybercrime market. An attacker can buy and sell phishing kits, and these kits are usually priced at only $2–$10. Moreover, buyers do not need specialized expertise to execute the attack. Because obvious flaws are exposed with general phishing attacks, the probability of success is relatively low.
2. Spear phishing attacks are attacks on a specific class of subjects. Because such users are often the key information holders of a business or a group, the attack relies on well-planned social scams. An attacker sends false messages to the targets. If the attack does not work within a certain time frame, the attacker will launch other types of exploits against targets with the same background. However, in recent years, attackers have begun to change tactics to avoid exposure after many failures. The number and sophistication of spear phishing attacks have increased substantially compared with those of general phishing attacks. In 2015, 34.9% of all spear phishing e-mails were sent to financial enterprises, and financial industry enterprises had at least an 8.7% likelihood of experiencing at least one attack throughout the year (Symantec, 2016).
3. Whale phishing attacks are phishing attacks against corporate CEOs and political leaders. Attackers use infected e-mails, collect and analyze users' online and offline information, and exploit various cyber vulnerabilities to develop a detailed attack strategy. Whale phishing has similarities to spear phishing, with both targeting specific groups of people; however, the losses caused by whale phishing are huge and catastrophic. For example, in 2016, Walter Stephan, the CEO of the Austrian aircraft parts maker FACC, caused a $47 million loss because of a phishing e-mail.

Building a phishing webpage is a key part of a phishing attack, and how to disseminate URLs to clients is one issue that attackers must consider. The methods of disseminating phishing webpages can be divided into three categories.

1. E-mail. Although instant messaging technology is gaining popularity among corporate and personal user groups, e-mail still dominates the field of digital communications. E-mail represents a simple and affordable dissemination method to spread phishing pages. In 2016, one out of every 2596 e-mails was a phishing attempt (Symantec, 2017).
2. Short Message Service (SMS). In 2016, 360 mobile guards blocked 17.35 billion spam messages across China, of which fraud messages ranked second and accounted for 2.8% of the total spam messages (Qihoo360, 2017). However, because of the development of mobile Internet technology, users have become reliant on mobile devices, such as mobile phones; therefore, SMS represents a major method of delivering phishing URLs.
3. Others. Voice phishing is an attack technique that uses [...]

[...] can be divided into four parts: the domain, path, file, and parameters. Many unrelated words are included in phishing URLs; thus, phishing URLs present certain differences in their characters with respect to legitimate URLs (Mcgrath and Gupta, 2008). For example, Ma et al. (2009, 2011) extracted a number of features, including the length of the URL string, the number of domain labels, and the number of segmented characters to detect phishing webpages.

2) Domain. The domain name is a unique identifier that consists of two or more words separated by a period. The domain can be divided into three levels based on location, with the word at the far right of the domain name representing the top-level domain, which includes country names, such as .cn representing China's top-level domain, and top-level international domain names, such as .com (for business) and .org (for non-profit organizations).