A Mechanism for Classifying and Preventing Phishing Websites

Master Project Proposal

PhishLurk: A Mechanism for Classifying and Preventing Phishing Websites

by: Mohammed Alqahtani

1. Committee Members and Signatures: Approved by Date

______Advisor: Dr. Edward Chow

______Committee member: Dr. Albert Glock

______Committee member: Dr. Chuan Yue

Introduction

Phishing is a cybercrime in which a person or organization attempts to steal highly sensitive information such as usernames, passwords, and credit card details. Most phishing attacks take one of two forms: emails or web pages that spoof a legitimate party and lure the user into entering sensitive information. In other words, phishing directs users to fraudulent websites in order to obtain that information. Users increasingly rely on the internet for daily tasks such as paying bills, banking, and socializing. As a result, more and more personal information is used online for different purposes, which expands the attack surface for phishing.

Sample of a phishing website (source: www.phishtank.com)

Phishing has been a major concern in IT security. In the U.S., companies lose more than $2 billion every year as a result of phishing attacks [6]. Phishing works for many reasons; one of the most common is users' carelessness and their inability to tell whether a website is phishing or not [1]. Moreover, many phishing websites are hard to detect. Much research has focused on anti-phishing, using different filtering and detection methods such as blacklists, plug-ins, extensions, and toolbars for browsers [2]. Desktop browser developers try hard to provide solid protection, for example by displaying a warning message box when a website is potentially a phishing site or has an invalid or expired SSL certificate. In most cases a third party and a blacklist are involved in identifying and flagging phishing websites [3]. Recently, users have gained more ways to access the internet, for example notebooks, gaming systems, handhelds, and smartphones. However, the growing variety of devices, each with different capabilities and features, makes it more complicated to provide full protection, especially against phishing attacks. So far, no complete protection exists. Smartphones are among the most widely used of these devices. According to a survey by comScore, Inc., the number of smartphone subscribers increased 60 percent in 2010 compared to 2009 [4]. Another report, by The Nielsen Company, predicts that by 2011 half of cell phone users would be using smartphones [5].

Figure: Rapid global growth of the smartphone market, 2009 - 2010

Users have started to use these devices for their activities and tasks because of the advantages they provide; smartphones, for example, are preferred for their ease of use, flexibility, and mobility. Some activities, such as online banking, paying bills, online shopping, and emailing [5], require users to enter sensitive information, such as credit card numbers, usernames, and passwords, to complete the authentication and authorization process. Having many types of devices with internet access therefore expands the attack surface for phishing and complicates protection.

Related Work

PhishTank is a non-profit project that aims to build a dependable database of phishing websites [7]. The project collects, verifies, tracks, and shares phishing data. To report a phishing link, a user has to register as a member, so that the administrators can assess each member's contributions. Phishing websites can be reported by email or through PhishTank's website. Submitted data are verified by a committee. PhishTank's database can be shared via its API. The links in the original database are classified only as “phishing” or “unknown”. I will classify the phishing sites in the PhishTank database into more precise categories and use them in the proposed project. PhishTank has been working effectively to fight phishing attacks: thousands of phishing links are detected and verified as valid phishing sites every month [9], using the public's effort and contributions to build a trustworthy and dependable database that is open for everyone to use and share. As a result, several well-known organizations and browsers, such as Yahoo! Mail, Opera, McAfee, and Mozilla Firefox, have started using the PhishTank database [10]. In my prototype, I use PhishTank as the provider of phishing URLs.
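The two-state labeling described above (“phishing” versus “unknown”) can be pictured as a simple lookup table built from a blacklist feed. The following is a minimal sketch in Python, used here only for illustration rather than the PHP planned for the prototype. The function name load_blacklist and the record layout (a JSON list of objects with "url" and "verified" keys, where "verified" is "yes" or "no") are assumptions made for this example and would need to be checked against PhishTank's actual feed format.

```python
import json


def load_blacklist(json_text):
    """Build a lookup table from a blacklist feed.

    Assumption: the feed is a JSON list of objects, each with a "url"
    key and a "verified" key whose value is "yes" or "no". The result
    maps each reported URL to True (verified phishing) or False
    (reported, verification still pending, i.e. "unknown").
    """
    records = json.loads(json_text)
    return {rec["url"]: rec.get("verified") == "yes" for rec in records}


# Hypothetical sample feed with one verified and one pending report.
sample_feed = json.dumps([
    {"url": "http://bad.example.com/login", "verified": "yes"},
    {"url": "http://new.example.com/pay", "verified": "no"},
])
blacklist = load_blacklist(sample_feed)
```

A dictionary keyed by URL keeps each lookup cheap, which matters if every search result has to be checked against the list.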

In the paper titled “Large-Scale Automatic Classification of Phishing Pages” [2], Colin Whittaker, Brian Ryner, and Marria Nazif proposed an automatic classifier to detect phishing websites. The classifier maintains Google's phishing blacklist automatically and analyzes millions of pages a day, examining the URL and the contents of each page to decide whether it is phishing. It works automatically within a large-scale system, maintains a false positive rate below 0.1%, and reduces the lifetime of phishing pages. The authors used machine learning techniques to analyze web page content. In my project, the determination is based on PhishTank's blacklist. My ultimate goal is not to determine whether a page is phishing, since PhishLurk relies on PhishTank's blacklist for that, but to provide a new method of classifying phishing links that takes two factors into account: consuming as little memory and as little screen space as possible, which should improve the overall efficiency of classification.

In the paper titled “PhishGuard: A Browser Plug-in for Protection from Phishing” [8], Y. Joshi, S. Saklikar, D. Das, and S. Saha proposed a mechanism that detects a forged website by submitting fake credentials before the actual credentials during the login process, and then analyzing the responses to those submissions to determine whether the website is phishing. The mechanism was implemented on the browser side (the user side) as a Mozilla Firefox plug-in. However, it only detects phishing during the login process of an individual user: if another user logs in to the same phishing website, that user goes through the same detection process. In my project, once a website is reported as a phishing site, the reported link is blocked and no other user can access it.

In the paper titled “BogusBiter: A Transparent Protection Against Phishing Attacks” [17], Chuan Yue and Haining Wang proposed a client-side tool called BogusBiter that sends a large number of bogus credentials to suspected phishing sites and hides the real credentials from phishers. BogusBiter is unique in that it also helps legitimate websites detect stolen credentials in a timely manner, when the phisher verifies the collected credentials at the legitimate website. BogusBiter was implemented as a Firefox 2 extension. My project differs in that it uses the server side to provide the protection.

In the paper titled “The Battle Against Phishing: Dynamic Security Skins” [18], Rachna Dhamija and J. D. Tygar, two of the authors of [1], proposed an anti-phishing tool that helps users distinguish whether they are interacting with a trusted site or not. The approach uses a shared cryptographic image that remote web servers use to prove their identity to users, in a way that is easy for humans to verify and hard for attackers to spoof. In my project, however, there is no dependency on the client side. The approach in [18] cannot protect a user on a public access machine, because it requires support from both the client side and the server side.

Most popular browsers provide a phishing filter that warns users about malicious websites, including phishing websites. These filters mainly depend on lists to detect malicious websites. IE 7 used the “Phishing Filter”, which was improved into the SmartScreen Filter in later versions of IE because of the weak protection the Phishing Filter provided [15]. In IE 8 and IE 9, the SmartScreen Filter checks visited websites against a list of malicious websites that Microsoft created and continuously updates [11] [12]. Similar to IE, the Safari browser has a filter that checks websites against a list of phishing sites while the user browses. After PayPal warned its members that Safari was not safe for its service [13], Safari started to use extended validation certificates to support website analysis [14]. Earlier versions of Firefox took advantage of anti-phishing providers such as GeoTrust or PhishTank, using their lists to help identify malicious websites. The current version of Firefox has adopted Google's anti-phishing program to support its phishing protection.

Many research projects have proposed mechanisms implemented as browser plug-ins and toolbars against phishing attacks. The main problem with plug-ins and toolbars is that they need the user's cooperation: users may not install the tool, and some users occasionally turn their filter off to browse faster [16]. On some devices, such as smartphones, plug-ins and toolbars may also be less effective than in desktop browsers because of limited performance and screen space. PhishLurk's mechanism aims to use as little space and memory as possible on the client side, and uses the server side to classify links and protect against phishing. So even if phishing protection is disabled on the client side, PhishLurk still provides protected and classified links to the user.

The different phishing defense approaches can be further classified based on where the alerts are generated:

• Browsers themselves: IE9, Firefox 5.
• Browser extensions or plug-ins: BogusBiter, PhishGuard.
• Anti-phishing search site: PhishLurk (my project).
• Proxy server: DansGuardian [20].
• Anti-phishing server: OpenDNS [19], GFI MailEssentials [21], and some browser extensions that partially use the server side, such as Dynamic Security Skins [18].

According to the official website [20], DansGuardian is an active web content filter that filters websites based on a number of criteria, including website URL, words and phrases included in the page, file type, MIME type, and more. DansGuardian is used as a proxy server that controls, filters, and monitors all content, so its function goes beyond anti-phishing. To my knowledge, no project uses a proxy server specifically for anti-phishing, but it could be an effective technique for classifying and preventing phishing websites.

Proposed Project

I propose a mechanism to protect the user from phishing attacks. The mechanism assesses and classifies sites on the server side, based on PhishTank's blacklist, and presents the result using a color scheme. Because the color scheme takes little screen space and memory, the system can work even on small devices. I expand the classification used in PhishTank into four types, as follows:

• Phishing link (Red): a verified phishing link. The link is disabled, so even if the user is unaware or surfing carelessly, as seen in the survey [1], there is no way to access it.
• Unknown link (Orange): a suspicious link that might be phishing. It could be, for example, a link that contains all or part of a real company's name and asks the user to provide sensitive information. The link has been submitted as phishing but has not been verified yet. The user is warned before accessing the link and can proceed at their own risk.
• Unlikely link (Gray): the same as an unknown link, except that the blacklist has received a report about a link that is unlikely to be phishing, for example a website whose top-level domain (TLD) is .edu or .gov. Such domains are unlikely to be used by attackers because they are reserved for official use by organizations. The link remains “unlikely” until it is verified by PhishTank. Note that someone may have reported the site in order to denigrate the organization, or the site may actually have been compromised by a cross-site scripting or SQL injection attack; it is therefore fair to keep the unlikely status until the link is verified and changed to a safe link.

Figure: Global Phishing Survey: Trends and Domain Name Use - April 2011

As the chart above shows, 60% of phishing attacks were launched from the TLDs .COM, .NET, .TK, and .CC.

• Safe link (Green): a safe link that is not phishing. The user can access the link without triggering warning messages.

Providing the protection from the server side and using the color scheme for classification saves memory and screen space on the client side. The mechanism determines whether a website is phishing based on the provided blacklist of phishing websites, which is updated periodically to achieve the highest possible accuracy.
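The four-way classification described above can be sketched as a small decision function. This is a minimal illustration in Python rather than the PHP planned for the prototype. The names LinkClass, UNLIKELY_TLDS, and classify are hypothetical, as is the blacklist representation (a dict mapping each reported URL to True if verified as phishing and False if verification is pending). Exact string matching on the URL is a simplification; a real system would probably normalize URLs first.

```python
from enum import Enum
from urllib.parse import urlparse


class LinkClass(Enum):
    PHISHING = "red"     # verified phishing: the link is disabled
    UNKNOWN = "orange"   # reported, not yet verified: warn, then allow
    UNLIKELY = "gray"    # reported, but on a TLD rarely used for phishing
    SAFE = "green"       # not reported: no warning


# Assumption: only the two TLDs named in the proposal count as "unlikely".
UNLIKELY_TLDS = (".edu", ".gov")


def classify(url, blacklist):
    """Classify a URL against a blacklist.

    `blacklist` is a hypothetical dict mapping a reported URL to True
    (verified as phishing) or False (reported, verification pending).
    """
    if url not in blacklist:
        return LinkClass.SAFE
    if blacklist[url]:
        return LinkClass.PHISHING
    # Reported but unverified: the TLD decides between gray and orange.
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(UNLIKELY_TLDS):
        return LinkClass.UNLIKELY
    return LinkClass.UNKNOWN
```

In this sketch a verified report always wins, so a compromised .edu or .gov site that has been verified as phishing is shown in red rather than gray.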

The Plan

In this project, I will develop an anti-phishing search website called “PhishLurk”, using PHP and CSS, that responds to user search queries with classified, protected links. If a result is a phishing link, the engine classifies it as risky, disables it, and warns the user by rendering it as a red link. If the link is classified as “unknown” (suspicious), the engine lets users choose whether to access it and warns them about the possible consequences. If the link is classified as “unlikely”, the engine also lets users choose whether to access it at their own risk, and warns them that although the link is unlikely to be phishing, the site might have been hacked or someone might be trying to denigrate the organization behind the website. Finally, when the link carries no risk or suspicion, the engine classifies it as a safe link. I use CSS to help classify the links because it does not consume much screen space or demand extensive computation. Besides classifying links and providing safe results to the user, the PhishLurk system periodically reads and updates the blacklist from PhishTank.com to keep results up to date.

Figure: PhishLurk’s Design
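The server-side rendering step described in the plan can be sketched as follows. The example is written in Python for illustration only; the prototype itself is planned in PHP and CSS. The CSS class names, the function render_link, and the /warn warning page are all hypothetical and chosen just for this sketch.

```python
from html import escape
from urllib.parse import quote

# Hypothetical CSS class names, one per category in the proposal.
CSS_CLASS = {
    "phishing": "link-red",
    "unknown": "link-orange",
    "unlikely": "link-gray",
    "safe": "link-green",
}


def render_link(url, title, category):
    """Return the HTML fragment for one classified search result."""
    css = CSS_CLASS[category]
    text = escape(title)
    if category == "phishing":
        # Disabled: no href is emitted, so the result cannot be followed.
        return f'<span class="{css}">{text} (blocked)</span>'
    if category in ("unknown", "unlikely"):
        # Send the user through a (hypothetical) warning page first.
        target = "/warn?target=" + quote(url, safe="")
    else:
        target = url
    return f'<a class="{css}" href="{escape(target, quote=True)}">{text}</a>'
```

Because the color is carried by a CSS class, the client only receives a short attribute per link, which is consistent with the goal of keeping client-side memory and screen usage low.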

Metric for Evaluating the PhishLurk System

The proposed PhishLurk system can be evaluated by examining how effectively users use it and by measuring its processing overhead. We will conduct a survey on the usage of PhishLurk and summarize the feedback. Stress tests will be performed on the system to collect statistics on the average processing time overhead for classifying URLs and modifying the links.

Deliverables

• The working software prototype, PhishLurk, with a user guide and installation manual.
• A master's report documenting the design and implementation of PhishLurk, the implementation choices and their performance evaluation, and the lessons learned.

Additional Part

The proposal includes the following feedback from the committee members:

• What other websites maintain blacklists? Include that in the final report. Survey the related systems and blacklists to make the survey more complete.
  - Search the ACM Digital Library or IEEE Xplore for literature that uses blacklists, and report their effectiveness and how many other systems use them.
  - Find out how Firefox and other browsers use blacklists and which blacklists they use.
• Make the system design modular (make each key feature of the system a function, e.g., blacklistUpdate, userFeedback) so that other systems such as DansGuardian can reuse the code, for example the phishing check result for a URL.
• Make the source code available on the gsc web site.
• Coloring techniques are old and do not work for color-blind users. Consider other ways to block or provide indications, such as a whole-page warning or a dialog. A user-configurable system is a better choice.
  - Modify the system so that it provides a different display and prevention scheme (color or pop-up warning) based on the user profile (login or IP address).
• Get feedback from users to see whether they proceed with visiting the sites or not.
  - Modify the revised pop-up warning page to make an AJAX call back to the server (feedback.php) to record whether the user proceeds with the visit or not.
  - Tally the feedback: feedback.php will update the phishinglist with the new statistics. The next time the same URL appears in the search results, PhishLurk will report the current classification and the local feedback statistics (how many local users visited and how many did not).
• Add follow-up reporting.
  - Include this new feature in the “Future Direction” section. This module will send email to users who decided to visit potential phishing sites and ask them to provide feedback. The results can be published as a PhishLurk blacklist to be shared with the rest of the internet community.
• Collect data from users, analyze them, and share the results with new users.
  - Conduct a small-scale survey on how useful the system is; it can involve just a few friends or classmates.
  - They can help test whether the feedback module functions.

References:

1. Rachna Dhamija, J. D. Tygar, and Marti Hearst. 2006. Why phishing works. In Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI '06), Rebecca Grinter, Thomas Rodden, Paul Aoki, Ed Cutrell, Robin Jeffries, and Gary Olson (Eds.). ACM, New York, NY, USA, 581-590. DOI=10.1145/1124772.1124861 http://doi.acm.org/10.1145/1124772.1124861
2. Aaron Blum, Brad Wardman, Thamar Solorio, and Gary Warner. 2010. Lexical feature based phishing URL detection using online learning. In Proceedings of the 3rd ACM workshop on Artificial intelligence and security (AISec '10). ACM, New York, NY, USA, 54-60. DOI=10.1145/1866423.1866434 http://doi.acm.org/10.1145/1866423.1866434
   Colin Whittaker, Brian Ryner, Marria Nazif, “Large-Scale Automatic Classification of Phishing Pages”, NDSS '10, 2010. <http://research.google.com/pubs/pub35580.html>
3. Gross, Ben. "Smartphone Anti-Phishing Protection Leaves Much to Be Desired | Messaging News." Messaging News | The Technology of Email and Instant Messaging. 26 Feb. 2010. Web.
4. ComScore, Inc. "Smartphone Subscribers Now Comprise Majority of Mobile Browser and Application Users in U.S." ComScore, Inc. - Measuring the Digital World. ComScore, Inc, 1 Oct. 2010.
5. Entner, Roger. "Smartphones to Overtake Feature Phones in U.S. by 2011." Http://www.nielsen.com. Nielsen Wire, 26 Mar. 2010. Web.
6. Kerstein, Paul L. "How Can We Stop Phishing and Pharming Scams?" CSO Online - Security and Risk. CSO Magazine - Security and Risk, 19 July 2005. Web.
7. OpenDNS, LLC. PhishTank: an Anti-phishing Site. [Online]. http://www.phishtank.com
8. Joshi, Y.; Saklikar, S.; Das, D.; Saha, S., "PhishGuard: A browser plug-in for protection from phishing," Internet Multimedia Services Architecture and Applications, 2008. IMSAA 2008. 2nd International Conference on, pp. 1-6, 10-12 Dec. 2008. doi: 10.1109/IMSAA.2008.4753929, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4753929&isnumber=4753904
9. PhishTank - Statistics about phishing activity and PhishTank usage, http://www.phishtank.com/stats.php
10. PhishTank, Friends of PhishTank, http://www.phishtank.com/friends.php
11. "SmartScreen Filter: Frequently Asked Questions." Windows Home - Microsoft Windows.
12. "SmartScreen Filter - Microsoft Windows." Windows Home - Microsoft Windows. Web.
13. "Apple - Safari - Learn about the Features Available in Safari." Apple.
14. TECH.BLORGE - Top Technology news, Paypal warns buyers to avoid Safari browser from Apple - <http://tech.blorge.com/Structure:%20/2008/02/28/paypal-warns-buyers-to-avoid-safari-browser-from-apple/>
15. "Firefox 2 Phishing Protection Effectiveness Testing." Home of the Mozilla Project.
16. "AVIRA News - Anti-Virus Users Are Restless, Avira Survey Finds." Antivirus Software Solutions for Home and for Business.
17. Chuan Yue and Haining Wang. 2010. BogusBiter: A transparent protection against phishing attacks. ACM Trans. Internet Technol. 10, 2, Article 6 (June 2010), 31 pages. DOI=10.1145/1754393.1754395 http://doi.acm.org/10.1145/1754393.1754395
18. Rachna Dhamija and J. D. Tygar. 2005. The battle against phishing: Dynamic Security Skins. In Proceedings of the 2005 symposium on Usable privacy and security (SOUPS '05). ACM, New York, NY, USA, 77-88. DOI=10.1145/1073001.1073009 http://doi.acm.org/10.1145/1073001.1073009
19. OpenDNS | DNS-Based Web Security.
20. DansGuardian - True Web Content Filtering for All.
21. GFI - Web, Email and Network Security Solutions for SMBs on Premise and Hosted.