Using URL Shorteners to Compare Phishing and Malware Attacks

Sophie Le Page, Guy-Vincent Jourdan, Gregor v. Bochmann (University of Ottawa)
Jason Flood (CTO Security Data Matrices)
Iosif-Viorel Onut (IBM Centre for Advanced Studies)

Abstract: In this work we use URL shortener click analytics to compare the life cycle of phishing and malware attacks. We have collected over 7,000 malicious short URLs categorized as phishing or malware for the 2 year period covering 2016 and 2017, along with their reported date. We analyze the short URLs and find that phishing attacks are most active 4 hours before the reported date, while malware attacks are most active 4 days before the reported date. We find that phishing attacks have higher click through rate with shorter timespan. Conversely malware attacks have lower click through rate with longer timespan. We also show the comparisons of referrers and countries from which short URLs are accessed, showing in particular an increased use of social media to spread both kinds of attacks. We also find phishing clicks mainly come from USA and Brazil, while malware clicks mainly come from USA and Russia. Overall based on the observation that 50% of malware attacks are active for several years, while less than 50% of phishing attacks are active past 3 months, we conclude that the efforts against phishing attacks are stronger than the efforts against malware attacks.

978-1-5386-4922-0/18/$31.00 © 2018 IEEE

I. INTRODUCTION

For several years now, research has looked at phishing website removal times and the number of visitors that the website attracts. Surprisingly very little research has looked at answering the same questions for malware websites. One reason for this may be because the data needed to perform such an analysis is generally hard to come by. Research that looks at phishing website removal times and number of visitors mostly makes use of resources such as log files, third party resources, and honeypots to gather data. However the last two resources are not always easy to get a hold of or to set up. As for the first resource option, log files are becoming scarce since people are instead using tools such as Google analytics to track their website click traffic. These analytics are not public information for understandable reasons.

URL shortening services, which provide users with a smaller equivalent of any provided long URL, are an example of a service which sometimes provide analytics as public information on their shortened URLs. Some URL shortening providers such as bit.ly (https://bitly.com/) or goo.gl (https://goo.gl/), allow viewing in real time the click traffic of a given short URL, including referrers and countries referring it.

Unfortunately attackers abuse URL shortening services, perhaps to mask the final destination where the victim will land after clicking on the malicious link. In our research we make use of the Bitly URL shortening service, reportedly the most popular service [1], [2], to analyze clicks for phishing and malware attacks. By phishing attacks we mean websites that trick a user into providing personal or financial information by falsely claiming to be a legitimate website. By malware attacks we mean websites that install data transmitting programs without the user’s knowledge. We rely on two independent services to gather and classify these attacks: we use the popular community driven portal PhishTank (https://www.phishtank.com/) as a main source of phishing attack reports, as well as IBM’s threat intelligence sharing platform X-Force Exchange (https://exchange.xforce.ibmcloud.com/) for both phishing and malware attacks.

We first gathered over 300,000 malicious URLs categorized as phishing or malware along with their reported date, and of these we identified over 7,000 Bitly short URLs. We then fetched click analytics from Bitly for each of these short URLs. From our analysis, we find that phishing attacks are most active 4 hours before the reported date, while malware attacks are most active 4 days before the reported date. We also find that phishing attacks have higher click through rate than malware, but that malware attacks have longer timespan than phishing. We also show the comparisons of referrers and countries from which short URLs are accessed, showing in particular an increased use of social media to spread both kinds of attacks. We also find phishing clicks mainly come from USA and Brazil, while malware clicks mainly come from USA and Russia. Overall we find that the efforts against phishing attacks are stronger than the efforts against malware attacks, based on the observation that 50% of malware URLs last for several years, while less than half of phishing URLs are active past 3 months.

This work provides a new and comparative insight about the life cycle of phishing and malware attacks. Our work adds to the analysis on the life cycle of phishing attacks from the perspective of short URLs, an ever increasing strategy that attackers use to obfuscate malicious URLs. As far as we are aware, our work also constitutes possibly the first analysis on the life cycle of malware URLs with respect to timespan and number of visitors, that is not limited to a single Enterprise or University. In addition this work complements previous short URL analyses, with new findings about click activity for flagged short URLs, and new analyses looking at the timespan of malicious short URLs. From our analysis of short URLs, we also find that Twitter spam click activity has remained consistent since 2010.

Additionally, all of the data used in this work is being made publicly available and can be found at http://ssrg.site.uottawa.ca/ecrime18/.

II. BACKGROUND

The concept behind URL shortening services is to assist in sharing URLs by providing a short equivalent. URL shortening services provide users with a smaller equivalent of any provided long URL, and redirects to the corresponding long URL by the service provider through an “HTTP 301 Moved Permanently” response. Shortened URLs first appeared in 2001 [3] and initially, the concept was used to prevent breaking of complex URLs while copying text, and to prevent email clients from inserting line breaks in the links which rendered them unclickable. However, its adoption was slow until they became popular in online social networks. Now URL shorteners are almost a requirement due to character limitations in some social media, and due to mobile devices, where space is always at a premium.

Over the years URL shorteners such as Bitly have evolved to provide increased utility to users, such as providing analytics to track clicks. Bitly is one of the most popular URL shortening services, as a few studies have found [1], [2]. Each short URL Bitly issues is unique and will not be re-used, so the Bitly short URL will always direct to the same long URL. When a registered user shortens the same long URL, each instance gets a unique short URL. This way users can keep track of their own click analytics. A long URL may have many short URLs, shortened by different registered users, but an aggregate shortened URL keeps cumulative count of statistics for every click on the long URL through Bitly (see Figure 1).

Fig. 1. Bitly mapping of long URL, user short URL and aggregate short URL. A long URL may have many short URLs, shortened by different registered users, but the aggregate shortened URL keeps cumulative count of statistics for every click on the long URL.

[...] given a limit of three character representations, each new URL gets a unique three character combination until no more unique combinations are left, at which point the limit is increased to four character representations.

Users can also choose to create a customized backhalf which is easier to remember than random letters and numbers, but the backhalf is only accepted if it has not already been used. A customized backhalf can further contain characters “-” and “_” [5]. For example, the short URL “bit.ly/SuperBowl” is a customized backhalf. Further customization of short URLs include Branded Short Domain (BSD), such as “nyti.ms/SuperBowl”, where “nyti.ms” is the BSD for New York Times, which allows large organizations to maintain their brand identity while using URL shorteners.

Unfortunately URL shortening services provide attackers with a convenient and free tool to obfuscate their URLs. Through the use of customized backhalfs, attackers can even craft a short URL so as to fit the target’s profile (e.g., bit.ly/freemoviesfast). In the case of blacklisting malicious short URLs, for blacklists based only on domains rather than full URLs, false positives pose a threat of blacklisting entire sites such as URL shorteners. This resulted in blacklists having to use crawling in order to resolve shortened URLs, and blacklist the long URL. Nikiforakis et al. [6] also give a good overview of other dangers of URL shortening, such as linkrot and hijacking.

Pressure was also put on URL shortening services to put in place countermeasures for spam. For example as shown in Figure 2, Bitly flags malicious short URLs and displays a warning page upon clicking the malicious short URL. Bitly also provides a “preview” of a short URL by allowing users to append a ‘+’ to the end of a short URL when entering it into the browser, as shown in Figure 3. This preview shows the long URL as well as detailed analytics such as number of clicks and percentage of referrer and country clicks. Note that
