Linköping University | Department of Computer and Information Science Bachelor’s thesis, 16 ECTS | Link Usage 2020 | LIU-IDA/LITH-EX-G--20/001--SE

Longitudinal study of links, linkshorteners, and Bitly usage on Twitter
Swedish title: Longitudinella mätningar av länkar, länkförkortare och Bitly användning på Twitter

Mathilda Moström, Alexander Edberg

Supervisor: Niklas Carlsson
Examiner: Marcus Bendtsen

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Mathilda Moström, Alexander Edberg

Students in the 5 year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students create small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.

Abstract

Social networks attract millions of users who want to share information and connect with people. One of those platforms is Twitter, which has the power to greatly shape people’s opinions and thoughts. It is therefore important to understand how information is shared among users. In this thesis, we characterize link sharing on Twitter, placing particular focus on third-party link shortener services that hide the actual URL from users until they click on a generic, shortened URL, and focusing mainly on the link management platform Bitly. The purpose of this thesis is to analyze link usage among users over a specific time period and the domains that different users and link shorteners direct their users to, and to compare the click rates of such links with the corresponding retweet rates to see how this varies over time. We use a measurement framework developed by two other students from Linköping University to collect datasets over different time periods. First, we compare a one-week-long dataset from the spring of 2019 to one gathered one year later, in the spring of 2020. Two additional one-week-long datasets were also collected during the spring of 2020. We use the two main datasets, separated by a year, to evaluate long-term differences, and the three datasets from the spring of 2020 to analyze shorter-term variations in link usage. This approach lets us highlight significant patterns over time, including which domains are tweeted. We have found that the usage of URL shorteners has not decreased over the last year, though the usage of Bitly specifically has. The top domains with the highest occurrences in 2019 did not keep their high rankings in 2020; this is especially true for .com, whose occurrence dropped by 2.7 percentage points in 2020. Our conclusion is that the difference between the years is not huge but that there are some interesting trends and patterns. Given the prevailing Covid-19 pandemic, we have also chosen to do a minor analysis of how many Twitter users link to domains related to it. It turned out that the sharing of links to Covid-19 related content decreased quite sharply during our analysis period.

Acknowledgments

We would like to thank our supervisor Niklas Carlsson for his support and guidance during the project. We would also like to give special thanks to Oscar Järpehult and Martin Lindblom for giving us the opportunity to use their framework for our research and for being so helpful in answering questions.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Approach
1.4 Contribution
1.5 Delimitations
1.6 Thesis outline

2 Background
2.1 Twitter
2.2 Shortened URL
2.3 Top domain ranking sites
2.4 Related work

3 Method
3.1 Dataset
3.2 Collection approach
3.3 Limitations

4 Results
4.1 High-level link shortener usage
4.2 Domain statistics
4.3 User statistics
4.4 Bitly link interaction
4.5 Verified vs non-verified users
4.6 Covid-19 analysis

5 Discussion
5.1 Results
5.2 Method
5.3 The work in a wider context

6 Conclusion
6.1 Future work

Bibliography

A Appendix
A.1 URL shorteners
A.2 Collections from 18/3-25/3 and 1/4-8/4

List of Figures

4.1 Top 20 most frequent domains overall (2019).
4.2 Top 20 most frequent domains overall (2020).
4.3 Top 20 most frequent domains for shortener domains (2019).
4.4 Top 20 most frequent domains for shortener domains (2020).
4.5 Link popularity distribution to domains of different popularity classes, as defined using the Alexa top-1M lists.
4.6 Link popularity distribution to domains of different popularity classes, as defined using the Majestic top-1M lists.
4.7 Distribution of domain rank (2019).
4.8 Distribution of domain rank (2020).
4.9 The results from 2019 is found below the vertical divider in pink and 2020 above in blue.
4.10 Distribution of the age for users account at the time of posting their tweet (2019).
4.11 Distribution of the age for users account at the time of posting their tweet (2020).
4.12 Distribution of the number of tweets favourited by users at the time of posting their tweet (2019).
4.13 Distribution of the number of tweets favourited by users at the time of posting their tweet (2020).
4.14 Distribution of the number of tweets posted by users at the time of posting their tweet (2019).
4.15 Distribution of the number of tweets posted by users at the time of posting their tweet (2020).
4.16 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2019).
4.17 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2020).
4.18 Distribution of the number of followers for users at the time of posting their tweet (2019).
4.19 Distribution of the number of followers for users at the time of posting their tweet (2020).
4.20 Distribution of the number of friends for users at the time of posting their tweet (2019).
4.21 Distribution of the number of friends for users at the time of posting their tweet (2020).
4.22 Followers-to-friends ratio for users at the time of posting their tweet.
4.23 Two scatter plots of Bitly clicks-to-retweets-ratio.
4.24 Logarithmic average of Bitly clicks per retweet.
4.25 Clicks-to-followers ratio for Bitly links for verified users.
4.26 Clicks-to-followers ratio for Bitly links for non-verified users.
4.27 Heat-map of retweets vs followers tweeted (2019).
4.28 Heat-map of retweets vs followers tweeted (2020).
4.29 Heat-map of retweets vs number of tweets tweeted (2019).
4.30 Heat-map of retweets vs number of tweets tweeted (2020).
4.31 Heat-map of followers vs number of tweets tweeted (2019).
4.32 Heat-map of followers vs number of tweets tweeted (2020).
4.33 Scatter plots for all 3 collections 2020 of Covid-19 clicks-to-retweets-ratio.
4.34 Scatter plots for all 3 collections 2020 of the overall clicks-to-retweets-ratio.
4.35 CDFs of the ratio between clicks and retweets for tweets containing Covid-19 related links or hashtags and non Covid-19 related links and hashtags.

A.1 Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (18/3-25/3).
A.2 Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (1/4-8/4).
A.3 Distribution of domain rank (18/3-25/3).
A.4 Distribution of domain rank (1/4-8/4).
A.5 Distribution of domain rank (18/3-25/3).
A.6 Distribution of domain rank for (1/4-8/4).
A.7 Distribution of the age for users account at the time of posting their tweet (18/3-25/3).
A.8 Distribution of the age for users account at the time of posting their tweet (1/4-8/4).
A.9 Distribution of the number of tweets favourited by users at the time of posting their tweet (18/3-25/3).
A.10 Distribution of the number of tweets favourited by users at the time of posting their tweet (1/4-8/4).
A.11 Distribution of the number of tweets posted by users at the time of posting their tweet (18/3-25/3).
A.12 Distribution of the number of tweets posted by users at the time of posting their tweet (1/4-8/4).
A.13 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (18/3-25/3).
A.14 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (1/4-8/4).
A.15 Distribution of the number of followers for users at the time of posting their tweet (18/3-25/3).
A.16 Distribution of the number of followers for users at the time of posting their tweet (1/4-8/4).
A.17 Distribution of the number of friends for users at the time of posting their tweet (18/3-25/3).
A.18 Distribution of the number of friends for users at the time of posting their tweet (1/4-8/4).
A.19 Followers-to-friends ratio for users at the time of posting their tweet.
A.20 Clicks-to-followers ratio for Bitly links for verified users.
A.21 Clicks-to-followers ratio for Bitly links for non-verified users.
A.22 Heat-map of retweets vs number of tweets tweeted (18/3-25/3).
A.23 Heat-map of retweets vs followers tweeted (18/3-25/3).
A.24 Heat-map of followers vs number of tweets tweeted (18/3-25/3).
A.25 Heat-map of retweets vs number of tweets tweeted (1/4-8/4).
A.26 Heat-map of retweets vs followers tweeted (1/4-8/4).
A.27 Heat-map of followers vs number of tweets tweeted (1/4-8/4).
A.28 Two scatter plots of Bitly clicks-to-retweets-ratio.
A.29 Logarithmic average of Bitly clicks per retweet.

List of Tables

4.1 Amount of tweets collected at the various time occasions divided into categories.
4.2 Top 20 most frequent domains for all links.
4.3 Top 20 most frequent domains for shortened links.
4.4 Top 20 most frequent domains for Bitly links.
4.5 Amount of unique users for each category and how many of those users that are verified.
4.6 Amount of links tweets related to Covid-19.
4.7 Amount of Bitly links tweets related to Covid-19.

A.1 All collected shorteners sorted on domains and how many we were able to get the full domain from.
A.2 Amount of tweets collected at the various time occasions divided into categories.
A.3 Most frequent domains (18/3-25/3).
A.4 Most frequent domains (1/4-8/4).
A.5 Top 20 most frequent domains (18/3-25/3).
A.6 Top 20 most frequent domains (1/4-8/4).
A.7 Amount of unique users for each category and how many of those users that are verified.

1 Introduction

Twitter, a social networking site, was launched in 2006. Today it is one of the most popular social media platforms, with 100 million daily active users and 500 million tweets sent daily [35]. Twitter can be used for many purposes: to follow friends and family, receive news, and follow high-profile celebrities and world leaders. Understanding and analysing what kind of content is being shared on Twitter, and how this varies over time, is important for getting a better understanding of how users behave on social media. We investigate how this behavior varies and analyse the patterns.

1.1 Motivation

The use of short URLs on Twitter has increased in popularity in the past few years [9]. This is mostly because Twitter, as one of the most popular social media networks, has a 280 character limit for tweets [7]. URL shorteners let you fit more content in less space, and you can even customise the URL to make it more attractive for your followers to click on. As social media networks such as Twitter become more popular, the need to make sharing web content easier will increase, and shorter URLs are becoming more and more integral to that cause. URL shorteners, in their own way, work as aggregators of information. This can lead to some useful mashups and innovations in how people share and digest content. This thesis analyzes the use of short URLs by Twitter users to get a better understanding of how users behave on social media. We also study how users behave in special times such as the Covid-19 pandemic that currently dominates the world.

1.2 Aim

In this thesis, we analyse the use of short URLs by investigating their click traffic in a longitudinal measurement spanning one year. The aim is to get a better understanding of patterns over time when it comes to link, link shortener and Bitly usage on Twitter. Bitly is a commonly used link management platform [2]. From this aim follow the questions that we analyze over specific time periods to see how the patterns vary. With links being posted on Twitter, we want to investigate whether it is possible to discover significant patterns over time with regard to what domains are tweeted. We also want to see, over different time spans, whether there are any correlations between users and how they interact with tweets containing links. This is important because a social network such as Twitter typically contains a massive number of short URLs, and an efficient mechanism is needed to collect their traffic. Another horrible, yet interesting, aspect this year is the prevailing Covid-19 pandemic, which at the time of writing is affecting the whole world. Today social media accounts for 30 % of the overall visits to websites [11]. In other words, this is a time when knowledge of the sharing of misleading information and fake news is more important than ever. The highlights of our work can be summarized as follows: we generate and collect a dataset of tweet traffic, looking for tweets containing URL shorteners and especially Bitly links. The script also looks for links regarding Covid-19, to see how large a proportion of the links concern this topic and whether the need to access information about the pandemic increases or decreases.

Research questions

• Over a one-year span, is it possible to see any significant patterns with regard to link, link shortener and Bitly usage on Twitter? What has changed or not changed over the last year?

• Over a shorter time span of just over a month, how do users on Twitter tend to write about and link to things regarding Covid-19? How does this behavior change over time?

1.3 Approach

Last year, two IT students from Linköping University, Martin Lindblom and Oscar Järpehult, developed a framework to see how tweets are retweeted and clicked. This thesis uses their framework in the same way but collects datasets in different time intervals. We compare the results they got last year (2019) to the datasets collected this year (2020). Both years the data is collected in early May. In the Appendix we include the results from two other datasets collected in April 2020, to determine whether there are any weekly patterns in the behaviour of users on Twitter.

1.4 Contribution

With a longitudinal analysis, we provide a time aspect for comparing data on link usage and user behavior on Twitter, including aspects such as Covid-19. The temporal analysis shows how the behaviour varies over time, both year-to-year and across multiple weeks. In the future, this methodology can be used to compare behaviour over longer time periods and to look at other aspects.

1.5 Delimitations

We limit our analysis to data collected over a period of seven days per collection, to make the data manageable to process and to make sure the datasets do not get too large. The collections were done in three rounds within a rather narrow period of time during the last half of our spring term, while we wrote our Bachelor’s thesis. In the future, it would be interesting to look at data collected over a longer period of time. Every tweet contains a lot of data, and since we focus on link shorteners we have adapted which parameters we retrieve and limited which fields are saved to the dataset. Media fields and user-created text fields are ignored to avoid complications when saving to file. The APIs that we used from Twitter and Bitly are the free versions, which only give a sample of all tweets; in the future it would be of interest to compare the results with those from a paid tier of their APIs.

1.6 Thesis outline

This thesis is structured as follows. Chapter 2 presents background on the areas that are necessary to understand the following work, and ends with a section on related work to show what others have done in the subject. Chapter 3 presents our method and analysis. Chapter 4 presents our results, highlighting comparisons between the 2019 and 2020 datasets. In Chapter 5 we discuss similarities and differences between the collections, and provide suggestions for improvements and ideas for future work. Finally, conclusions are presented in Chapter 6. Additional results and complementary information (e.g., a list of all the domains that we considered shorteners) are provided in the Appendix.

2 Background

This chapter presents background information relevant to the work. The areas explained are Twitter, shortened URLs and top domain ranking sites. Lastly, related work is presented: papers on URL shorteners and spam, Twitter behavior, and what kind of content is being shared are summarised to show what others have researched in this area.

2.1 Twitter

Twitter is a ’microblogging’ system that allows its users to write posts of up to 280 characters; these posts are called tweets. Tweets can include text, photos, videos and links to relevant websites and resources. To cope with the limit on tweet length, link shortening is commonly used. Twitter today has 326 million monthly active users; the country with the most users is the USA, and there are more men than women among the users [35]. You can create your own tweets, or you can retweet information that has been tweeted by others. When you retweet, you forward the original tweet to your followers [32]. Retweeting means that information can be shared quickly and efficiently with a large number of people. A user on Twitter can follow other users and have their own followers. Friends are the accounts that a user follows (also referred to as "following"). When a user follows another user, they subscribe to that user’s tweets, which will appear in their home timeline [34]. On Twitter, a user can respond to another user’s tweet by replying to it. It is also possible to mention another user in a tweet by putting the "@" symbol in front of that user’s username. The mentioned user then receives a notification about the tweet [30]. A Twitter account may be verified if it is determined to be an account of public interest. This typically includes accounts owned by users in politics, religion, music, acting, business and other key interest areas [27].

The Internet, and especially social media, has a great influence on the world; it holds a huge ocean of opinions, and Twitter is no exception [4]. In turbulent times such as the spring of 2020 with the Covid-19 pandemic, but also in other contexts such as election campaigns, many turn to social media to make their voices heard. Almost all of the world’s leaders are today diligent users of Twitter [21]. The American president Donald Trump’s Twitter account @realDonaldTrump has over 75 million followers [31]. In addition to ordinary individuals, many companies, websites, artists and organizations use Twitter as a way to reach out to their users, customers and fans. The service received a lot of attention during the 2008 US presidential election, when presidential candidate Barack Obama’s campaign used Twitter and other social media to reach out to voters. Twitter has also been a tool for regime critics in totalitarian states to communicate with the outside world, and during natural disasters, conflicts and the like, including the current pandemic, private individuals on the spot have been able to report directly on events. The service is considered to have played an important role during the Arab Spring, where activists were able to communicate and spread their messages globally [21]. Like many other social media platforms, for example Facebook, Twitter has several rules and policies that all users must follow. One can often read in the newspapers about tweets posted by, among others, politicians and world leaders making big headlines when Twitter decides to remove them due to rule violations [33]. Thus, it is not difficult to imagine the real power that Twitter has today.

Twitter API

To share information as widely as possible, Twitter provides users with programmatic access to Twitter data through their APIs (application programming interfaces). APIs are the way computer programs "talk" to each other so that they can request and deliver information, here through HTTP requests. To get access to the free API version, you have to fill in an application describing your intentions with the API in order to get a developer account; the application then needs to be granted by Twitter. For this paper we have used one developer account [1]. With the free version, the collected tweets are returned as JSON objects and cover the last seven days [28].
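As a rough illustration of working with such JSON objects, the sketch below pulls the links out of a single tweet. It assumes the tweet layout of Twitter's v1.1 API, where links are listed under `entities.urls` with the `t.co` form in `url` and the posted link in `expanded_url`. The sample tweet and the helper name `extract_links` are made up for illustration and are not taken from the framework used in this thesis.

```python
import json

# Hypothetical sample tweet, trimmed to the fields used here
# (v1.1-style layout is an assumption).
SAMPLE_TWEET = """
{
  "id_str": "1",
  "retweet_count": 3,
  "user": {"verified": false, "followers_count": 120},
  "entities": {
    "urls": [
      {"url": "https://t.co/abc", "expanded_url": "https://bit.ly/2xyz"},
      {"url": "https://t.co/def", "expanded_url": "https://www.liu.se/"}
    ]
  }
}
"""


def extract_links(tweet):
    """Return the expanded form of every link in a tweet dict."""
    urls = tweet.get("entities", {}).get("urls", [])
    # Fall back to the t.co form if no expanded URL is present.
    return [u.get("expanded_url") or u["url"] for u in urls]


tweet = json.loads(SAMPLE_TWEET)
links = extract_links(tweet)
print(links)
```

Keeping only a few fields per tweet, as in the sample above, mirrors the choice described in the delimitations to save a limited set of fields to the dataset.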

2.2 Shortened URL

URL shortening is used to shorten a long web address to a short one. The shortened URL is often created by an external service, for example Bitly or ow.ly. One purpose is to be able to send links when the service has a character limit; for example, both SMS and Twitter limit how many characters are allowed in a single message or tweet. A URL shortener can also be used to reduce the risk of a URL containing special characters being distorted, and to make a URL easier to memorize. Some URL shortening services also let you customise your URL, making it even easier to remember [13]. Web services that generate shortened URLs are referred to as URL shortening services, for example Bitly and TinyURL. They also provide the dereference function, i.e. redirection of the shortened URL to the original one. The shortened URLs generated from the same URL differ from service to service. A shortened URL consists of the domain of the shortening service and a unique key associated with the original URL. Today there are many services that offer link shortening. The problem is that many of the services are not serious and have been used for sending out spam and for getting people to visit pages they would normally never visit. The shortened URL can easily be set up so that it reveals nothing about the website you are directed to. The security aspect is important, and therefore several major sites have chosen to run their own shorteners or clearly state which third-party link shortener services they recommend their users to use. This paper focuses on third-party URL shorteners and the clicks that the most popular such service (Bitly) generates.

Twitter has its own link shortener, t.co; links shared on Twitter are automatically processed and shortened to an http://t.co link. Their link service measures information such as how many times a link has been clicked, which is an important quality signal in determining how relevant and interesting each Tweet is when compared to similar Tweets. Having a link shortener protects users from malicious sites that engage in spreading malware, attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites, and users are warned with an error message when clicking on potentially harmful URLs. The link service at http://t.co is only used on links posted on Twitter and is not available as a general shortening service on other apps or sites [29].
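The structure of a shortened URL described above (service domain plus key) can be sketched in a few lines. The example below is a minimal illustration, not the classification used in the thesis framework: the two domain sets are a small assumed subset (the full list of shorteners considered is given in the Appendix), and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative subset of shortener domains; an assumption for this sketch.
SHORTENER_DOMAINS = {"bit.ly", "ow.ly", "tinyurl.com"}
BITLY_DOMAINS = {"bit.ly"}


def split_short_url(url):
    """Split a URL into (service domain, key)."""
    parts = urlsplit(url)
    domain = (parts.hostname or "").lower()
    if domain.startswith("www."):
        domain = domain[4:]
    # The key is the path without its surrounding slashes.
    key = parts.path.strip("/")
    return domain, key


def classify(url):
    """Label a URL as 'bitly', 'shortener' or 'other' by its domain."""
    domain, _ = split_short_url(url)
    if domain in BITLY_DOMAINS:
        return "bitly"
    if domain in SHORTENER_DOMAINS:
        return "shortener"
    return "other"


print(split_short_url("https://bit.ly/2xyz"))
print(classify("http://ow.ly/abc"))
print(classify("https://www.liu.se/"))
```

Matching on the domain alone is a deliberate simplification: it needs no network access, but it cannot tell where a shortened link actually leads, which requires following the redirect.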

Bitly API

Bitly is a platform where you can shorten, share, manage and analyze links to your content. Billions of links are created every year by millions of users, from individuals to small businesses to Fortune 500 companies. Through the Bitly API you can track real-time click data and learn your top referrers and locations; you can also see when and where your links are clicked. Info, clicks, countries and referrers are the four major types of metadata that Bitly’s API provides. Info contains the properties of the short URL, which refers to the actual long URL behind it. Clicks contains the total number of clicks for the short URL. The number of clicks from various countries is also provided, as are the referrers, which are the applications or web pages that contain the short URL [5]. This thesis mainly focuses on Bitly as a third-party link shortener.
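Once the click total of a Bitly link is known, it can be set against the retweet count of the tweet that carried the link. The sketch below shows this clicks-per-retweet ratio on made-up numbers. The record layout and the choice to return `None` for tweets without retweets are assumptions for illustration; no call to the Bitly API is made here.

```python
def clicks_per_retweet(clicks, retweets):
    """Return clicks / retweets, or None when the tweet has no retweets."""
    if retweets <= 0:
        return None
    return clicks / retweets


# Hypothetical records: click totals as reported for a Bitly link,
# retweet counts as reported for the tweet that carried it.
records = [
    {"link": "https://bit.ly/aaa", "clicks": 40, "retweets": 10},
    {"link": "https://bit.ly/bbb", "clicks": 3, "retweets": 12},
    {"link": "https://bit.ly/ccc", "clicks": 7, "retweets": 0},
]

ratios = {r["link"]: clicks_per_retweet(r["clicks"], r["retweets"]) for r in records}
for link, ratio in ratios.items():
    print(link, ratio)
```

A ratio below one, as for the second record, means the link was retweeted more often than it was clicked.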

2.3 Top domain ranking sites

Top domain ranking sites rank the most popular websites worldwide. The rankings are typically based on a combined measure of page views and unique site users, averaged over a specific time period, and often only the highest-level domain is recorded. In this paper we compare our results to two of the most commonly used top domain ranking sites, Alexa and Majestic. Which one you choose depends on what you are trying to accomplish. Alexa determines the rank of a website from a combined measure of unique page views and visitors. Page views are the total number of Alexa user URL requests for a site, and unique visitors are determined by the number of unique Alexa users who visit a site on a given day. However, multiple requests for the same URL on the same day by the same user are counted as one page view. The site with the highest combination of unique visitors and page views is ranked as number one [17]. The most popular and widely used top list is the Alexa Global Top 1M list [25]. The "Majestic Million" ranks the top 1 million websites by how many other websites link to them. The list is constructed by crawling the web and counting the number of referring subnets for each individual domain, so unlike Alexa, Majestic does not take into account how often a link is clicked [3]. For the main collection in this paper, collected 18-25/4, the Alexa and Majestic lists were downloaded from their respective websites on 25-04-2020.
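To show how such a top list can be used, the sketch below maps a domain to a popularity class based on its rank. It is only an illustration under stated assumptions: the list is taken to be plain "rank,domain" lines, the sample entries are invented, and the class boundaries (top-10, top-100, and so on) are chosen for the example rather than taken from the thesis.

```python
import csv
import io

# Hypothetical excerpt of a top list in "rank,domain" form.
SAMPLE_TOP_LIST = """1,google.com
2,youtube.com
150,example.org
20000,example.net
"""

# Assumed class boundaries as (upper rank limit, label).
CLASSES = [(10, "top-10"), (100, "top-100"), (1_000, "top-1K"),
           (10_000, "top-10K"), (100_000, "top-100K"), (1_000_000, "top-1M")]


def load_ranks(text):
    """Parse 'rank,domain' lines into a domain -> rank dict."""
    reader = csv.reader(io.StringIO(text))
    return {row[1]: int(row[0]) for row in reader if row}


def popularity_class(domain, ranks):
    """Return the first class whose limit covers the domain's rank."""
    rank = ranks.get(domain)
    if rank is None:
        return "unranked"
    for limit, label in CLASSES:
        if rank <= limit:
            return label
    return "unranked"


ranks = load_ranks(SAMPLE_TOP_LIST)
for d in ["google.com", "example.org", "example.net", "liu.se"]:
    print(d, popularity_class(d, ranks))
```

Because the Alexa and Majestic lists are built from different signals, the same domain can land in different classes depending on which list is loaded.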

2.4 Related work

Related work has been divided into smaller sections covering the following areas: URL shorteners and spam, Twitter behaviour and what kind of content is being shared with URL shorteners.

URL shorteners and spam

Shortened URLs can serve many legitimate purposes, such as the click tracking that this paper analyzes, but they can also serve illicit behavior such as fraud, deceit and spam. One study on the topic concludes that more than half of the URLs shared today are spam [8]. Florian Klien and Markus Strohmaier have studied spam further, using the logs of a URL shortener service. They expose the extent of spamming taking place in their logs and provide an interesting insight into the danger of spamming via URL shortener services. Services such as bit.ly play a critical role on the web today, and spam is a problem both for users and for operators of link shorteners. The paper found that around 80% of shortened URLs contained spam-related content. The lack of spam blocking features may be a major reason for the high numbers. Their geographical analysis reveals that this problem has an international scale; they state that URL shorteners play a role in spam attacks that cross different countries, and that the use of URL shorteners varies a lot between countries. A lot of countries resolve more links than they create, but even more create more links than they resolve. A high outdegree seems to be indicative of creating nations, which the authors state can be linked to spamming, while a high indegree seems to be indicative of spam-receiving countries (targets of spam). The ratio between resolves and creates tells us whether a particular country visited more links than it created. Their research found that Northern America, Asia, Australia and some parts of Europe are mostly resolvers with only a small number of creates, whereas South America and Africa are mostly creators with a small number of resolves [19]. Another study of URL shortener spam [9] highlights spammers that adopt URL shorteners to camouflage their spam URLs and improve user click-through. The authors measure the misuse of short URLs and analyze the characteristics of spam and non-spam short URLs. Their results showed that the majority of the clicks come from direct sources and that spammers use popular websites to attract more attention by cross-posting the links.
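The ratio between resolves and creates used in [19] is simple to compute once per-country counts are available. The sketch below is a hedged illustration on invented counts, with anonymous country labels; the threshold of one and the role labels are assumptions made for this example and are not taken from the cited paper.

```python
def resolve_create_ratio(resolves, creates):
    """Return resolves / creates; infinity if nothing was created."""
    if creates == 0:
        return float("inf")
    return resolves / creates


def role(ratio):
    """Label a country by its ratio (threshold of 1 is an assumption)."""
    if ratio > 1:
        return "mostly resolver"
    if ratio < 1:
        return "mostly creator"
    return "balanced"


# Hypothetical per-country counts of resolved and created short links.
counts = {
    "A": {"resolves": 900, "creates": 100},
    "B": {"resolves": 50, "creates": 200},
}

roles = {c: role(resolve_create_ratio(v["resolves"], v["creates"]))
         for c, v in counts.items()}
print(roles)
```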

Twitter behavior

Antoniades et al. [2] studied the usage of shortened URLs on Twitter. They collected data from Twitter, ow.ly and Bitly and looked at what content is shared, how popular the URLs are, the life span of the URLs, and how shortened URLs can affect web performance. They found that news, info/edu and "various" were the most popular types of content and that a small number of the URLs get most of the clicks. Furthermore, 50% of shortened URLs appear to be live for more than three months, and shortened URLs can degrade the user experience because the redirection adds to the access time. The authors of [18] studied a different aspect of Twitter related to URL shorteners: the value of the shortened URLs referenced in tweets. Their results indicate that, unlike frequently bookmarked URLs, which are generally of high quality, frequently tweeted URLs tend to fall into two opposing categories: either they come from sites of high quality or they are spam. Another article, by Garimella et al. [12], studied Twitter behaviour in the US between 2009 and 2016, using Twitter data to study political polarization. The article does not state in which direction the polarization went, but the analysis shows that polarization increased; depending on the measure, the relative change is 10% to 20%. Another Twitter behaviour is analysed in [14], which focuses on the temporal click dynamics of links to news articles from a selected set of news websites. By combining the Twitter streaming API and the Bitly API, the authors examine how many users actually read the articles linked in the tweets they share and retweet. Their analysis highlights significant differences in the clicks-per-retweet ratios of individual links, and big differences in the number of links for which there are more retweets than clicks.
Another study was made after the terrorist attack in Christchurch, New Zealand, in 2019 [10]. Two days after the attack, at least 7,22,295 tweets had been created by Twitter users sharing their thoughts and prayers about the attack. This again shows how important social media is for spreading information. The paper examines the use of Twitter in this specific crisis. The findings are that an individual might have more information-spreading power than authorities or government institutions, and that non-authority individuals on social media platforms like Twitter might spread information more widely than authorities, without the correctness of the information being known. Another project on this subject was carried out after the great earthquake and tsunami that hit eastern Japan in 2011 [15]. Right after the disaster, several web sites, especially those providing helpful disaster-related information, were overloaded due to flash crowds caused by Twitter users (a flash crowd is a sudden, large surge in traffic to a particular web site). Because flash crowds can be a serious problem in an emergency, the authors developed a new URL shortener that redirects Twitter users to a CDN (Content Delivery Network) instead of the original sites. Their dataset was launched just days after the earthquake and is now publicly available online for further collaboration.

What kind of content is being shared with URL shorteners

Nikiforakis et al. [23] discuss how ad-based URL shortening services can pose a threat to those clicking on the shortened link. Ad-based URL shortening services work by displaying an ad to the user before redirecting them to the actual site, so that the person publishing the shortened URL earns some income from the clicks on the URL. A shortened URL is a good way for malicious sites to evade blacklists and filtering systems, but an ad-based shortening service also makes it possible for a malicious ad to reach the user, making it harder for the user to stay safe on the internet. These malicious ads appear to be able to escape their container and redirect the user to a different, harmful site. Nikiforakis et al. [23] also found that such advertisements were able to perform drive-by downloads and attempt to trick the user into downloading malware through the ad.

3 Method

The analysis was performed with a framework that combines the Twitter and Bitly APIs. For this longitudinal study we collected multiple week-long datasets and compared them to each other, and also to the dataset that was collected roughly one year earlier by Martin Lindblom and Oscar Järpehult (2019). The aim is to compare the different datasets to obtain a longitudinal study of link usage on Twitter. In total we collected three seven-day datasets: the first between 18-25/3 2020, the second between 1-8/4 2020, and the last between 18-25/4 2020. The last dataset is closest to the same period last year (26/4-3/5 2019) and is the primary collection we focus on in our results.

3.1 Dataset

The datasets were stored locally, and from every tweet we extracted the data needed for the set. To get the number of retweets we also collected extra data for each tweet 24 hours after it was first collected. If a Bitly link was found in the tweet, information about the Bitly link was also added to the data.

Collection

The collected data was divided between three different .csv files: "collection_date.csv", "bitly_date.csv" and "retweet_date.csv". The collection file holds all of the Twitter data, the Bitly file all of the Bitly-related data, and the retweet file all of the retweet information. Each file contains the information from a four-hour interval, so for a week-long collection each file is split up into 42 different files. When working with this data we merged all of these files into one big .csv file called dataset.csv. This dataset.csv contains 28 different headers. Below we list the headers and what information was collected with each one.

• tweet_id - The unique id of the tweet

• tweet_created_at - When the tweet was created

• tweet_place_id - The place id of where the tweet was posted from

• tweet_place_full_name - The full name of where the tweet was posted from

• tweet_place_country_code - The country code of where the tweet was posted from

• tweet_geo_coordinates - The geographical coordinates of where the tweet was posted from

• tweet_language - The language of the tweet

• tweet_hashtags - The hashtags in the tweet

• tweet_urls - The URLs in the tweet

• tweet_contains_retweet - If the tweet is a retweet or not

• tweet_in_reply_to_status_id - If the tweet is a reply to another tweet

• user_id - The id of the user that created the tweet

• user_created_at - When the user account was created

• user_followers_count - How many followers the user has

• user_friends_count - How many friends the user has

• user_statuses_count - How many tweets the user has posted

• user_favourites_count - The number of tweets the user has favourited

• user_verified - If the user is verified or not

• user_language - The user's language

• retweet_count - The number of retweets the tweet has received

• retweets_retrieved_at - The date when the number of retweets was retrieved

• bitly_all_clicks - The number of clicks the Bitly link has received

• bitly_twitter_clicks - The number of clicks the Bitly link has received from Twitter

• bitly_all_clicks_since_posting - The number of clicks the Bitly link has received since it was tweeted

• bitly_twitter_clicks_since_posting - The number of clicks the Bitly link has received from Twitter since the tweet was posted

• bitly_end_url_string - The URL that the Bitly link redirects to

• bitly_created_at - When the Bitly link was created

• bitly_data_retrieved_at - When the Bitly data was collected
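The merge of the four-hour interval files can be illustrated with a small sketch. It only covers the concatenation of interval files of one kind (7 days x 6 intervals per day = 42 files) and assumes that all interval files share the same header row. The function name merge_csv_streams and the two-column example are hypothetical and not taken from the framework; joining the collection, Bitly and retweet data into the 28 headers is a further step that is not shown here.

```python
import csv
import io


def merge_csv_streams(streams, out):
    """Concatenate CSV streams that share one header row; return rows written."""
    writer = csv.writer(out, lineterminator="\n")
    header = None
    rows_written = 0
    for stream in streams:
        reader = csv.reader(stream)
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty interval file, nothing to copy
        if header is None:
            header = file_header
            writer.writerow(header)  # the header is written only once
        elif file_header != header:
            raise ValueError("interval files must share the same header")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Hypothetical example: two four-hour interval files held in memory.
part_1 = io.StringIO("tweet_id,retweet_count\n1,5\n2,0\n")
part_2 = io.StringIO("tweet_id,retweet_count\n3,12\n")
merged = io.StringIO()
n_rows = merge_csv_streams([part_1, part_2], merged)
```

With files on disk, the same function could be called with opened file objects (opened with newline="") instead of the in-memory streams.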

The data collection is split into two phases. In the first phase, Twitter's streaming API (Application Programming Interface) is used to collect as many tweets as possible, together with information about each tweet such as when it was tweeted and who posted it. After 24 hours the second phase takes place, in which information about retweets of these particular tweets is collected. The second phase also collects information about the URLs that the link shorteners in the tweets redirect to. For every link, four things were important to collect: all clicks from all sources, all clicks from Twitter, clicks from all sources since the posting of the tweet, and clicks from Twitter since the posting of the tweet. If the link was a Bitly link, specific information about the embedded Bitly link was collected, including various click statistics. Filtering out the Bitly links was the easier part using the Bitly API; it was harder to identify all link shorteners and look up their full URLs, and some URLs redirected to an invalid page. We decided not to include these invalid pages in our analysis of shorteners. In total we collected over 11 million link shorteners. With this framework a longitudinal measurement study of tweets posted on Twitter could be done. Last year, a bit more than 25 million tweets were collected over the span of seven days, between 26/4-3/5 2019. This report uses the framework developed last year, together with last year's results, to collect new datasets and analyze the differences between the time periods. We refer the interested reader to the report Longitudinal measurements of link usage on Twitter by Oscar Järpehult and Martin Lindblom for more details on how the framework was implemented [16].
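The difference between "all clicks" and "clicks since posting" can be illustrated with a minimal sketch. It assumes that click counts are available as (bucket start, clicks) pairs, for example one bucket per hour; this data layout, the function name clicks_since and the numbers are hypothetical and do not describe how the framework or the Bitly API represents clicks.

```python
from datetime import datetime, timezone


def clicks_since(buckets, posted_at):
    """Sum the clicks of all buckets that start at or after posted_at.

    buckets is an iterable of (bucket_start, clicks) pairs. A bucket that
    started before the tweet was posted is left out entirely, so the result
    is a lower bound on the true number of clicks since posting.
    """
    return sum(clicks for start, clicks in buckets if start >= posted_at)


# Hypothetical hourly click counts for one Bitly link.
hourly_clicks = [
    (datetime(2020, 4, 18, 10, tzinfo=timezone.utc), 4),
    (datetime(2020, 4, 18, 11, tzinfo=timezone.utc), 7),
    (datetime(2020, 4, 18, 12, tzinfo=timezone.utc), 2),
]
tweet_posted_at = datetime(2020, 4, 18, 11, tzinfo=timezone.utc)

all_clicks = sum(clicks for _, clicks in hourly_clicks)
clicks_after_posting = clicks_since(hourly_clicks, tweet_posted_at)
```

The same function applied to a list that only contains clicks referred from Twitter would give the corresponding Twitter-only count.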

3.2 Collection approach

The main aim is to analyse how link shorteners, and Bitly links in particular, are used on Twitter, to understand how tweets with links are retweeted and clicked, and to see who uses URL shorteners and how they behave on Twitter. The second aim is to see whether some of these links are connected to Covid-19, how large that share is, and what kind of information and web pages about the subject the links point to. We want to see how this compares to other tweets, as well as how the behavior has changed since last year.

3.3 Limitations

Due to the limited time available, we restricted ourselves to three data collections. Because of the huge amount of data, some problems with network connections, and the reliance on external APIs, we also ran into limitations in our workflow that affected the output, as described below. As the Covid-19 pandemic was ongoing during all our collections, it was impossible not to include it in our analysis, but it is the only special event during our collection period that we have taken into account. We have not considered whether other special events during this time might have had an impact on the data.

Dataset collection

Our data collection is limited to a narrow time period, which means that there may be patterns or user behavior specific to our collection period that do not fully reflect reality. We also chose to use the free version of the Twitter API, which gives only limited access to all tweets: streaming real-time tweets returns around 1% of all tweets posted at any time, with the ability to add custom filters to the stream [24]. Other analyses that have used the paid version are higher-level analyses than ours, so we consider the free version sufficient for our analysis.

Covid-19 relations

To find tweets with URL shorteners regarding the coronavirus, we searched for hashtags or URL names containing any of the following markers: COVID19, covid19, Covid-19, covid-19, Coronavirus, coronavirus, pandemic and Pandemic. There are certainly links and hashtags concerning the virus that we missed. Note that we have only used English spellings of the word pandemic, whereas the names Covid-19 and Corona are used worldwide. English is the most widely used language on Twitter (34% of all tweets are written in English [22]), but to get a fairer world-wide picture we would have had to look at all the spellings of the word pandemic.
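The marker search can be sketched as a plain substring test. The marker list is the one given above; the function name is_covid_related, the case-sensitive matching and the example inputs are assumptions made for illustration, not a description of the scripts actually used.

```python
# The eight markers listed in the text, matched exactly as written.
COVID_MARKERS = (
    "COVID19", "covid19", "Covid-19", "covid-19",
    "Coronavirus", "coronavirus", "pandemic", "Pandemic",
)


def is_covid_related(hashtags, urls):
    """Return True if any hashtag or URL contains one of the markers."""
    texts = list(hashtags) + list(urls)
    return any(marker in text for text in texts for marker in COVID_MARKERS)


# Hypothetical examples.
by_hashtag = is_covid_related(["StayHome", "covid19"], [])
by_url = is_covid_related([], ["https://example.org/news/coronavirus-update"])
missed = is_covid_related(["COVID-19"], [])  # this spelling is not in the list
```

The last example illustrates the limitation mentioned above: a spelling that is not among the markers is not counted.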

4 Results

We first present the results of the longitudinal measurement from the collections in 2019 and 2020, where the results from 2019 are taken from the report Longitudinal measurements of link usage on Twitter by Oscar Järpehult and Martin Lindblom. Table 4.1 below summarizes the number of tweets collected in each category and their fraction of the total. To decide which domains are considered shorteners we have used the same list as Järpehult and Lindblom (see the list in the Appendix). Results from the other two 2020 collections are given in the Appendix. Table 4.1 shows that we gathered roughly 8 million more tweets in 2020 than in 2019, although significantly more Bitly tweets were collected in 2019. The final part of the results is the Covid-19 analysis.

Category            2019                  2020
All Tweets          25,482,108 (100%)     33,281,088 (100%)
Link Tweets         4,026,101 (15.8%)     3,803,233 (11.4%)
Shortener Tweets    322,954 (1.27%)       310,915 (0.93%)
Bitly Tweets        159,143 (0.625%)      52,517 (0.158%)

Table 4.1: Amount of tweets collected at the various time occasions divided into categories.
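The percentages in Table 4.1 are each category's share of all tweets collected in the same year. As a small worked example, the sketch below recomputes some of them from the counts in the table; the helper name share_percent is hypothetical.

```python
def share_percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Counts taken from Table 4.1.
ALL_2019, ALL_2020 = 25_482_108, 33_281_088

link_share_2019 = share_percent(4_026_101, ALL_2019)
link_share_2020 = share_percent(3_803_233, ALL_2020)
bitly_share_2019 = share_percent(159_143, ALL_2019)
bitly_share_2020 = share_percent(52_517, ALL_2020)
```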

4.1 High-level link shortener usage

The first section shows the high-level link shortener usage. For all collections twitter.com is naturally the most frequent domain, because each retweet contains the URL of the original tweet. The figures in this section list the 20 most common domains and shorteners. Figures 4.1 and 4.2 show that youtu.be and bit.ly were common shorteners in both years. youtu.be is YouTube's own link shortener, which only points to YouTube videos. Two differences can be distinguished. The first is that du3a.org (a website that automatically posts Islamic prayers to Twitter), which had 187580 occurrences in 2019, is not in the top 20 at all in 2020, nor in the first two collections from 2020 (see Appendix Table A.4). The second is that facebook.com goes from fourth place to seventeenth place between the years. The top shortener domains are almost the same in both years, as can be seen in Figures 4.3 and 4.4.

Figure 4.1: Top 20 most frequent domains overall (2019).

Figure 4.2: Top 20 most frequent domains overall (2020).

Figure 4.3: Top 20 most frequent domains for shortener domains (2019).

Figure 4.4: Top 20 most frequent domains for shortener domains (2020).

4.2 Domain statistics

This section concerns the domains that people tend to link to on Twitter. We extracted the long URL that each link shortener redirected to and analyzed the frequency and popularity of the resulting domains.
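Counting the most frequent domains among the resolved long URLs can be sketched as follows. This is a minimal illustration, not the framework's implementation: the helper names domain_of and top_domains are hypothetical, and stripping a leading "www." while keeping other subdomains (such as open.spotify.com) is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, n=20):
    """Return the n most frequent domains as (domain, occurrences) pairs."""
    counts = Counter(domain_of(url) for url in urls)
    counts.pop("", None)  # drop strings that had no host
    return counts.most_common(n)


# Hypothetical long URLs after following the shortened links.
long_urls = [
    "https://www.youtube.com/watch?v=a",
    "https://youtube.com/b",
    "https://open.spotify.com/track/1",
    "not a url",
]
ranking = top_domains(long_urls)
```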

Top domains

From the framework three different sets were constructed: all links, shortened links, and Bitly links. The tables show the number of occurrences of the top 20 domains for each of the three sets. The Alexa and Majestic top 1 million domain rankings are also included; when a ranking is not available we list "-".

(a) Top domains for all links (2019).

Rank  Domain                Occur.    Alexa    Maj.
1     twitter.com           2167059   12       4
2     du3a.org              187580    -        -
3     youtube.com           147359    2        3
4     facebook.com          123883    3        2
5     instagram.com         57117     15       7
6     showroom-live.com     42356     4156     77018
7     curiouscat.me         41691     4915     168225
8     peing.net             25007     6472     228312
9     twittascope.com       23174     301905   -
10    dlvr.it               19700     -        11127
11    fllwrs.com            17613     59565    831014
12    open.spotify.com      14160     -        219
13    lawson.co.jp          13264     35589    17836
14    twcm.co               12265     -        -
15    naver.me              11310     177121   23425
16    pscp.tv               10895     2836     1428
17    blbrd.cm              9964      210800   -
18    swarmapp.com          9752      73711    29610
19    cas.st                8642      -        -
20    shindanmaker.com      8326      8309     32267

(b) Top domains for all links (2020).

Rank  Domain                Occur.    Alexa    Maj.
1     twitter.com           2134010   55       4
2     .com                  213002    2        3
3     instagram.com         50572     32       5
4     peing.net             46834     17513    185762
5     onlyfans.com          34244     1060     25602
6     open.spotify.com      23339     -        155
7     twittascope.com       21163     300510   -
8     fllwrs.com            19676     321649   -
9     naver.me              18346     82756    19751
10    family.co.jp          18327     665828   24624
11    dlvr.tv               17812     -        14409
12    twitch.tv             16106     33       331
13    twitcasting.tv        14322     5932     46809
14    ift.tt                11693     -        10073
15    facebook.com          11185     7        1
16    news.livedoor.com     11081     -        8934
17    twtcom.co             10076     -        -
18    pscp.tv               9828      16945    1785
19    curiouscat.me         9534      16634    150182
20    headlines.yahoo.com   8839      -        -

Table 4.2: Top 20 most frequent domains for all links.

In Table 4.2 above we can see that du3a.org, which was the second most frequent domain in 2019, is not even in the top 20 in 2020. Another difference is facebook.com, which had 123883 occurrences in 2019 and drops to 11185 occurrences in 2020.

(a) Top domains for shortened links (2019).

Rank  Domain                Occur.    Alexa    Maj.
1     youtube.com           117912    2        3
2     twittascope.com       23173     301905   -
3     lawson.co.jp          13226     35589    17836
4     k.kakaocdn.net        5457      -        -
5     img1.daumcdn.net      5168      -        -
6     linkedin.com          2327      43       6
7     instagram.com         2137      15       7
8     t1.daumcdn.net        1846      -        -
9     .com                  1521      16       43
10    youtu.be              1510      31627    14
11    .com                  1343      1        1
12    cards.twitter.com     1195      -        -
13    extratv.com           1106      118225   12330
14    el-nacional.com       980       1253     8829
15    mayla.jp              927       -        -
16    facebook.com          884       3        2
17    54.202.34.80          861       -        -
18    careerarc.com         822       70517    91327
19    uls.her.jp            805       -        -
20    drive.google.com      792       -        39

(b) Top domains for shortened links (2020).

Rank  Domain                Occur.    Alexa    Maj.
1     youtube.com           177203    2        3
2     twittascope.com       21160     300510   -
3     k.kakaocdn.net        3565      -        -
4     goo.gl                1846      3320     6
5     linkedin.com          1661      75       6
6     img1.daumcdn.net      1535      -        -
7     dolk.jp               1322      -        -
8     t1.daumcdn.net        1264      -        -
9     akindo-sushiro.co.jp  1256      17307    191994
10    easyriders.jp         1247      -        -
11    shop.funko.com        1215      -        -
12    drive.google.com      1132      -        36
13    rbeiv.com             801       -        -
14    go.onelink.me         582       -        -
15    music.bugs.co.kr      576       -        -
16    rbeja.com             527       -        -
17    duratexintl.com       518       -        -
18    youtu.be              512       10612    13
19    str2b.openstream.co   484       -        -
20    rbejc.com             482       -        -

Table 4.3: Top 20 most frequent domains for shortened links.

In Table 4.3 above there is not much difference between the years, although lawson.co.jp had a high number of occurrences (third place) in 2019 but is not in the top 20 in 2020. The trend seen before also appears here: facebook.com does not occur in the results from 2020.

(a) Top domains for Bitly links (2019).

Rank  Domain                Occur.    Alexa    Maj.
1     twittascope.com       23173     301905   -
2     lawson.co.jp          13226     35589    17836
3     k.kakaocdn.net        5457      -        -
4     img1.daumcdn.net      5164      -        -
5     instagram.com         2133      15       7
6     t1.daumcdn.net        1843      -        -
7     reddit.com            1518      16       43
8     google.com            1333      1        1
9     youtu.be              1320      31627    14
10    cards.twitter.com     1194      -        -
11    extratv.com           1106      118225   12330
12    youtube.com           1040      2        3
13    el-nacional.com       980       1253     8829
14    mayla.jp              927       -        -
15    54.202.34.80          861       -        -
16    facebook.com          824       3        2
17    careerarc.com         822       70517    91327
18    uls.her.jp            805       -        -
19    drive.google.com      789       -        39
20    cdiscount.com         781       780      11783

(b) Top domains for Bitly links (2020).

Rank  Domain                Occur.    Alexa    Maj.
1     twittascope.com       9374      300510   -
2     k.kakaocdn.net        2449      -        -
3     img1.daumcdn.net      1085      -        -
4     t1.daumcdn.net        777       -        -
5     rbeiv.com             697       2        3
6     drive.google.com      684       -        36
7     youtube.com           665       2        3
8     shop.funko.com        629       -        -
9     easyriders.jp         593       -        -
10    akindo-sushiro.co.jp  591       17307    191994
11    dlvr.it               385       -        14409
12    youtu.be              366       10612    13
13    go.onelink.me         319       -        -
14    rbapc.top             311       -        -
15    rbtoe.com             289       -        -
16    dscygl.xyz            282       -        -
17    facebook.com          280       7        1
18    rbeja.com             278       -        -
19    lin.ee                275       -        8042
20    str2b.openstream.co   330       -        -

Table 4.4: Top 20 most frequent domains for Bitly links.

The top 20 most frequent domains for Bitly links in 2019 and 2020 are listed above in Table 4.4. Here, too, we see no big differences apart from a few: in 2019 instagram.com got several hits through Bitly links, which is not the case in 2020.

Popularity Distribution

We also want to present how the links are distributed over domains of similar global rank according to Alexa and Majestic. To do this, the domain in every link is assigned to one of the following classes: Alexa[1-10]; Alexa[11-100]; Alexa[101-1K]; Alexa[1001-10K]; Alexa[10001-100K]; Alexa[100001-1M]; other [nonranked]. Figure 4.5 shows the distribution for the Alexa top 1 million list and Figure 4.6 for the Majestic top 1 million list, using the same classes.
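The assignment of a domain to a popularity class can be sketched as a lookup over the upper bounds of the classes. The function name popularity_class and the use of None for a domain that is missing from the top 1 million list are assumptions made for this illustration.

```python
# Upper bound (inclusive) and label of each rank class.
RANK_CLASSES = [
    (10, "1-10"),
    (100, "11-100"),
    (1_000, "101-1K"),
    (10_000, "1001-10K"),
    (100_000, "10001-100K"),
    (1_000_000, "100001-1M"),
]


def popularity_class(rank, list_name="Alexa"):
    """Map a global rank (1 is most popular) to its class label.

    rank is None when the domain is not in the top 1 million list.
    """
    if rank is None:
        return "other [nonranked]"
    for upper, label in RANK_CLASSES:
        if rank <= upper:
            return f"{list_name}[{label}]"
    return "other [nonranked]"


# Ranks taken from Table 4.2 (b): youtube-like rank 2, peing.net, open.spotify.com.
class_top = popularity_class(2)
class_peing = popularity_class(17513)
class_spotify_alexa = popularity_class(None)
class_spotify_majestic = popularity_class(155, "Majestic")
```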

(a) 2019. (b) 2020.

Figure 4.5: Link popularity distribution to domains of different popularity classes, as defined using the Alexa top-1M lists.

(a) 2019. (b) 2020.

Figure 4.6: Link popularity distribution to domains of different popularity classes, as defined using the Majestic top-1M lists.

Domain frequencies

The figures show the Cumulative Distribution Function (CDF) and Complementary CDF (CCDF), respectively, of the fraction of links that differently ranked domains are responsible for. The results from both years are almost identical, and we can see in Figures 4.7a and 4.8a that a small number of domains make up a large part of all links and that all classes show the same pattern. The straight-line shape in Figures 4.7b and 4.8b suggests that the distributions follow a power law [20].
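One way to compute such curves is to sort the domains by number of occurrences and accumulate their share of all links. The sketch below follows this reading; the exact definition behind the figures is an assumption, and the function name and the example counts are hypothetical.

```python
from itertools import accumulate


def cumulative_link_share(occurrences):
    """Return the share of all links covered by the top-1, top-2, ... domains."""
    ordered = sorted(occurrences, reverse=True)  # rank 1 = most frequent domain
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Hypothetical occurrence counts for four domains.
cdf = cumulative_link_share([10, 50, 30, 10])
ccdf = [1.0 - share for share in cdf]
```

In this toy example the two most frequent domains already cover 80% of the links, which is the kind of concentration described above. Plotting the CCDF against the rank on logarithmic axes is what would reveal an approximately straight line for a power-law distribution.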

(a) CDF (2019). (b) CCDF (2019).

Figure 4.7: Distribution of domain rank (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.8: Distribution of domain rank (2020).

Relative ranks and frequencies

Figure 4.9 displays pairwise scatter plots showing the frequencies and ranks of the top-25 domain sets based on All Links, Shortened Links, Bitly Links, Alexa and Majestic for 2019 and 2020. Blue circles are used for domains with known ranks, and red crosses at rank 10^6 are used to illustrate domains with unknown rank.

Figure 4.9: The results from 2019 are found below the vertical divider in pink and the results from 2020 above in blue.

Phishing domains

We ran all our collections through PhishTank (a service that enables users to report and review suspicious phishing sites) but did not find any matches with our links.

4.3 User statistics

In this section we present how users use link shorteners. We take a closer look at the age of the account, the number of tweets favourited by users, how many tweets the user has posted, and how many followers the user has. Finally, we look at verified users.

Age

Figures 4.10 and 4.11 show the distribution of the age of user accounts at the time of posting their tweet. A conclusion that can be drawn from both years is that tweets containing links are more likely to come from a rather old account, and this is especially true for Bitly links.
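The account age at the time of posting follows from two of the dataset headers, tweet_created_at and user_created_at. A minimal sketch, assuming both values have already been parsed into timezone-aware datetime objects (the parsing step and the function name account_age_days are not taken from the framework):

```python
from datetime import datetime, timezone


def account_age_days(user_created_at, tweet_created_at):
    """Return the age of the account, in days, when the tweet was posted."""
    return (tweet_created_at - user_created_at).total_seconds() / 86400.0


# Hypothetical account created exactly one year before the tweet.
created = datetime(2019, 4, 18, 12, 0, tzinfo=timezone.utc)
posted = datetime(2020, 4, 18, 12, 0, tzinfo=timezone.utc)
age = account_age_days(created, posted)
```

The example spans 29 February 2020, so the age is 366 days rather than 365.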

(a) CDF (2019). (b) CCDF (2019).

Figure 4.10: Distribution of the age for users account at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.11: Distribution of the age for users account at the time of posting their tweet (2020).

Favourites

In this section we look at how users of link shorteners favourite other tweets, that is, how likely a user of link shorteners is to favourite other tweets. Figures 4.12a and 4.13a show that, in both years, Bitly tweets are more frequently posted by users that have favourited fewer tweets. The CCDFs plotted to the right in the same figures (Figures 4.12b and 4.13b) show that "All Tweets" and "Link Tweets" are more back-heavy (the former more than the latter) when it comes to favouriting other tweets.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.12: Distribution of the number of tweets favourited by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.13: Distribution of the number of tweets favourited by users at the time of posting their tweet (2020).

Number of tweets

Figures 4.14 and 4.15 give an overview of how many tweets the user had posted at the time of posting the tweet we collected. Figures 4.14a and 4.15a show that no specific type of tweet from our categories comes from users that have tweeted more in the past, but in 2019 there is a clear difference at the end of the spectrum, before the last jump. Figures 4.14b and 4.15b tell the same story, but here there is a clear difference between the years: in 2019 there is a huge jump in probability, a trend that cannot be seen in 2020.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.14: Distribution of the number of tweets posted by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.15: Distribution of the number of tweets posted by users at the time of posting their tweet (2020).

Favourites to tweets

In this section we look closer at the relation between tweeting and interacting with tweets, using the ratio between tweets favourited and tweets posted by users at the time of posting their tweet. In Figures 4.16a and 4.17a we can see that users that posted a Bitly link in general tweet more than they favourite other tweets, although the trend is stronger in 2019. Figures 4.16b and 4.17b show the same thing.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.16: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.17: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2020).

Followers

Another interesting user statistic is how many followers the user had at the time of posting their tweet; the results are shown in the figures below. A follower is someone that follows the user. In Figure 4.19a we can see that in 2020 all categories have almost the same probability of having many followers, whereas in 2019 (Figure 4.18a) users that tweeted Bitly links have a higher probability of having more followers than users in the other categories. This conclusion can also be drawn from the figures to the right (Figures 4.18b and 4.19b): in both years, users posting tweets containing Bitly links have a higher probability of having more followers. Note, though, that the users with the most followers did not post tweets containing links.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.18: Distribution of the number of followers for users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.19: Distribution of the number of followers for users at the time of posting their tweet (2020).

Friends

A friend is an account that a user follows on Twitter. The figures below show the distribution of the number of friends for users at the time of posting their tweet. The results resemble those for followers above: in 2019 the difference between the categories is a little more noticeable than in 2020, and in both years the accounts with the most friends are those that tweet without links.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.20: Distribution of the number of friends for users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.21: Distribution of the number of friends for users at the time of posting their tweet (2020).

Followers vs friends

The scatter plots in Figure 4.22 below show the followers-to-friends ratio, the so-called golden ratio, for users at the time of posting their tweet. Every Twitter account can follow up to 5,000 accounts. Once that number is reached, the account may need to wait until it has more followers before it can follow additional accounts [26]. This friend gate is illustrated in the figures below (red line) together with an equality line (green line). The results from both years are similar, with a trend around an equal number of followers and friends but with somewhat more users that have more followers than friends. However, as shown earlier, we can also see a downward trend in the use of Bitly links.

(a) 2019. (b) 2020.

Figure 4.22: Followers-to-friends ratio for users at the time of posting their tweet.

Unique and verified users

The table below shows how many unique and verified users there were in the datasets from 2019 and 2020.

Category            Unique Users 2019   Unique Users 2020   Verified 2019     Verified 2020
All Tweets          12,253,599          15,247,424          53,326 (0.44%)    61,614 (0.4%)
Link Tweets         2,905,502           2,828,482           28,736 (0.99%)    32,576 (1.15%)
Shortener Tweets    245,984             267,963             2,859 (1.16%)     2,373 (4,261%)
Bitly Tweets        112,682             46,307              1,856 (1.65%)     775 (1,303%)

Table 4.5: Number of unique users in each category and how many of those users are verified.

From the percentages of verified users in Table 4.5 we can see that for All Tweets and Link Tweets there is just a small difference between the years. When it comes to Shortener Tweets and Bitly Tweets, we can clearly see the decrease in Bitly link usage over the past year.

4.4 Bitly link interaction

This section relates the number of clicks on a Bitly link found in a tweet to the number of retweets of the same tweet. Figure 4.23 shows two scatter plots of the Bitly clicks-to-retweets ratio and Figure 4.24 the logarithmic average of Bitly clicks per retweet count for all Bitly links. In both Figures 4.23 and 4.24 we can see a rather compact cluster right below the black "Equal ratio" line. For Figure 4.23 this tells us that a lot of tweets have more retweets than clicks on the embedded Bitly link, and for Figure 4.24 that tweets with fewer than 30 retweets tend to have a higher clicks-to-retweets ratio. If we look at the difference between the years in both figures, we again see the same pattern as before: there is a big reduction in Bitly tweets overall.
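The observation about tweets below the "Equal ratio" line can be quantified as the share of tweets with more retweets than clicks. The sketch below is a minimal illustration of that count; the function name and the example pairs are hypothetical and no values from our datasets are used.

```python
def share_with_more_retweets_than_clicks(pairs):
    """Return the fraction of (clicks, retweets) pairs with retweets > clicks."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    below_line = sum(1 for clicks, retweets in pairs if retweets > clicks)
    return below_line / len(pairs)


# Hypothetical (bitly clicks since posting, retweet count) pairs for four tweets.
observations = [(10, 2), (0, 5), (3, 3), (1, 4)]
share_below = share_with_more_retweets_than_clicks(observations)
```

In a full analysis the pairs would be built from the bitly_all_clicks_since_posting (or bitly_twitter_clicks_since_posting) and retweet_count headers of each Bitly tweet.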

(a) 2019. (b) 2020.

Figure 4.23: Two scatter plots of Bitly clicks-to-retweets-ratio.

(a) 2019. (b) 2020.

Figure 4.24: Logarithmic average of Bitly clicks per retweet.

4.5 Verified vs non-verified users

This section examines whether there are any noticeable differences between verified and non-verified users.

Bitly Clicks

The figures below show the number of clicks on posts made by users with different numbers of followers. This was done to see how many followers a user has in relation to how many clicks the user gets. For verified users, shown in Figure 4.25, the clicks-to-followers ratio for Bitly links is very similar between the years, though there may be slightly more clicks for users with a smaller number of followers in 2019.

(a) 2019. (b) 2020.

Figure 4.25: Clicks-to-followers ratio for Bitly links for verified users.

Figure 4.26 below tells the same story for non-verified users. In 2019 we can see that non-verified users can get up to 10,000 clicks even if they do not have more than 100 followers. Figure 4.26 shows that the results are a bit different in 2020: here, too, users with few followers can get many clicks, even more clicks than the year before. However, the scale is only a fraction of that in 2019, so the clicks are higher but occur to a much lesser extent.

(a) 2019. (b) 2020.

Figure 4.26: Clicks-to-followers ratio for Bitly links for non-verified users.

Followers, number of tweets and retweets

To compare user statistics between non-verified and verified users we show three pairs of heat-maps. The first is retweets versus number of tweets posted, the second is retweets versus followers, and the last is followers versus number of tweets posted.

Retweet versus Followers Tweeted

The heat-maps in Figures 4.27 and 4.28 show a difference between non-verified and verified users in both years: there is a vertical streak of higher density for non-verified users, while there is more of a blob formation for verified users. For non-verified users this indicates that many accounts with around the same number of followers have received very different numbers of retweets. For verified users it indicates that many accounts with the same number of followers also received the same number of retweets. This pattern can be seen in both 2019 and 2020.
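The density behind such a heat-map can be sketched as a two-dimensional histogram. The sketch below assumes one bin per order of magnitude on both axes, which is a simplification chosen for illustration and not necessarily the binning used for the figures; the function names and data points are hypothetical.

```python
import math
from collections import Counter


def log_bin(value):
    """Return the order of magnitude of value; zero gets its own bin (-1)."""
    return math.floor(math.log10(value)) if value > 0 else -1


def heatmap_counts(points):
    """Count how many (x, y) points fall into each pair of log-scale bins."""
    return Counter((log_bin(x), log_bin(y)) for x, y in points)


# Hypothetical (followers, retweets) pairs.
points = [(150, 3), (200, 7), (5000, 40), (120, 0)]
density = heatmap_counts(points)
```

A vertical streak, as described for non-verified users, corresponds to many occupied cells that share the same follower bin but differ in their retweet bin.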


(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.27: Heat-map of retweets vs followers tweeted (2019).

(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.28: Heat-map of retweets vs followers tweeted (2020).
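A heat-map of this kind can be built by counting users in logarithmic bins. The sketch below is only an assumption about how such a plot could be produced, not the plotting code behind Figures 4.27 and 4.28; the bin width, the axis assignment (followers on the x-axis, retweets on the y-axis), the function names and the sample points are all made up for illustration.

```python
import math
from collections import Counter


def log_bin(value, bins_per_decade=2):
    """Map a non-negative count to a logarithmic bin index (0 lands in bin 0)."""
    return int(math.log10(value + 1) * bins_per_decade)


def heatmap_counts(points, bins_per_decade=2):
    """Count (followers, retweets) pairs per two-dimensional logarithmic bin."""
    return Counter(
        (log_bin(followers, bins_per_decade), log_bin(retweets, bins_per_decade))
        for followers, retweets in points
    )


# Made-up (followers, retweets) pairs: four accounts with about 100 followers
# but very different retweet counts, and one account with many followers.
points = [(100, 0), (100, 8), (100, 500), (120, 50), (2_000_000, 5_000)]
grid = heatmap_counts(points)

for (x_bin, y_bin), count in sorted(grid.items()):
    print(f"followers bin {x_bin}, retweets bin {y_bin}: {count}")
```

In this toy data the first four accounts share one followers bin but fall into different retweet bins, which is the kind of vertical streak described for non-verified users.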

Retweets versus number of tweets

Figures 4.29 and 4.30 show retweets versus the number of tweets for non-verified and verified users. In both years, the relation between retweets and number of tweets looks much like the relation between retweets and followers.

(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.29: Heat-map of retweets vs number of tweets tweeted (2019).


(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.30: Heat-map of retweets vs number of tweets tweeted (2020).

Followers versus number of tweets

Figures 4.31 and 4.32 display, for each tweet that we observed, the total number of tweets (over the lifetime) of the user making that post against the number of followers that user had (at the time the tweet was made). Verified users tend to have more followers than tweets, whereas for the other users the relation is almost linear, with somewhat more tweets than followers.

(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.31: Heat-map of followers vs number of tweets tweeted (2019).

(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.32: Heat-map of followers vs number of tweets tweeted (2020).


4.6 Covid-19 analysis

This section presents the results of searching for tweets whose shortened URLs and/or hashtags contain any of the specific Covid-19 related words (listed in the method). We present data from all three collections made in 2020 (18-25/3, 1-8/4 and 18-25/4). Many more Covid-19 related tweets were collected on the first occasion: during 18-25/3, 4.7% of all link tweets were Covid-19 related, while for 1-8/4 and 18-25/4 the proportion was 3.5% and 2.5%, respectively.
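The matching described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the keyword list, the tweet layout and the function name `is_covid_related` are hypothetical, and the actual search terms are the ones listed in the method.

```python
# Hypothetical keywords for illustration only; the real terms are in the method chapter.
COVID_KEYWORDS = ("covid", "corona")


def is_covid_related(urls, hashtags, keywords=COVID_KEYWORDS):
    """Return True if any URL or hashtag contains one of the keywords."""
    texts = [u.lower() for u in urls] + [h.lower() for h in hashtags]
    return any(keyword in text for text in texts for keyword in keywords)


# Made-up tweets, reduced to the two fields the check needs.
tweets = [
    {"urls": ["https://bit.ly/covid19-update"], "hashtags": []},
    {"urls": ["https://youtu.be/abc123"], "hashtags": ["Coronavirus"]},
    {"urls": ["https://bit.ly/2xYzAbC"], "hashtags": ["music"]},
]

flags = [is_covid_related(t["urls"], t["hashtags"]) for t in tweets]
share = 100 * sum(flags) / len(flags)
print(f"{sum(flags)} of {len(flags)} tweets matched ({share:.1f}%)")
```

A substring match like this is cheap but coarse: it misses other terms and other languages, and it can only see words that are visible in the shortened URL itself or in the hashtags.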

Category                      18-25/3             1-8/4               18-25/4
Link Tweets                   3,786,543 (100%)    3,788,332 (100%)    3,803,233 (100%)
Link Tweets with Covid-19     179,279 (4.7%)      132,056 (3.5%)      96,663 (2.5%)
Link Tweets without Covid-19  3,607,264 (95.3%)   3,656,276 (96.5%)   3,706,570 (97.5%)

Table 4.6: Number of link tweets related to Covid-19.

Bitly links make up approximately 1.6% of all link tweets across the collections (between 1.4% and 1.7% per collection). The proportion of Bitly links whose URL and/or hashtags were Covid-19 related was 12.4% for 18-25/3, 8.5% for 1-8/4 and 5.7% for 18-25/4. Both Table 4.6 and Table 4.7 show that most of the Covid-19 related links captured by our method occurred during the first period.

Category                      18-25/3             1-8/4               18-25/4
Bitly Links                   65,973 (100%)       61,153 (100%)       52,517 (100%)
Bitly Links with Covid-19     8,192 (12.4%)       5,224 (8.5%)        2,970 (5.7%)
Bitly Links without Covid-19  57,781 (87.6%)      55,929 (91.5%)      49,547 (94.3%)

Table 4.7: Number of Bitly link tweets related to Covid-19.
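The percentages in Tables 4.6 and 4.7 follow directly from the counts. A small sketch of that arithmetic, with the counts copied from the two tables and with dictionary keys and helper names chosen for illustration only:

```python
# Counts copied from Tables 4.6 and 4.7.
collections = {
    "18-25/3": {"link": 3_786_543, "link_covid": 179_279, "bitly": 65_973, "bitly_covid": 8_192},
    "1-8/4": {"link": 3_788_332, "link_covid": 132_056, "bitly": 61_153, "bitly_covid": 5_224},
    "18-25/4": {"link": 3_803_233, "link_covid": 96_663, "bitly": 52_517, "bitly_covid": 2_970},
}


def percent(part, whole):
    """Share of `part` in `whole`, in percent with one decimal."""
    return round(100 * part / whole, 1)


shares = {
    period: (
        percent(c["link_covid"], c["link"]),    # Covid-19 share of all link tweets
        percent(c["bitly_covid"], c["bitly"]),  # Covid-19 share of Bitly links
    )
    for period, c in collections.items()
}

for period, (link_share, bitly_share) in shares.items():
    print(f"{period}: {link_share}% of link tweets, {bitly_share}% of Bitly links")
```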

The scatter plots below show the ratio between clicks and retweets for all three collections made during 2020. Figure 4.33 shows the clicks-to-retweets ratio for all the Covid-19 related tweets, and Figure 4.34 shows the same thing zoomed out, to show that a few single tweets have received several thousand more retweets than the majority of tweets. Interestingly, this recurs in all three collection periods. We can also tell that users tend to click on Covid-19 related tweets about as much as they retweet them. In Figure 4.35 we show CDFs of the ratio between clicks and retweets for every collection made in 2020. Here we can see that tweets with Covid-19 related links and/or hashtags have a higher probability of a high ratio than tweets without them.


(a) 18-25/3. (b) 1-8/4.

(c) 18-25/4.

Figure 4.33: Scatter plots for all 3 collections 2020 of Covid-19 clicks-to-retweets-ratio.

(a) 18-25/3. (b) 1-8/4.

(c) 18-25/4.

Figure 4.34: Scatter plots for all 3 collections 2020 of the overall clicks-to-retweets-ratio.


(a) Non Covid-19 related links or hashtags. (b) Covid-19 related links or hashtags.

Figure 4.35: CDFs of the ratio between clicks and retweets for tweets containing Covid-19 related links or hashtags and non Covid-19 related links and hashtags.
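The curves in a CDF plot such as Figure 4.35 can be computed as in the following sketch. It is an illustration under assumptions, not the analysis code of the study: the click and retweet numbers are made up, tweets without retweets are skipped to avoid division by zero, and the function names are hypothetical.

```python
def click_retweet_ratios(tweets):
    """Clicks divided by retweets; tweets with zero retweets are skipped (assumption)."""
    return [t["clicks"] / t["retweets"] for t in tweets if t["retweets"] > 0]


def ecdf(values):
    """Return the sorted values and the empirical P(X <= x) for each of them."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_at_most(values, threshold):
    """Empirical CDF evaluated at a single threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Made-up example data.
covid = [
    {"clicks": 40, "retweets": 10},
    {"clicks": 9, "retweets": 3},
    {"clicks": 2, "retweets": 4},
    {"clicks": 5, "retweets": 0},
]
other = [
    {"clicks": 1, "retweets": 10},
    {"clicks": 3, "retweets": 6},
    {"clicks": 8, "retweets": 4},
]

covid_ratios = click_retweet_ratios(covid)
other_ratios = click_retweet_ratios(other)
xs, ps = ecdf(covid_ratios)
print(list(zip(xs, ps)))
print(fraction_at_most(covid_ratios, 1.0), fraction_at_most(other_ratios, 1.0))
```

A curve that lies lower at a given ratio means that a larger share of tweets exceeds that ratio, which is how a higher probability of a high ratio shows up in the plot.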

5 Discussion

This chapter contains discussion regarding the results obtained in this paper as well as the methods attempted. Also, the work is discussed in a wider context.

5.1 Results

Performing this longitudinal study, we can see that most of the results are quite alike between the years, but there are some interesting differences worth mentioning. In the high-level link shortener usage, one domain that really stood out in 2019 was dua3a.org, a website that automatically posts Islamic prayers to Twitter; it was in second place in 2019 but did not occur at all in 2020. Another difference seen in many of the figures is that facebook.com occurred considerably less often in 2020 than in 2019. Bitly (bit.ly) lost first place as a link shortener and was replaced in 2020 by youtu.be, which was in second place in 2019; other link shorteners are not used as frequently as Bitly and Youtube. It is worth mentioning that our last collection is the one in which we captured the fewest Bitly links in total; the first collection had a little over 13 thousand more. Even so, this is far fewer than in 2019, when 159143 Bitly tweets were collected, compared to 52517 in 2020. Among the most frequent domains for Bitly links in 2019, lawson.co.jp (a Japanese convenience store) had the second most occurrences, but it did not occur at all in the top 20 in 2020. In the popularity distribution, that is, how links are distributed over domains with similar global rank according to Alexa and Majestic, the results are almost exactly the same in both years.

Regarding the user statistics, there were also similarities, but also some trends specific to each year. For the distribution of account age at the time of posting, the same conclusion can be drawn for both years: tweets containing links are more likely to belong to a rather old account. In the analysis of the number of followers at the time of posting, both years showed that users who tweet using Bitly have a higher probability of having more followers than any other category, although the users with the most followers were those who tweet without links. There was, however, a difference in how many friends a user has at the time of posting: in 2019, Bitly tweeters had a higher probability of having more friends than Bitly tweeters in 2020.

To investigate why Youtube’s own link shortener became so much more popular in 2020, we looked at how many of the Bitly links pointed to Youtube in each year. At the end of 2019, 2377 of the Bitly links pointed to Youtube; in 2020 the corresponding figure was 1034. This year, users are increasingly choosing Youtube’s own link shortener to distribute their films and clips. The scatter plots of the followers-to-friends ratio at the time of posting also showed a downturn in Bitly usage. Interestingly, this only seems to hold for users with many followers; users with fewer followers still seem to use Bitly to the same extent as last year.

The difference between verified and non-verified users was analysed in the last section on this topic. First we looked at how many clicks Bitly links get depending on how many followers the user has; the clicks-to-followers ratio is almost the same in both years. But here too we can clearly see that the usage of Bitly links has decreased a lot.

In the Covid-19 analysis, one observation is that the hit rate was considerably higher in the first collection (18-25/3) than in the data sets collected only a few days later, so our brief analysis shows a fairly radical reduction in tweets that mention the pandemic. Because we only look at a few specific words that are used worldwide in relation to the pandemic, we miss several tweets that would also fit our survey, but we still find a fairly large number of tweets related to the subject. The scatter plots showed an interesting phenomenon that we had not seen before. In the Bitly link interaction section we saw clearly that users tend to retweet to a much greater extent than they actually click on the links; this trend was the opposite for Covid-19 related tweets, where users click more than they retweet. Perhaps this shows that in this case users are more careful to read what they share, or at least that they choose to read these links to a greater extent.

5.2 Method

We are satisfied with our methodology. We were able to make more than one collection, although with more time it would of course have been better to have more data sets to analyze. Another improvement would be to vary the length of the collections instead of always collecting for seven days. Maybe the results would differ when comparing weekdays with weekends, or month to month, and so on. Another thing to note is that we downloaded the Alexa and Majestic lists on the last day of each collection. It might have been better to do this in the middle of the collection instead, or to check whether the lists differ greatly depending on the day of the week or on special events that day that could affect the results. In retrospect, after analyzing our results, we should perhaps have spent more time investigating how well the list of domains that we considered to be link shorteners, which was taken from last year, fits this year. It would have been worthwhile to see whether it needs updating; among other things, we saw that reddit.com has developed its own link shortener (redd.it), which we have seen used a lot to shorten Reddit links. Regarding the Covid-19 analysis, there are some things that could be improved. A look at any Twitter feed quickly shows that there are thousands of ways, and above all many more terms, to refer to the pandemic than the ones we searched for (other terms, other languages, etc.). Another improvement would be to also include the domain that the shortened URL redirects to, in order to see what kind of pages users turn to for information and whether those pages are reliable and accurate about the pandemic. Given more time, and given that the pandemic escalated halfway into our work period, this is definitely something we would have liked to investigate further.

5.3 The work in a wider context

In a wider context, this project can give us a better understanding of the sharing habits and link interaction of users on Twitter. It can also help us understand how information, thoughts and ideas are being shared. It can provide important information on when and why users interact with links and which kinds of links get more or less attention than others. We have also seen in this paper how the framework can be used outside its primary focus, link usage, to study other user behaviors. The method for the Covid-19 analysis can be developed to look at, for example, the dissemination of misinformation, and to provide a basis for how this can be counteracted in the future. This is also generally true when it comes to sharing information on the internet. Twitter has been working actively, especially now in the spring of 2020, to find better ways to avoid the spread of misinformation [6]. When a tweet is flagged by their algorithms, a yellow warning text is added below the tweet to mark that the information is classified as incorrect. Our work on link shorteners may make it easier to find incorrect information, since shortened URLs are better at hiding their destination.

6 Conclusion

The purpose of this paper has been to collect tweet traffic, looking for tweets containing URL shorteners and especially Bitly links. The analysis extends work that started last year; the main goal of our paper is to add a longitudinal comparison of, among other things, domain statistics, user behavior, and verified versus non-verified users. The purpose has been met to a great extent, and in this chapter we state some key conclusions and look at things that can be improved in a future study.

• There were 8 million more tweets collected in 2020 than in 2019. However, in 2019 0.62% of all tweets contained Bitly links, while the corresponding figure was 0.16% in 2020. In other words, the usage of Bitly links has decreased drastically according to our analysis. The most frequent link shortener in 2020 according to our analysis is youtu.be.

• A note on the above point is that the use of link tweets and shorteners in general has not decreased, only the usage of Bitly as a link shortener.

• One pattern found is that a domain being highly ranked by Alexa or Majestic does not always mean that it gets retweeted more often. The top domains for all links are quite similar in both years.

• Our surveys show that the more tweets you make, the more followers you tend to have; this trend is visible in both 2019 and 2020.

• Our Covid-19 study showed that the demand for and sharing of information about the virus decreased drastically in just a few weeks. It seems that the virus behaves as a classic news bubble, which is then quickly replaced by new ones. Alternatively, users are tired of hearing more bad news (as the majority of news about the virus was during our data collection period) and simply cannot read more about bad things. In conclusion, it would definitely have been interesting to do a deeper analysis of the subject.

• Users tend to read tweets that contain information about Covid-19 rather than retweet them directly without reading (for tweets in general the trend was the opposite). This is an important conclusion that may show that for some subjects we are better at not spreading misinformation (or one can hope that is the case).


6.1 Future work

For future research it would be interesting to continue this study over a longer time period. This paper shows that there are patterns and trends that change and evolve over time, but the time period is too short to draw comprehensive conclusions. We improved on last year's method by making several smaller data collections during this spring, in order to be able to rule out more weekly patterns over a shorter period. But of course we could always have done more: the more data, the more comprehensive the results. Another thing we want to highlight is the opportunities that come with this framework. You can specify exactly what you are looking for and look for temporary trends, as we did with the current pandemic; a longer analysis would also be needed here to see whether the trend declines at the same rate throughout its period. Finally, it would be interesting to look at how links are used on other, similar social media platforms.

Bibliography

[1] About Twitter’s APIs. URL: https://help.twitter.com/en/rules-and-policies/twitter-api.
[2] Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos, Sotiris Ioannidis, Evangelos P. Markatos, and Thomas Karagiannis. “Web: The web of short URLs”. In: Proc. of WWW. 2011.
[3] API Reference Guide. URL: https://developer-support.majestic.com/api/.
[4] Krishnamurthy Balachander, Phillipa Gill, and Martin Arlitt. “A few chirps about twitter”. In: Proc. of WOSN. 2008.
[5] Bitly API (4.0.0). 2017. URL: https://dev.bitly.com/v4_documentation.html.
[6] Ben Collins. Twitter is testing new ways to fight misinformation — including a community-based points system. 2020.
[7] Counting characters. 2020. URL: https://developer.twitter.com/en/docs/basics/counting-characters.
[8] Anqi Cui, Min Zhang, Yiqun Liu, and Shaoping Ma. “Are the URLs really popular in microblog messages?” In: Proc. of IEEE CCIS. 2011.
[9] Wang De, Shamkant B. Navathe, Liu Ling, Danesh Irani, Acar Tamersoy, and Calton Pu. “Click traffic analysis of short URL spam on Twitter”. In: Proc. of IEEE of CollaborateCom. 2013.
[10] Hanif Fakhrurroja, Muhammad Nashir Atmaja, Joe Nathan C.G Panjaitan, Andry Alamsyah, and Aris Munandar. “Crisis Communication on Twitter: A Social Network Analysis of Christchurch Terrorist Attack in 2019”. In: Proc. of ICISS. 2019.
[11] Maksym Gabielkov, Augustin Chaintreau, Arthi Ramachandran, and Arnaud Legout. “Social Clicks: What and Who Gets Read on Twitter?” In: Proc. of ACM SIGMETRICS. 2016.
[12] Kiran Garimella and Ingmar Weber. “A Long-Term Analysis of Polarization on Twitter”. In: Proc. of ICWSM. 2017.
[13] Neha Gupta, Anupama Aggarwal, and Kumaraguru Ponnurangam. “bit.ly/malicious: Deep dive into short URL based e-crime detection”. In: Proc. of APWG Symposium on Electronic Crime Research. 2014.


[14] Jesper Holmström, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundström, Sebastian Ragnarsson, Anton Forsverg, Karl Andersson, and Niklas Carlsson. “Do we read what we share?: analyzing the click dynamic of news articles shared on Twitter”. In: Proc. of ASONAM. 2019.
[15] Takeru Inoue, Toriumi Fujio, Shirai Yasuyuki, and Shin-ichi Minato. “Great east Japan earthquake viewed from a URL shortener”. In: Proc. of ACM SWID. 2011.
[16] Oscar Järpehult and Martin Lindblom. “Longitudinal measurements of link usage on Twitter”. 2019.
[17] Dixon Jones. Alexa top 1 Million sites is retired. Here’s the Majestic Million. 2016. URL: https://.majestic.com/development/alexa-top-1-million-sites-retired-heres-majestic-million/.
[18] Vasileios Kandylas and Ali Dasdan. “The Utility of Tweeted URLs for Web Search”. In: Proc. of WWW. 2010.
[19] Florian Klien and Markus Strohmaier. “Short links under attack: geographical analysis of spam in a URL shortener network”. In: Proc. of ACM conference on Hypertext and social media. 2012.
[20] Aniket Mahanti, Niklas Carlsson, Anirban Mahanti, Martin Arlitt, and Williamson Carey. “A Tale of the Tails: Power-laws in Internet Measurements”. In: Proc. of IEEE Network (2013).
[21] Silvia Marcos-Garcia, Laura Alonso-Muñoz, and Amparo López-Meri. Extending influence on social media: The behaviour of political talk-show opinion leaders on Twitter. Tech. rep. 2020.
[22] Most-used languages on Twitter 2013. 2013. URL: https://www.statista.com/statistics/267129/most-used-languages-on-twitter/.
[23] Nick Nikiforakis, Maggi Federico, Gianluca Stringhini, M.Zubair Rafique, Wouter Joosen, Christopher Kruegel, Frank Piessens, Giovanni Vigna, and Stefano Zanero. “Stranger Danger: Exploring the Ecosystem of Ad-based URL Shortening Services”. In: Proc. of WWW. 2014.
[24] Andy Piper. Potential adjustments to Streaming API sample volumes. URL: https://twittercommunity.com/t/potential-adjustments-to-streaming-api-sample-volumes/31628.
[25] Quirin Scheitle, Oliver Hohlfeld, Julien Gamba, Jonas Jelten, Torsten Zimmermann, Stephen D. Strowes, and Narseo Vallina-Rodriguez. “A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists”. In: Proc. of ACM IMC. 2018.
[26] Twitter. About following on Twitter. 2019. URL: https://help.twitter.com/en/using-twitter/twitter-follow-limit.
[27] Twitter. About verified accounts. URL: https://help.twitter.com/en/managing-your-account/about-twitter-verified-accounts.
[28] Twitter API references. URL: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.
[29] Twitter URL-shortener. URL: https://help.twitter.com/en/using-twitter/-shortener.
[30] Twitter, About replies and mentions. URL: https://help.twitter.com/en/using-twitter/mentions-and-replies.
[31] Twitter, Donald Trump. URL: https://twitter.com/realDonaldTrump.
[32] Twitter, New user FAQ. 2020. URL: https://help.twitter.com/en/new-user-faq.


[33] Twitter, Rules and policies. URL: https://help.twitter.com/en/rules-and-policies.
[34] Using Twitter, Following FAQs. URL: https://help.twitter.com/en/using-twitter/following-faqs.
[35] Lin Ying. Twitter data, 10 Twitter Statistics Every Marketer Should Know in 2020. 2019. URL: https://www.oberlo.com/blog/twitter-statistics.

A Appendix

A.1 URL shorteners

Start of the list of all domains that were considered shorteners in this study.

• buff.ly • v.gd • bc.vc • short.to • clk.im • s2r.co • adf.ly • ping.fm • bit.ly • clicky.me • bit.do • digg.com • thinfi.com • lc.chat • mcaf.ee • post.ly • shor.tswit.ch • soo.gd • rebrandly.com • just.as • short.cm • tweez.me • su.pr • bkite.com • snip.ly • vzturl.com • y2u.be • snipr.com • zzb.bz • link.zip.net • youtu.be • flic.kr • tldrify.com • lnkd.in • goo.gl • loopt.us • adfoc.us • cur.lv • tinyurl.com • doiop.com • shorte.st • filoops.info • is.gd • twitthis.com • git.io • po.st • cli.gs • htxt.it • al.ly • qr.net • pic.gd • alturl.com • hec.su • x.co • dwarfurl.com • redirx.com • ph.dog • scrnch.me • ow.ly • digbig.com • tny.im • prettylinkpro.com • yfrog.com • short.ie • urlkr.com • ity.im • migre.me • u.mavrev.com • sptfy.com • yourls.org • ff.im • kl.am • shrinkee.com • cutt.us • tiny.cc • wp.me • shrinkurl.in • buzurl.com • url4.eu • u.nu • ouo.io • j.mp • tr.im • rubyurl.com • s.id • u.bb • twit.ac • om.ly • lynk.my • q.gs • su.pr • linkbee.com • good.ly • u.to • twurl.nl • yep.it • fur.ly • qr.ae • snipurl.com • posted.at • 7.ly • db.tt • budurl.com • xrl.us


• metamark.net • shrinkify.com • lt.tl • shrten.com • sn.im • ri.ms • twirl.at • shorturl.com • hurl.ws • b23.ru • zipmyurl.com • urlao.com • eepurl.com • fly2.ws • urlot.com • a2a.me • idek.net • xrl.in • a.nf • tcrn.ch • urlpire.com • fhurl.com • hurl.me • goshrink.com • chilp.it • wipi.es • urlhawk.com • decenturl.com • moourl.com • korta.nu • tnij.org • decenturl.com • snurl.com • shortna.me • 4url.cc • zi.ma • xr.com • fa.b • firsturl.de • 1link.in • lin.cr • wapurl.co.uk • hurl.it • sharetabs.com • easyuri.com • urlcut.com • sturly.com • shoturl.us • zz.gd • 6url.com • shrinkster.com • fff.to • ur1.ca • abbrr.com • go2cut.com • hover.com • url.ie • simurl.com • liip.to • lnk.in • adjix.com • klck.me • shw.me • jmp2.net • twurl.cc • x.se • xeeurl.com • dy.fi • s7y.us • 2big.at • liltext.com • urlcover.com • easyurl.net • url.co.uk • lnk.gd • 2pl.us • atu.ca • ewerl.com • xzb.cc • tweetburner.com • sp2.ro • inreply.to • linkbun.ch • u6e.de • profile.to • tighturl.com • href.in • xaddr.com • ub0.cc • a.gg • urlbrief.com • gl.am • minurl.fr • tinytw.it • 2ya.com • dfl8.me • cort.as • zi.pe • safe.mn • go.9nl.com • fire.to • riz.gd • shrunkin.com • gurl.es • 2tu.us • hex.io • bloat.me • traceurl.com • twiturl.de • fwd4.me • krunchd.com • liurl.cn • to.ly • bacn.me • minilien.com • myurl.in • burnurl.com • shrt.st • shortlinks.co.uk • urlenco.de • nn.nf • tiny.pl • qicute.com • ne1.net • clck.ru • starturl.com • rb6.me • buk.me • notlong.com • jijr.com • urlx.ie • rsmonkey.com • thrdl.es • shorl.com • pd.am • cuturl.com • spedr.com • icanhaz.com • go2.me • turo.us • vl.am • updating.me • tinyarro.ws • sqrl.it • miniurl.com • kissa.be • tinyvid.io • iterasi.net • virl.com • hellotxt.com • lurl.no • tiny123.com • piurl.com • pnt.me • ru.ly • esyurl.com • 1url.com • nsfw.in • lru.jp • urlx.org • gri.ms • xurl.jp • rickroll.it • iscool.net • tr.my • yweb.com • togoto.us • twitterpan.com • sharein.com • urlkiss.com • clickmeter.com • gowat.ch • urlzen.com • qlnk.net • 
hugeurl.com • poprl.com • fon.gs • w3t.org • tinyuri.ca • njx.me

End of the list of all domains that were considered shorteners in this study.
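A link can be classified as shortened by comparing its host name with the list above. The following sketch assumes this simple lookup and is not taken from the framework used in the study; the set shown is only a small subset of the list, and `link_domain` and `is_shortener` are hypothetical names.

```python
from urllib.parse import urlsplit

# Small subset of the shortener list in this appendix, for illustration only.
SHORTENERS = {"bit.ly", "youtu.be", "goo.gl", "ow.ly", "buff.ly", "tinyurl.com"}


def link_domain(url):
    """Return the lower-cased host name of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_shortener(url, shorteners=SHORTENERS):
    """True if the URL's domain is in the set of known shortener domains."""
    return link_domain(url) in shorteners


for url in (
    "https://bit.ly/3abc",
    "https://www.youtube.com/watch?v=x",
    "https://redd.it/abc",
):
    print(url, is_shortener(url))
```

Because the decision depends entirely on the list, a shortener that is missing from it, such as redd.it, is counted as an ordinary link.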


Shortener Domain  Collected  Looked Up
bit.ly            164307     164211
youtu.be          116849     116719
goo.gl            14472      244
ow.ly             10059      419
buff.ly           8742       5
tinyurl.com       5871       5549
j.mp              3034       63
wp.me             2617       2374
lnkd.in           2385       2381
is.gd             2300       317
ouo.io            1005       148
po.st             400        39
cort.as           243        0
flic.kr           225        224
tcrn.ch           217        0
bit.do            187        68
eepurl.com        173        109
migre.me          116        115
snip.ly           108        106
bc.vc             83         79
tiny.cc           64         0
yfrog.com         57         57
git.io            25         25
x.co              21         21
alturl.com        18         18
db.tt             17         0
y2u.be            14         14
s.id              8          0
digg.com          7          0
icanhaz.com       5          0
tiny.pl           4          4
url.ie            4          4
tr.im             2          2
mcaf.ee           2          0
qr.ae             2          0
hec.su            1          1
nsfw.in           1          0

Table A.1: All collected shorteners sorted on domains and how many we were able to get the full domain from.


A.2 Collections from 18/3-25/3 and 1/4-8/4

In this section we present the results of our first two collections, based on the tweets collected and different subsets of the data; both collections were made over a seven-day period. The first data collection was made 18/3-25/3 and the second 1/4-8/4. Table A.2 below summarizes the number of tweets collected in each category and also includes the third collection.

Category          18/3-25/3  1/4-8/4   18-25/4
All Tweets        32807085   33078822  33281088
Link Tweets       3786543    3788332   3803233
Shortener Tweets  273677     286235    310915
Bitly Tweets      65973      61153     52517

Table A.2: Amount of tweets collected at the various time occasions divided into categories.

High-level link shortener usage

The first section shows the high-level link shortener usage. For both collections twitter.com is naturally the most frequent domain, because each retweet contains the URL of the original tweet. The tables in this section list the 25 most common domains and shorteners. For the first collection, Table A.3 shows that youtu.be and bit.ly were the most common shorteners, with 146271 and 100052 occurrences respectively. youtu.be is Youtube’s own link shortener, which only points to Youtube videos.

Rank  Domain                 Occurrences  Domain       Occurrences
1     twitter.com            2204508      youtu.be     146271
2     youtu.be               146271       bit.ly       100052
3     bit.ly                 100052       ow.ly        8411
4     instagram.com          46788        buff.ly      7192
5     peing.net              33178        tinyurl.com  5605
6     youtube.com            30920        goo.gl       5365
7     onlyfans.com           28688        is.gd        1868
8     open.spotify.com       22458        lnkd.in      1763
9     twcm.co                21494        wp.me        1481
10    fllwrs.com             20780        j.mp         1476
11    curiouscat.me          17969        ouo.io       696
12    dlvr.it                17077        tiny.cc      241
13    twitch.tv              13981        bit.do       198
14    ift.tt                 12204        eepurl.com   172
15    naver.me               11476        flic.kr      138
16    news.v.daum.net        11074        migre.me     114
17    pscp.tv                11049        tcrn.ch      71
18    cas.st                 10162        snip.ly      63
19    suho.smtown.com        9324         bc.vc        63
20    facebook.com           9279         yfrog.com    41
21    headlines.yahoo.co.jp  9091         tr.im        24
22    ow.ly                  8411         sptfy.com    20
23    news.livedoor.com      7609         y2u.be       15
24    .nhk.or.jp             7323         v.gd         12
25    buff.ly                7192         alturl.com   10

(a) Top 25 most frequent domains (left columns). (b) Top 25 most frequent shortened domains (right columns).

Table A.3: Most frequent domains (18/3-25/3).


The results for the second collection in Table A.4 are similar to those shown above, with only a few small differences. The top 6 are the same for both domains and shorteners, except that tinyurl.com and goo.gl have changed places in the second collection.

Rank  Domain                 Occurrences  Domain       Occurrences
1     twitter.com            2125425      youtu.be     155150
2     youtu.be               155150       bit.ly       103975
3     bit.ly                 103975       ow.ly        8253
4     instagram.com          48517        buff.ly      7574
5     peing.net              37623        goo.gl       5280
6     youtube.com            32575        tinyurl.com  4909
7     family.co.jp           32352        wp.me        1666
8     onlyfans.com           32190        is.gd        1651
9     sumail.com             24360        j.mp         1646
10    open.spotify.com       23422        lnkd.in      1637
11    twcm.co                21172        ouo.io       721
12    fllwrs.com             19812        tiny.cc      270
13    dlvr.it                17655        eepurl.com   195
14    twitch.tv              14596        flic.kr      145
15    news.livedoor.com      11584        migre.me     135
16    ift.tt                 11438        snip.ly      86
17    curiouscat.me          11182        bit.do       74
18    .nhk.or.jp             10869        bc.vc        67
19    naver.me               10782        tcrn.ch      63
20    cas.st                 10559        yfrog.com    32
21    headlines.yahoo.co.jp  10497        v.gd         24
22    pscp.tv                10154        s.id         15
23    news.v.daum.net        9438         sptfy.com    14
24    facebook.com           9272         po.st        13
25    ow.ly                  8253         tr.im        11

(a) Top 25 most frequent domains (left columns). (b) Top 25 most frequent shortened domains (right columns).

Table A.4: Most frequent domains (1/4-8/4).

Domain Statistics

From the framework, three different sets were constructed: all links, link shorteners and Bitly links. The tables show the number of occurrences of the top 20 domains for each of the three sets. The Alexa and Majestic top 1 million domain rankings are also included; when a rank is not available we list "-".
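A table of this form can be produced by counting domain occurrences and looking each domain up in the two rank lists. The sketch below is an assumption about one way to do this, not the framework's code; the function name `top_domains` and the toy input are made up, while the example ranks are the ones shown for twitter.com and youtube.com in Table A.5.

```python
from collections import Counter


def top_domains(domains, alexa, majestic, n=20):
    """Return (position, domain, occurrences, Alexa rank, Majestic rank) rows.

    A rank that is missing from a list is reported as "-".
    """
    rows = []
    for position, (domain, count) in enumerate(Counter(domains).most_common(n), start=1):
        rows.append((position, domain, count, alexa.get(domain, "-"), majestic.get(domain, "-")))
    return rows


# Toy input: one entry per observed link, already reduced to its domain.
observed = ["twitter.com"] * 3 + ["youtube.com"] * 2 + ["twittascope.com"]
alexa = {"twitter.com": 35, "youtube.com": 2}
majestic = {"twitter.com": 4, "youtube.com": 3}

rows = top_domains(observed, alexa, majestic)
for row in rows:
    print(*row)
```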


Top Domains

Rank  Domain                Occur.   Alexa   Maj.
1     twitter.com           2205643  35      4
2     youtube.com           178070   2       3
3     instagram.com         46962    29      5
4     peing.net             33178    22860   194411
5     onlyfans.com          28753    1816    30871
6     open.spotify.com      22562    -       162
7     twittascope.com       21791    -       -
8     twcm.co               21494    -       -
9     fllwrs.com            20780    220101  -
10    curiouscat.me         17969    10190   152589
11    dlvr.it               17627    577580  14259
12    twitch.tv             14000    40      344
13    ift.tt                12204    -       9927
14    naver.me              11549    -       19471
15    news.v.datum.net      11092    -       -
16    pscp.tv               11052    7102    1814
17    cas.st                10162    -       -
18    facebook.com          9915     4       1
19    suho.smtown.com       9324     -       -
20    headlines.yahoo.com   9103     -       3880

(a) Top 20 domains for all links.

Rank  Domain                Occur.   Alexa   Maj.
1     twittascope.com       12317    -       -
2     k.kakaocdn.net        1651     -       -
3     img1.daumcdn.net      1103     -       -
4     t1.daumcdn.net        1026     -       -
5     youtube.com           768      2       3
6     drive.google.com      725      -       35
7     twitter.com           605      35      4
8     youtu.be              566      11785   13
9     cp.kirin.jp           558      -       -
10    str2b.openstream.co   499      -       -
11    boatrace-amagasa.org  476      -       -
12    dlvr.it               474      577580  14259
13    go.onelink.me         462      -       11857
14    facebook.com          462      4       1
15    aoxx69.com            432      143328  -
16    punchng.com           427      818     9964
17    trib.al               398      227811  22402
18    streamingv2.shout     371      -       -
19    end.ciao.jp           362      -       -
20    careerarc.com         355      351510  91196

(b) Top domains for Bitly links.

Rank  Domain                Occur.   Alexa   Maj.
1     youtube.com           147150   2       3
2     twittascope.com       21790    -       -
3     k.kakaocdn.net        2001     -       -
4     goo.gl                1900     9664    19
5     linkedin.com          1784     60      6
6     img1.daumcdn.net      1293     -       -
7     t1.daumcdn.net        1282     -       -
8     twitter.com           1135     35      4
9     drive.google.com      1028     -       35
10    tinyurl.com           774      4371    77
11    cp.kirin.jp           773      -       -
12    youtu.be              696      11785   13
13    facebook.com          636      4       1
14    go.onelink.me         620      -       11857
15    str2b.openstream.co   618      -       -
16    rbnnw.com             610      -       -
17    boatrace.amagasa.org  594      -       -
18    dlvr.it               550      577580  14259
19    aoxx69.com            546      143328  -
20    punchng.com           539      818     9964

(c) Top domains for shortened links.

Table A.5: Top 20 most frequent domains (18/3-25/3).

For the first collection, Table A.5 shows that twittascope.com, a well-known horoscope page, is a frequent domain especially for Bitly links but also for other shortened links.


Rank  Domain                Occur.   Alexa   Maj.
1     twitter.com           2125792  49      4
2     youtube.com           188745   2       3
3     instagram.com         48612    30      5
4     peing.net             37623    9756    194138
5     family.co.jp          32352    52743   24174
6     onlyfans.com          32280    1163    28858
7     sumail.com            24360    76713   41662
8     open.spotify.com      23492    -       158
9     twittascope.com       21591    -       -
10    twcm.co               21172    -       -
11    fllwrs.com            19812    50550   -
12    dlvr.tv               18091    206614  14248
13    twitch.tv             14606    31      338
14    news.livedoor.com     11591    -       -
15    ift.tt                11438    -       10051
16    curiouscat.me         11182    16036   149817
17    .nhk.or.jp            10876    -       -
18    naver.me              10788    53995   19455
19    cas.st                10559    -       -
20    headlines.yahoo.com   10504    -       3826

(a) Top 20 domains for all links.

Rank  Domain                Occur.   Alexa   Maj.
1     twittascope.com       11080    -       -
2     k.kakaocdn.net        1429     -       -
3     img1.daumcdn.net      1144     -       -
4     rbaov.top             913      -       -
5     youtube.com           764      2       3
6     t1.daumcdn.net        725      -       -
7     rbsye.com             684      -       -
8     rbaoy.top             675      -       -
9     cp.kirin.jp           547      -       -
10    this.kiji.is          544      -       -
11    rbaor.top             540      -       -
12    youtu.be              493      16310   12
13    drive.google.com      425      -       35
14    dlvr.it               391      206614  14248
15    rbeon.com             371      -       -
16    punchng.com           364      1009    9884
17    trib.al               358      6       1
18    facebook.com          352      6       1
19    elmundo.es            336      853     1196
20    str2b.openstream.co   330      -       -

(b) Top domains for Bitly links.

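| # | Domain | Occurrences | Alexa rank | Majestic rank |
|---|--------|-------------|------------|---------------|
| 1 | youtube.com | 156170 | 2 | 3 |
| 2 | twittascope.com | 21590 | - | - |
| 3 | k.kakaocdn.net | 1894 | - | - |
| 4 | goo.gl | 1772 | 3400 | 19 |
| 5 | linkedin.com | 1645 | 69 | 6 |
| 6 | img1.daumcdn.net | 1475 | - | - |
| 7 | t1.daumcdn.net | 1090 | - | - |
| 8 | rbaov.top | 913 | - | - |
| 9 | rbsye.com | 788 | - | - |
| 10 | cp.kirin.jp | 766 | - | - |
| 11 | youtu.be | 738 | 16310 | 12 |
| 12 | this.kiji.is | 714 | - | - |
| 13 | rbaoy.top | 675 | - | - |
| 14 | drive.google.com | 674 | - | 35 |
| 15 | rbaor.top | 540 | - | - |
| 16 | tinyurl.com | 524 | 4101 | 77 |
| 17 | facebook.com | 519 | 6 | 1 |
| 18 | akindo-sushio.co.jp | 518 | - | 194422 |
| 19 | punchng.com | 489 | 1009 | 9884 |
| 20 | rbeon.com | 484 | - | - |

(c) Top domains for shortened links.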

Table A.6: Top 20 most frequent domains (1/4-8/4).


Popularity Distribution

(a) Alexa top-1M. (b) Majestic top-1M.

Figure A.1: Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (18/3-25/3).

(a) Alexa top-1M. (b) Majestic top-1M.

Figure A.2: Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (1/4-8/4).
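The popularity-class breakdown shown in Figures A.1 and A.2 can be sketched as follows. This is a minimal illustration only: the class boundaries in `CLASS_BOUNDS`, the function names, and the `ranks` dictionary (domain to rank in a top-1M list such as Alexa or Majestic) are assumptions made for the example and may differ from what was used to produce the figures.

```python
from collections import Counter

# Hypothetical class boundaries; the figures may use different ones.
CLASS_BOUNDS = [
    (10, "top-10"),
    (100, "top-100"),
    (1_000, "top-1K"),
    (10_000, "top-10K"),
    (100_000, "top-100K"),
    (1_000_000, "top-1M"),
]


def popularity_class(rank):
    """Map a 1-based top-1M rank to a class label; None means unranked."""
    if rank is None:
        return "unranked"
    for upper, label in CLASS_BOUNDS:
        if rank <= upper:
            return label
    return "unranked"


def class_shares(link_domains, ranks):
    """Fraction of links whose domain falls in each popularity class.

    link_domains: one domain per collected link (repeats allowed).
    ranks: dict mapping domain -> rank in a top-1M list.
    """
    counts = Counter(popularity_class(ranks.get(d)) for d in link_domains)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

For example, with `ranks = {"youtube.com": 2, "punchng.com": 818}` and four links of which two point to youtube.com, one to punchng.com and one to twittascope.com, half of the links fall in the top-10 class and a quarter each in the top-1K and unranked classes.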

Domain Frequencies

(a) CDF. (b) CCDF.

Figure A.3: Distribution of domain rank (18/3-25/3).


(a) CDF. (b) CCDF.

Figure A.4: Distribution of domain rank (1/4-8/4).

Relative Ranks and Frequencies

Figure A.5: Distribution of domain rank (18/3-25/3).


Figure A.6: Distribution of domain rank (1/4-8/4).

Phishing Domains

We ran all our collections through the PhishTank database (a service that lets users report and review suspected phishing sites) but found no matches among our links.
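A lookup of this kind could be sketched as below. The text does not state whether matching was done on full URLs or on domains, so this example assumes exact URL matching after a light normalization. The function names are hypothetical, and `reported_urls` is assumed to be a list of URLs already loaded from a downloaded copy of the PhishTank database.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path (+ query), ignoring scheme and host case."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def find_phishing_matches(collected_links, reported_urls):
    """Return the collected links that also appear among the reported URLs."""
    reported = {normalize(u) for u in reported_urls}
    return [link for link in collected_links if normalize(link) in reported]
```

Holding the reported URLs in a set keeps each lookup constant-time, which matters when millions of collected links are checked. Matching on the host alone would be a stricter variant but would also flag benign pages on shared hosting domains.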

User Statistics

In this section we present statistics on the users who use link shorteners. We take a closer look at the age of the account, the number of tweets favourited by the user, the number of tweets the user has posted, and the number of followers the user has. Finally, we look at verified users.
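Most of the figures in this section show a CDF and a CCDF of one per-user metric. As a minimal sketch of how such curves can be derived from raw values, the hypothetical helper below computes both for a list of observations. It assumes the CCDF is defined as P(X > x); some works use P(X >= x) instead.

```python
def empirical_cdf(values):
    """Return (x, P(X <= x), P(X > x)) for each distinct value x, in order."""
    data = sorted(values)
    n = len(data)
    points = []
    for i, x in enumerate(data, start=1):
        # Emit one point per distinct value, at its last occurrence.
        if i == n or data[i] != x:
            points.append((x, i / n, 1 - i / n))
    return points
```

Plotting the second column against the first gives the CDF and the third column gives the CCDF. For heavy-tailed metrics such as follower counts, the CCDF on logarithmic axes makes the tail visible.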

Age

(a) CDF. (b) CCDF.

Figure A.7: Distribution of the age of users' accounts at the time of posting their tweet (18/3-25/3).


(a) CDF. (b) CCDF.

Figure A.8: Distribution of the age of users' accounts at the time of posting their tweet (1/4-8/4).

Favourites

(a) CDF. (b) CCDF.

Figure A.9: Distribution of the number of tweets favourited by users at the time of posting their tweet (18/3-25/3).

(a) CDF. (b) CCDF.

Figure A.10: Distribution of the number of tweets favourited by users at the time of posting their tweet (1/4-8/4).


Number of tweets

(a) CDF. (b) CCDF.

Figure A.11: Distribution of the number of tweets posted by users at the time of posting their tweet (18/3-25/3).

(a) CDF. (b) CCDF.

Figure A.12: Distribution of the number of tweets posted by users at the time of posting their tweet (1/4-8/4).

Favourites to Tweets

(a) CDF. (b) CCDF.

Figure A.13: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (18/3-25/3).


(a) CDF. (b) CCDF.

Figure A.14: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (1/4-8/4).

Followers

(a) CDF. (b) CCDF.

Figure A.15: Distribution of the number of followers for users at the time of posting their tweet (18/3-25/3).

(a) CDF. (b) CCDF.

Figure A.16: Distribution of the number of followers for users at the time of posting their tweet (1/4-8/4).


Friends

(a) CDF. (b) CCDF.

Figure A.17: Distribution of the number of friends for users at the time of posting their tweet (18/3-25/3).

(a) CDF. (b) CCDF.

Figure A.18: Distribution of the number of friends for users at the time of posting their tweet (1/4-8/4).

Followers vs Friends

(a) 18/3-25/3. (b) 1/4-8/4.

Figure A.19: Followers-to-friends ratio for users at the time of posting their tweet.

Unique and Verified Users

The table below shows how many unique and verified users there were in the data sets from 18-25/3 and 1-8/4.


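| Category | Unique Users 18-25/3 | Unique Users 1-8/4 | Verified 18-25/3 | Verified 1-8/4 |
|----------|----------------------|--------------------|------------------|----------------|
| All Tweets | 15450290 | 15361077 | 67423 | 63214 |
| Link Tweets | 2838692 | 2806472 | 34996 | 33173 |
| Shortener Tweets | 236784 | 248384 | 0 | 0 |
| Bitly Tweets | 56739 | 52665 | 0 | 0 |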

Table A.7: Number of unique users in each category and how many of those users are verified.
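Counts like those in Table A.7 are per user rather than per tweet, so a user who posts many tweets in a category is counted once. A minimal sketch of that counting is given below. The function name and the input format, one `(user_id, verified)` pair per tweet in the category, are assumptions for the example.

```python
def unique_and_verified(tweets):
    """Count distinct users and distinct verified users.

    tweets: iterable of (user_id, verified) pairs, one per tweet.
    Returns (number of unique users, number of unique verified users).
    """
    users, verified = set(), set()
    for user_id, is_verified in tweets:
        users.add(user_id)
        if is_verified:
            verified.add(user_id)
    return len(users), len(verified)
```

Calling the function once per category (all tweets, link tweets, shortener tweets, Bitly tweets) and per collection week would fill one row of the table at a time.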

Verified vs Non-verified Users

This section examines whether there are any noticeable differences between verified and non-verified users.

Bitly Clicks

(a) 18-25/3. (b) 1-8/4.

Figure A.20: Clicks-to-followers ratio for Bitly links for verified users.

(a) 18-25/3. (b) 1-8/4.

Figure A.21: Clicks-to-followers ratio for Bitly links for non-verified users.

Followers, Number of Tweets and Retweets

To compare user statistics between non-verified and verified users, we show three pairs of heat-maps. The first pair shows retweets versus number of tweets posted, the second retweets versus followers, and the last followers versus number of tweets posted.
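A heat-map of two count metrics is essentially a two-dimensional histogram. The sketch below shows one way to build the cell counts. It assumes logarithmic bins, a common choice for heavy-tailed counts; the binning actually used for the figures is not stated here, and the function names and the `bins_per_decade` parameter are hypothetical.

```python
import math
from collections import Counter


def log_bin(value, bins_per_decade=4):
    """Index of the logarithmic bin for value >= 1; values below 1 get bin -1."""
    if value < 1:
        return -1
    return math.floor(math.log10(value) * bins_per_decade)


def heatmap_counts(pairs, bins_per_decade=4):
    """Count (x, y) observations per two-dimensional logarithmic bin."""
    return Counter(
        (log_bin(x, bins_per_decade), log_bin(y, bins_per_decade))
        for x, y in pairs
    )
```

Each key of the returned counter identifies one cell of the heat-map and the value gives its colour intensity. Running the function separately on verified and non-verified users yields one pair of panels.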


(a) Non-verified users (18/3-25/3). (b) Verified users (18/3-25/3).

Figure A.22: Heat-map of retweets vs number of tweets tweeted (18/3-25/3).

(a) Non-verified users (18/3-25/3). (b) Verified users (18/3-25/3).

Figure A.23: Heat-map of retweets vs followers (18/3-25/3).

(a) Non-verified users (18/3-25/3). (b) Verified users (18/3-25/3).

Figure A.24: Heat-map of followers vs number of tweets tweeted (18/3-25/3).


1/4-8/4

(a) Non-verified users (1/4-8/4). (b) Verified users (1/4-8/4).

Figure A.25: Heat-map of retweets vs number of tweets tweeted (1/4-8/4).

(a) Non-verified users (1/4-8/4). (b) Verified users (1/4-8/4).

Figure A.26: Heat-map of retweets vs followers (1/4-8/4).

(a) Non-verified users (1/4-8/4). (b) Verified users (1/4-8/4).

Figure A.27: Heat-map of followers vs number of tweets tweeted (1/4-8/4).


(a) 18-25/3. (b) 1-8/4.

Figure A.28: Two scatter plots of the Bitly clicks-to-retweets ratio.

(a) 18-25/3. (b) 1-8/4.

Figure A.29: Logarithmic average of Bitly clicks per retweet.
