A Twitter Case Study of Anonymity in Social Networks

“On the Internet, Nobody Knows You’re a Dog”: A Twitter Case Study of Anonymity in Social Networks

Sai Teja Peddinti*, Keith W. Ross*†, Justin Cappos*
[email protected], [email protected], [email protected]
*Dept. of Computer Science and Engineering, NYU, Brooklyn, New York, USA
†NYU Shanghai, Shanghai, China

ABSTRACT

Twitter does not impose a Real-Name policy for usernames, giving users the freedom to choose how they want to be identified. This results in some users being Identifiable (disclosing their full name) and some being Anonymous (disclosing neither their first nor last name).

In this work we perform a large-scale analysis of Twitter to study the prevalence and behavior of Anonymous and Identifiable users. We employ Amazon Mechanical Turk (AMT) to classify Twitter users as Highly Identifiable, Identifiable, Partially Anonymous, and Anonymous. We find that a significant fraction of accounts are Anonymous or Partially Anonymous, demonstrating the importance of Anonymity in Twitter. We then select several broad topic categories that are widely considered sensitive (including pornography, escort services, sexual orientation, religious and racial hatred, online drugs, and guns) and find that there is a correlation between content sensitivity and a user's choice to be anonymous. Finally, we find that Anonymous users are generally less inhibited about being active participants, as they tweet more, lurk less, follow more accounts, and are more willing to expose their activity to the general public. To our knowledge, this is the first paper to conduct a large-scale data-driven analysis of user anonymity in online social networks.

Categories and Subject Descriptors

J.4 [Social And Behavioral Sciences]: Sociology; K.4.1 [Public Policy Issues]: Privacy; H.4 [Information Systems Applications]: Miscellaneous

General Terms

Measurement, Human Factors

Keywords

Online Social Networks; Twitter; Anonymity; Quantify; Behavioral Analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
COSN’14, October 1–2, 2014, Dublin, Ireland.
Copyright 2014 ACM 978-1-4503-3198-2/14/10 ...$15.00.
http://dx.doi.org/10.1145/2660460.2660467.

1. INTRODUCTION

Many online social networks, including Facebook and Google+, enforce a Real-Name policy, requiring users to use their real names when creating accounts [3, 2]. The cited reasons for the Real-Name policy include that it improves the quality of the content and the service (helping decrease spam, bullying, and hacking), increases accountability, and helps people to find each other. The Real-Name policy, however, also enables the social networks to tie user interests, as reflected in their use of the online services, to their true names, generating a treasure trove of consumer data. This has resulted in many debates [13] and petitions [6], with privacy advocates claiming that the Real-Name policy erodes online freedom [31]. Privacy-conscious users have started finding ways to bypass the policy, hiding their real identity while continuing to use these social networks [22].

Twitter, on the other hand, does not impose strict rules for users to provide their real names, although it does require them to register with and employ unique pseudonyms. Taking advantage of this lack of a Real-Name policy, many Twitter users choose to employ pseudonyms that have no relation to their real names. Some users choose such a pseudonym only because they enjoy being associated with a particular fun or interesting pseudonym. But many users likely choose pseudonyms with no relation to their real names because they want to be anonymous on Twitter. For example, some users may desire the ability to tweet messages without revealing their actual identities. Other users may desire to follow sensitive and controversial accounts without exposing their real identities. The lack of Real-Name policy enforcement has turned Twitter into a popular information exchange portal where users share and access information without being identifiable, as is evident from Twitter's role in the Egyptian revolution [25] and in reporting news in Mexico [34]. However, there is a meaningful debate about the pros and cons of online anonymity, as it allows people to more easily spread false rumours [14], defame individuals [12], attack organizations [33], and even spread spam [41, 17].

In this work we use Twitter to study the prevalence and behavior of Identifiable users (those disclosing their full name) and Anonymous users (those disclosing neither their first nor last name). Although both on-line and off-line anonymity have been considered by researchers in psychology and sociology, as discussed in Section 7, these studies have generally been carried out with small data sets and surveys. There have also been a few data-driven studies of anonymity in blogs and postings to Web sites [16, 5, 36]. To our knowledge, this paper is the first to conduct a large-scale data-driven analysis of user anonymity in online social networks. The potential benefits of such a study include: (i) a deeper understanding of the importance and role of anonymity in our society; (ii) guidance for the incorporation of privacy and anonymity features in existing and future online social networks; and (iii), as we shall discuss in the body of the paper, the discovery of illegal (such as child-porn and terrorism) or controversial (such as ethnic or religious hate) activities.

Contributions

• We first analyze a large random sample of 100,000 Twitter users. After removing ephemeral users (active on Twitter for less than six months) and spam users, we employ Amazon Mechanical Turk (AMT) to classify Twitter users as Highly Identifiable, Identifiable, Partially Anonymous, and Anonymous based on whether their first and last names are given in their profiles and whether they link to other social networks with a Real-Name policy. We find that 5.9% of the accounts are Anonymous and 20% of the accounts are Partially Anonymous, demonstrating the importance of Anonymity for a large fraction of Twitter users. Leveraging this same data set, we find that Identifiable and Anonymous users exhibit distinctly different behavior in choosing which accounts to follow.

• We evaluate whether content sensitivity has any correlation with users choosing to be anonymous. For this analysis we select several broad topic categories that are widely considered sensitive and/or controversial: pornography, escort services, sexual orientation, religious and racial hatred, online drugs, and guns. We also consider several generic non-sensitive categories. For each of these broad categories we identify Twitter accounts that tweet about these categories. We observe that the different categories contain greatly different percentages of Anonymous and Identifiable followers. Strikingly, all but one of the sensitive aggregate categories have the largest percentage of Anonymous users. We also examine each of the non-sensitive and sensitive accounts individually and observe that there is a general pattern of having larger percentages of Anonymous followers for the sensitive accounts and larger percentages of Identifiable followers for the non-sensitive accounts.

The remaining sections of the paper are organized as follows. Section 2 provides a brief background on Twitter and its terminology. Section 3 gives details about the user categories we are interested in and the classification procedure. We describe our collected dataset statistics in Section 4. Our findings on the use of non-identifying pseudonyms, correlation with following sensitive accounts, and group behavioral differences are reported in Section 5. Section 6 discusses future work. Section 7 describes the related work and Section 8 concludes the paper.

2. BACKGROUND

Every Twitter account comprises four main pieces of information.

• First is the account Profile, which includes the details provided by the user about him/her. These include the screen name, which is a user-chosen unique alphanumeric ID (also referred to as the username); the name, which may be the user's actual first and last name; and (optionally) a small textual description, a profile picture, the user's city/location and a URL (either linking to another social network profile or to something the user supports). Note that the details provided in the profile need not always be true (e.g., the name field can contain a fake first and/or last name).

• Second is the list of Tweets (i.e., messages) posted by the user. A tweet is a message restricted to 140 characters and can contain text, URLs (URL shortening is generally applied to limit the URL size to 20 characters) and HashTags (metadata tags used to group messages).

• Third is the Friends list of the user. When a Twitter user follows another user (a "friend"), he/she receives the tweets from that friend. This relationship is unidirectional, so if A is a friend of B, B need not be a friend of A.

• Fourth is the Followers list of the user. All the users who follow a particular Twitter user are termed his/her followers. They receive all the tweet updates posted by the particular user.

By default, all of this information is publicly available from the Twitter web site. Twitter provides a protected privacy feature, to enable users to hide their tweets, friend
