Attention Inequality in Social Media
Total Page:16
File Type:pdf, Size:1020Kb
Attention Inequality in Social Media Linhong Zhu and Kristina Lerman USC Information Sciences Institute 4676 Admiralty Way Marina Del Rey, CA 90292 flinhong,[email protected] Abstract Klamer and Van Dalen 2002). Researchers have long spec- ulated about the origins of inequalities. Current consen- Social media can be viewed as a social system where the cur- sus holds that inequality cannot be explained by variation rency is attention. People post content and interact with oth- in quality of the work or individual talent (Adler 1985; ers to attract attention and gain new followers. In this paper, Salganik, Dodds, and Watts 2006; Lariviere and Gingras we examine the distribution of attention across a large sam- ple of users of a popular social media site Twitter. Through 2010). Rather, it is the result of the underlying social pro- empirical analysis of these data we conclude that attention is cesses, such as the “cumulative advantage” or the “rich-get- very unequally distributed: the top 20% of Twitter users own richer” effect, which brings greater recognition to those who more than 96% of all followers, 93% of the retweets, and 93% are already distinguished (Merton 1968; Allison, Long, and of the mentions. We investigate the mechanisms that lead to Kraze 1982). attention inequality and find that it results from the “rich- In online social media attention is also non-uniformly dis- get-richer” and “poor-get-poorer” dynamics of attention dif- tributed. Some user-generated photos and videos are viewed, fusion. Namely, users who are “rich” in attention, because liked, or shared orders of magnitude more times than the they are often mentioned and retweeted, are more likely to gain new followers, while those who are “poor” in attention rest. Psy’s Gangnam Style video, for example, was viewed are likely to lose followers. We develop a phenomenological over 2 billion times on YouTube, while the average video model that quantifies attention diffusion and network dynam- has fewer than 100 thousand views. Attention paid to people ics, and solve it to study how attention inequality grows over is also highly unequal. A few Twitter users have tens of mil- time in a dynamic environment of social media. lions of followers, while the majority have just a handful of followers. These popular users enjoy an unparalleled advan- tage: things they say on Twitter, from product endorsements Introduction to personal opinions, are seen by millions, making them po- Inequality is a pervasive social phenomenon. For example, tentially far more influential than ordinary users. income and wealth are not evenly distributed in modern so- In this paper, we carry out quantitative analysis of data ciety, but instead, concentrated among few individuals. The from the microblogging service Twitter to characterize and richest 20% in the United States own more than 80% of quantify attention inequality in social media. We identify a the country’s total wealth, while the poorest 20% own just set of almost 6,000 users and monitor their activity over a a fraction of one percent (Norton and Ariely 2011). Inequal- period of several months. We show that attention is very un- ity does not just violate our sense of fairness (Norton and equally distributed: the top 20% of Twitter users in our sam- Ariely 2011), but it also reduces economic opportunities, ple own 97% of all followers, 93% of retweets, and 93% and has been linked to negative social outcomes, such as of mentions. By observing how the monitored users allo- arXiv:1601.07200v1 [cs.SI] 26 Jan 2016 poor health (Kawachi and Kennedy 1999) and high crime cate their attention over time, via following, mentioning or rates (Kelly 2000) in societies with high levels of inequality. retweeting other users, we are able to study the processes While economic inequality has been widely discussed re- leading to attention inequality. We empirically demonstrate cently, inequalities are rampant in other domains, including that the attention Twitter users receive by being mentioned science and entertainment, where a few individuals receive or having their posts retweeted by others results in them disproportionate share of attention and the benefits it brings. gaining new followers. In contrast, users with fewer follow- In science, specifically, attention can be measured by the ers are retweeted less frequently and are more likely to lose number of citations that papers (and scientists who publish followers. We construct a phenomenological model of atten- them) receive. The distribution of the number of citations is tion diffusion where these two processes — mentioning and extremely unequal, with some papers receiving thousands of retweeting — result in changes in the structure of the un- citations, while many others are seldom cited (Allison 1980; derlying follower network. After parameterizing the model, we solve it to study how attention and inequality evolve over Copyright c 2015, Association for the Advancement of Artificial time. Our model produces a “rich-get-richer” and “poor-get- Intelligence (www.aaai.org). All rights reserved. poorer” dynamic, whereby users with more followers, i.e., Twitter celebrities, are mentioned and retweeted more fre- −2 10 −2 quently, which results in them gaining even more followers. 10 −5 −5 Our model is able to identify users who will gain more at- 10 10 tention, to predict the evolution of the follower network and −10 −10 attention inequality. 10 Seed users 10 Seed users There are consequences to high attention inequality, few All users All users Normalized frequency 2 4 6 7 Normalized frequency 1 3 5 7 of which have been explored in the context of social me- 10 10 10 10 10 10 10 10 dia. High inequality focuses the attention of the online com- Number of followers Number of friends munity on relatively few users, allowing them to dispropor- (a) (b) tionately benefit from the attention they receive. Compa- −2 10 −2 nies compensate social media celebrities for endorsing their 10 −4 products and messages. Ordinary users receive no such pay- 10 −4 10 −6 offs. Attention is the currency of social media and, similar 10 −6 10 to other currencies, it is finite. When everyone is retweeting −8 10 −8 Seed users 10 Seed users a popular celebrity or watching a viral video, it means that −10 All users 10 −10 All users other people are not being retweeted, and other videos are Normalized frequency 1 2 3 4 5 Normalized frequency 10 1 2 3 4 5 10 10 10 10 10 10 10 10 10 10 not being watched. Thus, inequality diminishes the diversity Number of status Number of retweets of content and view points to which people are exposed. But (c) (d) inequality may also have hidden benefits. When every single person had watched the same video, or read the same story, Figure 1: Distribution of the numbers of (a) followers, (b) it gives people a common topic for conversation (and tweet- friends, (c) status updates posted, and (d) retweets received ing) and a common vocabulary with which to discuss. Thus, for seed users and all users. We logarithmically bin data for inequality reinforces social identity. smoothing. Our work raises a number of questions, including why do users tolerate extreme attention inequality? How does it af- fect user behavior? Does it reduce their engagement with edges. To provide more details about statistics of our data, Twitter, or does it inspire them to be more active in gain- we analyze the distribution of the numbers of followers, ing attention? And not least, should Twitter, and other social friends, status updates (user activity), and the times that the media sites, do something to ensure attention is more equi- tweets of the seed users, and all users in our data, were tably distributed? We hope that our work stimulates research retweeted. Note that we use ‘status updates’ to refer to the community to address these questions. aggregate tweets posted by user since account creation, and Quantifying Attention Inequality we use ‘tweets’ to refer to the tweets user posted during a specific time period. These distributions, plotted in Fig- In this section we quantitatively characterize the attention ure 1, have a characteristic long-tailed form, where a few received by users and content they post on Twitter. Although users have extremely large numbers of friends, followers or we focus on Twitter, our definitions of attention generalize posted tweets, while many users have few friends, followers to other social media platforms. or posted tweets. Such distributions are generally associated Data Description with high inequality. Except for activity (number of posted tweets) and number We collected data from Twitter over a period from March of retweets, the distributions for seed and all users are very 2014 to Oct 2014 using the following strategies. First, we similar, which indicates success of our random seed user se- randomly selected 5600 seed users whose user ids range lection strategy. The small differences between the distribu- from [0,760000000]. We then used the Twitter API to peri- tion of number of tweets among seed users and among all odically query the friends of seed users (i.e., whom the seed users arise because the profile information of seed users is users were following), which resulted in a dynamic network obtained from raw Jason objects of tweets, and thus users where each edge had a time stamp with its creation and/or who never post any tweets cannot be selected as seed users deletion date. Next, we collected profile information for all in our data collection. Due to these differences, the distribu- the seed users and their friends, as well as timeline tweets. tion of number of retweets among seed users, also slightly The number of tweets gives that user’s activity, or engage- differs from that among all users.