Dissemination of scholarly literature in

Pablo Moriano, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer Center for Complex Networks and Systems Research School of and Computing Indiana University, Bloomington, USA

ABSTRACT Table 1: Publisher domains used in the study. Social media data have been increasingly used to assess the impact 1 acm.org 10 ieee.org 19 plosone.org of scholarly research. Such data provide complementary metrics 2 acs.org 11 jbc.org 20 pnas.org (often called altmetrics) to traditional impact indicators. This pa- 3 ams.org 12 jstor.org 21 sciencedirect.com per provides a summary on the diffusion of scholarly content in 4 aps.org 13 mdpi.com 22 sciencemag.org 5 .org 14 metapress.com 23 springer.com social media, based on a collection of tweets citing papers from a 6 biomedcentral.com 15 nature.com 24 ssrn.com set of 27 academic publishers within various fields between 2011 7 cell.com 16 nejm.org 25 thelancet.com and 2013. We first show that there has been an increasing adop- 8 doi.org 17 oxfordjournals.org 26 wiley.com tion of as a channel to disseminate scholarly literature. In 9 elsevier.com 18 plos.org 27 worldscientific.com particular, between 2012 and 2013, the number of scholarly tweets and the fraction of tweets (over the entire corpus) have increased by 91.2% and 42.6% respectively. We then analyze the structure semination of scholarly content, we characterize the distribution of of the information diffusion network. We show that the distribu- popularity of scholarly products, and illustrate statistical patterns tions of the numbers of times a specific paper is tweeted, retweeted, in the aggregate network of scholarly information diffusion. We and the number of connected components in the diffusion network focus on a multidisciplinary set of 27 well-known academic pub- are scale-free. These preliminary results suggest that, as for other lishers. Our contribution is threefold. First, we analyze to what ex- kinds of information, there are underlying mechanisms that lead tent scientific publications are discussed in Twitter. In doing so, we some scholars and their products to become viral. measure the increase in the use of Twitter as a dissemination tool for academic content. Second, we quantify the distribution of the 1. INTRODUCTION numbers of times a specific paper is tweeted and retweeted. Third, we characterize the structure of the diffusion network for schol- Social media have been adopted as widely used sources of data arly products by characterizing the distribution of the number of in the study of complex social dynamics. In recent years, data from connected components for the diffusion network. Our results sug- social media platforms have attracted the attention of researchers gest that, similar to a variety of physical, biological, and man-made who study the diffusion and impact of scholarly content [8, 2]. So- phenomena [5], Twitter communication dynamics about scientific cial media data, specifically data gathered from platforms such as literature is also characterized by scale-free distributions. LinkedIn, Facebook and Twitter, offer complementary ways (called altmetrics) to measure scholarly impact against traditional citation indicators [6]. Here, we are particularly interested in quantifying 2. DATA COLLECTION how scientific knowledge is disseminated in these environments. Our collection of tweets is based on a 10% sample of the Twitter While the reliability and validity of altmetrics is still under in- stream, and spans three years of data (2011–2013). In particular, vestigation [3], the use of traditional citation metrics as measures we use tweets collected by the Truthy project [4]. In our dataset we of impact of scholarly publications is being reevaluated. On the one retain only tweets and retweets with an explicit or shortened link to hand, people argue that traditional metric indicators are exclusively a paper within a pre-selected set of publishers (Table 1.) based on the transfer of knowledge in close research communities [9]. On the other hand, altmetrics are supposed to account for a 3. RESULTS AND DISCUSSION broader audience, since they are based on data from online sources used by different types of populations [7]. Focusing on Twitter, we Fig. 1(A) show the total number of tweets about scholarly pa- provide evidence of the increasing use of social media in the dis- pers for every year during the observation period. We note a grow- ing amount of scholarly communication. In particular, there is an increase of 165.1% between 2011 and 2012, and one of 91.2% between 2012 and 2013. Fig. 1(B) illustrates the variation in the percentage of scholarly tweets. For instance, there is an increase Permission to make digital or hard copies of all or part of this work for of 29.5% between 2011 and 2012, and 42.6% between 2012 and personal or classroom use is granted without fee provided that copies are 2013. This suggests that social media platforms are increasingly not made or distributed for profit or commercial advantage and that copies perceived as tools to promote academic literature, and that schol- bear this notice and the full citation on the first page. To copy otherwise, to arly content is capturing a growing share of social media attention. republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Fig. 2 shows the number of tweets about scholarly papers dur- Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00. ing the observation period by month and by publisher. The most 0 0 0 10 ìæà 10 ìæà 10 +42.6% A æ 2011 B æ 2011 ìæà C æ 2011 +91.2% ìæà A B ìæà -1 -1 ì -1 ì æà 100 000 10 à 10 ì 10 æì æà à à ìà à ìà à æìà 2012 æìà 2012 2012 -6 æìà æìà æ æìà æìà ì 6 x 10 -2 æìàì -2 æìàìà -2 à æàìà æ ìà 10 æìàì 10 æìàìà 10 æ ìà ææàìàì ææìàìì ææàìàì æààììì æì æææàìàìì ì æàààììæ ì æàì ì 80 000 ææààìàì 2013 àìàìææ 2013 à 2013 ææààìì ììààààæ æìà -6 +29.5% ææàæàìì ìììààààæ æì æààìàì ìììààæ æàì 5 x 10 -3 æìàìàà -3 ììàà -3 æàà ææìæìàìà ììæìà ìæìà 10 æìàìàìà 10 ììà 10 ìà æìàìì àì æìààà àìàææ àìì ììàæàæà ìàì æ àì ìàæ ììàà æ ì ììà æ ìììàà à ì ììà -6 ììàæ ì ììàà æ 60 000 -4 ìàì -4 à -4 +165.1% 4 x 10 ìà ì ì ì à 10 àì æ 10 10 ì à æ ì ì ì ìà à ì -6 ì à àì 3 x 10 10-5 ì 10-5 10-5 ì 40 000 100 101 102 103 100 101 102 100 101 102 2 x 10-6 Figure 3: CCDF of the numbers of times a specific paper is (A) 20 000 1 x 10-6 tweeted, (B) retweeted, and (C) connected components in the paper dif- 0 fusion network. 2011 2012 2013 2011 2012 2013 Figure 1: (A) Scholarly tweets per year. (B) Percentage of scholarly tweets over time. imately power laws, with exponents of 2.8, 2.7, and 2.7. To rep- resent the dissemination of academic literature within the collected set of tweets, for each paper we construct a directed graph in which 2011 2012 2013 nodes represent users. An edge from user X to Y represents a men- December A tion of Y by X or a retweet by Y of a message from X. Fig. 3(C) November shows the CCDF of the number of connected components for this diffusion network (exponents of 2.5, 3.0, and 3.0). This analysis October leads us to hypothesize that, similarly to other information diffu- September sion phenomena, scholarly paper dissemination may follow a pref- August erential attachment mechanism. This results in a few widely popu- lar papers accounting for a great fraction of mentions, retweets and July community penetration. June

May 4. CONCLUSIONS We analyzed the use of Twitter in scholarly-related discussion. April We observed an increasing use of social media as channels for March scholarly content dissemination. We also studied the structure of these diffusion networks, finding that most conversations tend to February spread within small numbers of communities. However, there are January some papers that become very popular and go viral spreading in 0 2000 4000 6000 8000 10 000 multiple communities. The present analysis is at the level of pub- lisher domains; tracking URLs of individual papers will allow us to 2011 2012 2013 investigate whether the number of distinct papers and the fraction nature.com of papers represented in Twitter are growing. Further investiga- arxiv.org sciencemag.org B tion will also aim to understand under what conditions scholarly- wiley.com related conversations become viral in social media. Comparison sciencedirect.com plos.org with spreading of information in other contexts (e.g., politics, social dx.doi.org doi.org movements, or the news) will shed light on the underlying forces plosone.org that shape academic content dissemination. This work was sup- oxfordjournal.org acm.org ported in part by NSF grant CCF-1101743. springer.com ssrn.com ieee.org 5. REFERENCES nejm.org [1] A. Clauset, C. R. Shalizi, and M. E. Newman. Power-law distributions pnas.org thelancet.com in empirical data. SIAM Rev., 51(4):661–703, 2009. biomedcentral.com [2] S. Haustein, I. Peters, C. R. Sugimoto, M. Thelwall, and V. Larivière. acs.org aps.org Tweeting biomedicine: An analysis of tweets and citations in the cell.com biomedical literature. J. Assoc. Inf. Sci. Technol., 65(4):656–669, 2014. jstor.org elsevier.com [3] R. Kwok. Research impact: altmetrics make their mark. Nature, mdpi.com 500(7463):491–493, 2013. metapress.com ams.org [4] K. McKelvey and F. Menczer. Truthy: Enabling the study of online jbc.org social networks. In Proceedings of the 2013 Conference on Computer worldscientific.com Supported Cooperative Work Companion, pages 23–27, 2013. 0 5000 10 000 15 000 20 000 [5] M. E. J. Newman. The structure and function of complex networks. Figure 2: Scholarly tweets per (A) month, (B) publisher. SIAM Rev., 45(2):167–256, 2003. [6] J. Priem and K. L. Costello. How and why scholars cite on twitter. In Proceedings of the American Society for Information Science and Technology tweeted papers are published by Nature, arXiv, and Science. , pages 1–4, 2010. [7] J. Priem, D. Taraborelli, P. Groth, and C. Neylon. altmetrics: a Fig. 3(A) shows the complementary cumulative distribution func- manifesto. http://altmetrics.org/manifesto/, 2011. tion (CCDF) of the number of times a url to a specific paper is [8] X. Shuai, A. Pepe, and J. Bollen. How the scientific community reacts contained in a tweet. The distributions follow a power law with to newly submitted preprints: Article downloads, twitter mentions, and estimated scaling exponents [1] of 3.1, 2.7, and 2.6, respectively. citations. PLoS One, 7(11):e47523, 2012. Fig. 3(B) shows the CCDF of the number of retweets that refer- [9] C. H. Yeong and B. J. J. Abdullah. Altmetrics: the right step forward. ence a specific paper. Corresponding distributions are also approx- Biomed. Imag. Interv. J., 8(3):e15, 2012.