Arxiv:1403.5206V2 [Cs.SI] 30 Jul 2014
Total Page:16
File Type:pdf, Size:1020Kb
What is Tumblr: A Statistical Overview and Comparison Yi Chang‡, Lei Tang§, Yoshiyuki Inagaki† and Yan Liu‡ † Yahoo Labs, Sunnyvale, CA 94089, USA § @WalmartLabs, San Bruno, CA 94066, USA ‡ University of Southern California, Los Angeles, CA 90089 [email protected],[email protected], [email protected],[email protected] Abstract Traditional blogging sites, such as Blogspot6 and Living- Social7, have high quality content but little social interac- Tumblr, as one of the most popular microblogging platforms, tions. Nardi et al. (Nardi et al. 2004) investigated blogging has gained momentum recently. It is reported to have 166.4 as a form of personal communication and expression, and millions of users and 73.4 billions of posts by January 2014. showed that the vast majority of blog posts are written by While many articles about Tumblr have been published in ordinarypeople with a small audience. On the contrary, pop- major press, there is not much scholar work so far. In this pa- 8 per, we provide some pioneer analysis on Tumblr from a va- ular social networking sites like Facebook , have richer so- riety of aspects. We study the social network structure among cial interactions, but lower quality content comparing with Tumblr users, analyze its user generated content, and describe blogosphere. Since most social interactions are either un- reblogging patterns to analyze its user behavior. We aim to published or less meaningful for the majority of public audi- provide a comprehensive statistical overview of Tumblr and ence, it is natural for Facebook users to form different com- compare it with other popular social services, including blo- munities or social circles. Microblogging services, in be- gosphere, Twitter and Facebook, in answering a couple of key tween of traditional blogging and online social networking questions: What is Tumblr? How is Tumblr different from services, have intermediate quality content and intermediate other social media networks? In short, we find Tumblr has social interactions. Twitter9, which is the largest microblog- more rich content than other microblogging platforms, and it contains hybrid characteristics of social networking, tradi- ging site, has the limitation of 140 characters in each post, tional blogosphere, and social media. This work serves as an and the Twitter following relationship is not reciprocal: a early snapshot of Tumblr that later work can leverage.12 Twitter user does not need to follow back if the user is fol- lowed by another. As a result, Twitter is considered as a new social media (Kwak et al. 2010), and short messages can be Introduction broadcasted to a Twitter user’s followers in real time. Tumblr is also posed as a microblogging platform. Tum- Tumblr, as one of the most prevalent microblogging sites, blr users can follow another user without following back, has become phenomenal in recent years, and it is acquired which forms a non-reciprocal social network; a Tumblr post by Yahoo! in 2013. By mid-January2014, Tumblr has 166.4 3 can be re-broadcasted by a user to its own followers via re- millions of users and 73.4 billions of posts . It is reported to blogging. But unlike Twitter, Tumblr has no length limi- be the most popular social site among young generation, as 4 tation for each post, and Tumblr also supports multimedia half of Tumblr’s visitor are under 25 years old . Tumblr is post, such as images, audios or videos. With these differ- ranked as the 16th most popularsites in United States, which arXiv:1403.5206v2 [cs.SI] 30 Jul 2014 ences in mind, are the social network, user generated con- is the 2nd most dominant blogging site, the 2nd largest mi- 5 tent, or user behavior on Tumblr dramatically different from croblogging service, and the 5th most prevalent social site . other social media sites? In contrast to the momentum Tumblr gained in recent press, little academic research has been conducted over this bur- In this paper, we provide a statistical overview over Tum- geoning social service. Naturally questions arise: What is blr from assorted aspects. We study the social network struc- Tumblr? What is the difference between Tumblr and other ture among Tumblr users and compare its network proper- blogging or social media sites? ties with other commonly used ones. Meanwhile, we study content generated in Tumblr and examine the content gen- eration patterns. One step further, we also analyze how a 1This paper will be officially published on SIGKDD Explo- rations 2014, please cite official version. blog post is being reblogged and propagated through a net- 2This work is also reported by MIT Technology Review at work, both topologically and temporally. Our study shows http://www.technologyreview.com/view/525966/the-anatomy-of- a-forgotten-social-network/ 6http://blogspot.com 3http://www.tumblr.com/about 7http://livesocial.com 4http://www.webcitation.org/64UXrbl8H 8http://facebook.com 5http://www.alexa.com/topsites/countries/US 9http://twitter.com that Tumblr provides hybrid microblogging services: it con- Photo: 78.11% Text: 14.13% tains dual characteristics of both social media and traditional Quote: 2.27% blogging. Meanwhile, surprising patterns surface. We de- Audio: 2.01% Video: 1.35% scribe these intriguing findings and provide insights, which Chat: 0.85% hopefully can be leveraged by other researchers to under- Answer: 0.82% stand more about this new form of social media. Link: 0.46% Tumblr at First Sight Tumblr is ranked the second largest microblogging service, right after Twitter, with over 166.4 million users and 73.4 billion posts by January 2014. Tumblr is easy to register, and one can sign up for Tumblr service with a valid email address within 30 seconds. Once sign in Tumblr, a user can follow other users. Different from Facebook, the connec- tions in Tumblr do not require mutual confirmation. Hence Figure 2: Distribution of Posts (Better viewed in color) the social network in Tumblr is unidirectional. Both Twitter and Tumblr are considered as microblogging platforms. Comparing with Twitter, Tumblr exposes several in the figure, even though all kinds of content are sup- differences: ported, photo and text dominate the distribution, accounting for more than 92% of the posts. Therefore, we will con- • There is no length limitation for each post; centrate on these two types of posts for our content analysis • Tumblr supports multimedia posts, such as images, audios later. and videos; Since Tumblr has a strong presence of photos, it is natural • Similar to hashtags in Twitter, bloggers can also tag their to compare it to other photo or image based social networks blog post, which is commonplace in traditional blog- like Flickr10 and Pinterest11. Flickr is mainly an image host- ging. But tags in Tumblr are seperate from blog content, ing website, and Flicker users can add contact, comment or while in Twitter the hashtag can appear anywhere within like others’ photos. Yet, different from Tumblr, one cannot a tweet. reblog another’s photo in Flickr. Pinterest is designed for curators, allowing one to share photos or videos of her taste • Tumblr recently (Jan. 2014) allowed users to mention and with the public. Pinterest links a pin to the commercial web- link to specific users inside posts. This @user mechanism site where the product presented in the pin can be purchased, needs more time to be adopted by the community; which accounts for a stronger e-commerce behavior. There- • Tumblr does not differentiate verified account. fore, the target audience of Tumblr and Pinterest are quite different: the majority of users in Tumblr are under age 25, while Pinterest is heavily used by women within age from 25 to 44 (Mittal et al. 2013). We directly sample a sub-graph snapshot of social net- work from Tumblr on August 2013, which contains 62.8 million nodes and 3.1 billion edges. Though this graph is Figure 1: Post Types in Tumblr not yet up-to-date, we believe that many network proper- ties should be well preserved given the scale of this graph. Specifically, Tumblr defines 8 types of posts: photo, text, Meanwhile, we sample about 586.4 million of Tumblr posts quote, audio, video, chat, link and answer. As shown in from August 10 to September 6, 2013. Unfortunately, Tum- Figure 1, one has the flexibility to start a post in any type ex- blr does not require users to fill in basic profile information, cept answer. Text, photo, audio, video and link allow one to such as gender or location. Therefore, it is impossible for us post, share and comment any multimedia content. Quote and to conduct user profile analysis as done in other works. In chat, which are not available in most other social network- order to handle such large volume of data, most statistical ing platforms, let Tumblr users share quote or chat history patterns are computed through a MapReduce cluster, with from ichat or msn. Answer occurs only when one tries to some algorithms being tricky. We will skip the involved im- interact with other users: when one user posts a question, in plementation details but concentrate solely on the derived particular, writes a post with text box ending with a question patterns. mark, the user can enable the option for others to answer the Most statistical patterns can be presented in three dif- question, which will be disabled automatically after 7 days. ferent forms: probability density function (PDF), cumula- A post can also be reblogged by another user to broadcast to tive distribution function (CDF) or complementary cumula- his own followers. The reblogged post will quote the origi- tive distribution function (CCDF), describing P r(X = x), nal post by default and allow the reblogger to add additional P r(X ≤ x) and P r(X ≥ x) respectively, where X is a comments.