Pinterest: Case study on a picturesque network

Sudip Mittal, Neha Gupta, Ponnurangam Kumaraguru
IIIT Delhi
{sudip09068, neha1209, pk}@iiitd.ac.in

INTRODUCTION
We present a preliminary study of users and their activity on Pinterest. Our aim is to analyze and understand the behavior of pins, boards, user activities etc. on this OSN.

Pinterest, introduced in March 2010, is one of the fastest-growing online social networking sites. As of 5 January 2013, it had 25 million users [1]. Within a few years it has become an immensely popular social medium. Before Pinterest launched there were many image-based social networks, such as Instagram and Picasa, yet Pinterest has overtaken them all. Big businesses like Etsy, The Gap, Allrecipes, Jetsetter and many others use Pinterest to advertise their products to the site's users, and more and more companies see it as a good online marketing platform. Because it is an image-oriented social network, Pinterest is also attracting professionals such as photographers, designers, interior decorators and small-scale manufacturers, who treat it as a credible and potent advertising tool.

As each OSN has its own jargon, Pinterest also uses some terminology to refer to the various services it provides:
Pins: Users share content via pins. A pin is an image that is generally uploaded by the user from a “source”.
Pinner: The user who has “pinned” a pin.
Pin Boards: Themed collections of pins, organized by a user.
Activity on Pinterest: A user can like, comment on or repin a pin.
“Pin It” button: A browser bookmarklet used to upload content.

Unlike most other OSNs, Pinterest is primarily a photo-sharing platform: whereas sites like Facebook and Twitter follow a text-dominant approach, Pinterest follows an image-dominant one. It also holds less personal user information than other OSNs, as is clear from the fact that Pinterest does not even provide a section asking users for basic information such as gender, date of birth or phone number. Users are, however, free to connect their profiles on other OSNs, such as Facebook and Twitter, with their Pinterest profile. Since all boards and pins are public, Pinterest does not restrict its users from liking, commenting on or repinning any available pin, and profiles, along with all related data, can be viewed by any user. This highlights Pinterest's weak security policies in contrast to those of other leading OSNs.

Given its growth and its lack of various security and privacy features, Pinterest has become a hub for spammers and for people who infringe on image copyrights. Copyright violations of protected content by Pinterest users are becoming a menace that concerns photographers, designers etc. Spammers exploit the fact that when a user clicks an image, he/she is directed to the source of the image, and use this to send the user to a phishing page. The user never suspects that the image can be linked to a phishing site through this technique. Considering Pinterest's growth and the number of users it attracts, it was surprising to observe that little work related to security and privacy has been done on this OSN so far.

METHODOLOGY
As no standard API is provided, we started with a few handles as the initial seed to collect a large number of Pinterest profiles (Table 1). These handles were selected because they have huge follower sets [2-6]. Taking these seeds as input, we created a site crawler that crawls through all the followers of the seeds, the “followers of the followers”, and so on. With the help of this continuous crawler, we have captured a dataset of 2.4 million unique handles (still growing). With these user handles as the top-level data, we started to collect each user's profile information and associated data. As part of our study, we managed to collect 748,275 user profiles, 49 million pins and 271,526 boards.

User             #Followers
ohjoy            12,139,575
bekkapalmer       8,185,742
janew             7,617,662
formfireglass     2,036,776
beaherzberg       1,114,013
Table 1. Seed users.

RESULTS
After collecting a sufficient amount of data, we started analyzing boards, pins and user profiles. The collected dataset includes the following fields: about, image, number of followings, number of followers, location, Facebook ID, Twitter ID, boards, pins, comments etc.

             Total         Max       Mean
#Boards      7,068,484     799       9.477
#Pins        235,655,572   100,135   315.7
#Likes       10,559,633    5,640     0.2
#Comments    316,663       3,345     0.006
#Repins      34,623,510    20,212    0.706
Table 2. Total, maximum and mean data.

Figure 1. Popular words in “about”.
On Pinterest, users provide a short description of themselves in the “about” field. As the wordle (Figure 1) shows, many of the top words are professions like photographer, teacher and designer, while others are hobbies like photography, cooking, reading and gardening. These words help us understand the various interest areas of users on Pinterest. Furthermore, we observed that most users' activities were correlated with their interests or professions.

Figure 2. Popular words in “boards”.
We also analyzed the names of pin boards on Pinterest (Figure 2). Pin boards represent the topics on which users post. Some of the most common boards have home, love, places, styles etc. in their titles. As these pin boards are so common among users, we conclude that these are some of the most talked-about topics on Pinterest.

Figure 3. Popular words in “pins”.
Each pin has a pin description associated with it. By analyzing these descriptions, we were able to identify what users were pinning about the most (Figure 3). From this wordle, we conclude that home, DIY, books etc. are some of the most common words users use to describe their pins.

The most important information embedded in a pin is the source of the pinned image. Table 3 lists the top 15 sources on Pinterest.

BlogSpot, Pinterest, etsy, Flickr, polyvore, wordpress, Google, houzz, Amazon, marthastewart, Facebook, Bing, eBay, Twitter
Table 3. Top sources (left to right).

Pinterest provides 33 different categories for board creation. We analyzed the popularity of all these categories based on 3 factors:
1. The number of boards,
2. The number of pins, and
3. The number of followers under each category.
The category “Food & Drink” is the most popular for all 3 factors, followed by “DIY & Crafts” for factor 1 and “Women’s Fashion” for factors 2 and 3 (Figure 4).

Figure 4. Board categories (top 23 categories).

When calculating the average number of followers, we first included the excessively high-impact user handles, and the average was 847.3. When we eliminated these high-impact handles, the average dropped steeply to 111.7083.

IMPLICATIONS
Through this study we aim to understand and analyze user behavior on Pinterest. We present a simple picture of the popularity of the various kinds of boards and pins that users create. This generalized analysis will lay a foundation for analyzing and categorizing possible spam on Pinterest, along with copyright issues. We also aim to understand how user behavior on Pinterest differs from that on other OSNs.

FUTURE WORK
After completing this user study, we wish to analyze and investigate spam and copyright issues on Pinterest.

REFERENCES
[1] http://expandedramblings.com/index.php/resource-how-many-people-use-the-top-social-media/
[2] Profile of ohjoy: http://pinterest.com/ohjoy
[3] Profile of bekkapalmer: http://pinterest.com/bekkapalmer/
[4] Profile of janew: http://pinterest.com/janew/
[5] Profile of formfireglass: http://pinterest.com/formfireglass
[6] Profile of beaherzberg: http://pinterest.com/beaherzberg
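The METHODOLOGY section describes a seed-based crawl: start from a few high-follower handles and expand to followers, followers of followers, and so on. The paper does not describe its crawler's implementation. A minimal sketch of such a breadth-first ("snowball") crawl follows. `get_followers` is a hypothetical stand-in for whatever fetched a profile's follower list, and the toy graph and handle names (`seed1`, `alice`, ...) are invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def snowball_crawl(seeds: Iterable[str],
                   get_followers: Callable[[str], List[str]],
                   max_handles: int) -> List[str]:
    """Breadth-first crawl: the seeds, their followers, the followers'
    followers, and so on, visiting each unique handle at most once."""
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    collected: List[str] = []
    while queue and len(collected) < max_handles:
        handle = queue.popleft()
        collected.append(handle)
        # Enqueue only followers not seen before, so no handle is revisited.
        for follower in get_followers(handle):
            if follower not in seen:
                seen.add(follower)
                queue.append(follower)
    return collected


# Hypothetical follower graph standing in for pages fetched from the site.
TOY_GRAPH: Dict[str, List[str]] = {
    "seed1": ["alice", "bob"],
    "seed2": ["bob", "carol"],
    "alice": ["dave"],
    "bob": ["alice", "erin"],
}


def toy_get_followers(handle: str) -> List[str]:
    return TOY_GRAPH.get(handle, [])


print(snowball_crawl(["seed1", "seed2"], toy_get_followers, max_handles=100))
# prints ['seed1', 'seed2', 'alice', 'bob', 'carol', 'dave', 'erin']
```

The `seen` set is what keeps the set of unique handles growing without duplicates. A real crawler would also need rate limiting, retries and persistence, which this sketch omits.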
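The wordles in Figures 1 to 3 size each word by how often it occurs in “about” fields, board names or pin descriptions. The paper does not state how the text was preprocessed. The sketch below makes two assumptions: lowercase alphabetic tokenization and a small, hypothetical stop-word list. The sample “about” texts are invented.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "my", "of", "the", "to"}


def top_words(texts: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase word tokens across texts, skipping stop words,
    and return the k most frequent (word, count) pairs."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(k)


# Invented "about" texts, for illustration only.
abouts = [
    "Photographer and mom. I love cooking and gardening.",
    "Interior designer in Delhi",
    "Teacher, reader, photographer",
]
print(top_words(abouts, k=1))  # [('photographer', 2)]
```

A word-cloud tool would then scale each word's font size by these counts.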
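Table 3 ranks the sites that pinned images come from. One plausible way to build such a ranking is to reduce each pin's source URL to a site name and count pins per site. This is a sketch under assumptions, not the authors' method. The "last two host labels" rule is a crude heuristic that mishandles suffixes such as `.co.uk`, and the URLs are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Reduce a pin's source URL to a coarse site name by keeping the
    last two host labels (e.g. 'foo.blogspot.com' -> 'blogspot.com')."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def top_sources(urls: Iterable[str], k: int = 15) -> List[Tuple[str, int]]:
    """Count pins per source site and return the k most common."""
    return Counter(source_domain(u) for u in urls if u).most_common(k)


# Invented source URLs, for illustration only.
pin_sources = [
    "http://foo.blogspot.com/2012/12/recipe.html",
    "https://www.etsy.com/listing/12345",
    "http://bar.blogspot.com/post",
    "http://www.flickr.com/photos/someone/1",
]
print(top_sources(pin_sources, k=1))  # [('blogspot.com', 2)]
```

Grouping subdomains under one site is why many individual BlogSpot blogs can together rank as a single top source.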
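The category analysis ranks board categories by three factors: the number of boards, the number of pins, and the number of followers. The paper does not define how followers are attributed to a category. The sketch below assumes that each board's follower count is summed per category. The `Board` record and the sample boards are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Board(NamedTuple):
    category: str   # board category chosen at creation
    pins: int       # number of pins on the board
    followers: int  # followers of the board (assumed measure)


def rank_categories(boards: List[Board]) -> Dict[str, List[str]]:
    """Rank categories by (1) number of boards, (2) total pins and
    (3) total followers, each in descending order."""
    n_boards: Dict[str, int] = defaultdict(int)
    n_pins: Dict[str, int] = defaultdict(int)
    n_followers: Dict[str, int] = defaultdict(int)
    for b in boards:
        n_boards[b.category] += 1
        n_pins[b.category] += b.pins
        n_followers[b.category] += b.followers

    def ranked(totals: Dict[str, int]) -> List[str]:
        # sorted() is stable, so ties keep first-seen order.
        return sorted(totals, key=lambda c: totals[c], reverse=True)

    return {"boards": ranked(n_boards), "pins": ranked(n_pins),
            "followers": ranked(n_followers)}


# Invented boards, for illustration only.
sample = [
    Board("Food & Drink", 120, 900),
    Board("Food & Drink", 80, 400),
    Board("DIY & Crafts", 50, 300),
    Board("Travel", 150, 700),
]
print(rank_categories(sample))
```

Ranking each factor separately is what allows different runners-up per factor, as in the paper's finding for factor 1 versus factors 2 and 3.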
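The drop in average followers from 847.3 to 111.7083 shows how a handful of very popular handles inflate the mean; the seed users in Table 1 each have over a million followers. The paper does not say how “high-impact” handles were identified. The sketch below assumes a hypothetical follower-count threshold and uses invented counts.

```python
from statistics import mean
from typing import Sequence, Tuple


def mean_with_and_without(counts: Sequence[int],
                          threshold: int) -> Tuple[float, float]:
    """Return (mean over all handles, mean over handles whose follower
    count is below `threshold`), i.e. with high-impact handles removed."""
    kept = [c for c in counts if c < threshold]
    if not kept:
        raise ValueError("threshold removes every handle")
    return mean(counts), mean(kept)


# Invented follower counts: four ordinary users and one high-impact handle.
followers = [40, 120, 95, 145, 1_000_000]
print(mean_with_and_without(followers, threshold=100_000))
# prints (200080, 100)
```

A single extreme handle moves the mean from 100 to 200,080 here. For such heavy-tailed data, the median is often reported alongside the mean.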