

ACCEPTED FROM OPEN CALL

Understanding User Behavior in Online Social Networks: A Survey

Long Jin, University of California, San Diego
Yang Chen, Duke University
Tianyi Wang, Tsinghua University
Pan Hui, Hong Kong University of Science and Technology/Telekom Innovation Laboratories
Athanasios V. Vasilakos, Kuwait University

IEEE Communications Magazine, September 2013. 0163-6804/13/$25.00 © 2013 IEEE

ABSTRACT
Currently, online social networks such as Facebook, Twitter, Google+, LinkedIn, and Foursquare have become extremely popular all over the world and play a significant role in people's daily lives. People access OSNs using both traditional desktop PCs and new emerging mobile devices. With more than one billion users worldwide, OSNs are a new venue of innovation with many challenging research problems. In this survey, we aim to give a comprehensive review of state-of-the-art research related to user behavior in OSNs from several perspectives. First, we discuss social connectivity and interaction among users. Also, we investigate traffic activity from a network perspective. Moreover, as mobile devices become a commodity, we pay attention to the characteristics of social behaviors in mobile environments. Last but not least, we review malicious behaviors of OSN users, and discuss several solutions to detect misbehaving users. Our survey serves the important roles of both providing a systematic exploration of existing research highlights and triggering various potentially significant research in these topics.

INTRODUCTION
In recent years, online social networks (OSNs) have dramatically expanded in popularity around the world. According to the data in October 2012, Facebook has 1.01 billion people using the site each month (footnote 1). Moreover, the numbers of users in five popular OSNs are listed in Table 1. The rapid growth of OSNs has attracted a large number of researchers to explore and study this popular, ubiquitous, and large-scale service. In this article, we focus on understanding user behavior in OSNs.

OSN user behavior covers various social activities that users can do online, such as friendship creation, content publishing, profile browsing, messaging, and commenting. Notably, these activities can be legitimate or malicious. Understanding OSN user behavior is important to different Internet entities in several aspects:
• For Internet service providers (ISPs), as OSN traffic is growing quickly and becoming significant, they want to learn the evolution of the traffic pattern of OSNs. This can guide them to do some infrastructural actions (e.g., adding traffic optimization in network middle-boxes).
• For OSN service providers, it helps them understand their customers' attitudes toward different functions, especially for some experimental functions. Moreover, from the perspective of infrastructure investment, such as which locations are most cost-effective to build data centers or which content delivery network (CDN) cluster could be leveraged to deliver frequently accessed data, understanding users' geographic distribution and traffic activity is vital.
• For OSN users, behavior study is important to enhance user experience. For example, there are numerous malicious accounts in OSNs. These accounts generate unwanted messages for legitimate users. Therefore, identifying and blocking malicious users are very important to ensure good user experience.

Our survey contains four aspects of understanding user behavior in OSNs. First, a social graph is a classic and effective mathematical model to represent the relationship between users in OSNs, and has been widely used in OSN research. Based on four different types of social graphs, we discuss the aspect of connectivity and interaction. Second, network monitoring records detailed traffic activity of OSNs and provides us with a method to understand the network usage of OSNs. Also, network-based measurement results can demonstrate more users' activities than using the social graph only. Therefore, we focus on the perspective of traffic activity. Third, the rapid development of mobile platforms and applications plays an important role in OSN-related applications. Mobile devices not only provide a venue for users to access OSNs everywhere, but also establish mobile-centric functions like location-based service (LBS) in OSNs. Understanding the behavior of mobile users can be leveraged to enhance the performance of mobile social applications and systems. Thus, we review the studies of mobile social behavior. Last but not least, OSNs introduce new challenges related to security and privacy. Malicious behaviors, such as spam and Sybil attacks, take place in OSNs and bring severe security threats. We show studies of malicious behavior.

Footnote 1: http://finance.yahoo.com/news/number-active-users-facebook-over-years-214600186--finance.html

OSN site      No. of users
Facebook      1.01 billion (Oct. 2012)
Twitter       500 million (Apr. 2012)
Google+       400 million (Sep. 2012)
LinkedIn      175 million (Jun. 2012)
Foursquare    25 million (Sep. 2012)

Table 1. Information about five popular OSNs.

CONNECTIVITY AND INTERACTION

MOTIVATION AND CHALLENGES
The social graph is an effective and widely used mathematical tool to represent the relationships among users in OSNs, which benefits the analysis of social interactions and user behavior characterization. Usually, social networks can be modeled as undirected graphs (e.g., friendship graph, interaction graph) or directed graphs (e.g., latent graph, following graph) according to the properties of OSNs. Table 2 lists four different types of social graphs. Based on these graph types, we discuss the connectivity and interaction among OSN users. Moreover, the huge size of the social graph challenges the effectiveness of analysis. Thus, graph sampling and crawling techniques have been proposed to deal with this problem. In this section, we investigate several measurement, analysis, and modeling works related to the social graph.

Type                 Edge
Friendship graph     Friendship between users
Interaction graph    Visible interaction, such as posting on a wall
Latent graph         Latent interaction, such as browsing profile
Following graph      Subscribe to receive all messages

Table 2. Four different types of social graph.

EXISTING SOLUTIONS AND DISCUSSION

Undirected Graph Model
For a friendship graph, every user is denoted as a node, and the friendship between any user pair is represented by an edge. Wilson et al. [1] try to find out whether social links are valid indicators of user interactions. They define wall posts and photo comments as interactions. Based on the crawled data from Facebook, they have found that users tend to interact mostly with only a small subset of their friends, while often having no interaction with up to half of their friends. Therefore, friendship in OSNs can hardly be viewed the same as friendship in the real world. Correspondingly, a new interaction graph is proposed to reflect the real user interactions in social networks: an edge exists between two users only if there is visible interaction between them, not merely because they are friends. Using two representative applications, spam and Sybil protections, they demonstrate that using an interaction graph performs better than using a friendship graph.

Directed Graph Model
Latent interactions are passive actions of OSN users (e.g., profile browsing) that cannot be observed by traditional measurement techniques. Jiang et al. [2] study latent interactions based on the crawled data of Renren, the largest OSN provider in China. Renren tracks the most recent nine visitors to every user's profile, making the measurement of latent interactions possible. In a directed latent graph, a directed edge from A to B indicates A has visited B's profile. Therefore, the in-degree of a node shows the number of visitors to that user's profile, while the out-degree reveals the number of profiles that user has visited. A comparison between latent interactions and visible interactions is conducted based on Renren's crawled data, which contains 42 million users and 1.66 billion social links. There are three major findings. First, latent interactions are significantly more prevalent and frequent than visible interactions. Second, latent interactions are non-reciprocal in nature. Last but not least, the profile popularity is uncorrelated with the frequency of content updates or number of friends for very popular users. The characteristics of latent graphs are shown to fall between visible interaction graphs and classical friendship graphs.

Kwak et al. [3] perform extensive measurement on Twitter, the world's largest microblogging service, and reveal its power in information spreading on the news media level. In Twitter's following graph, a directed edge from A to B indicates A has subscribed to receive B's latest messages. The collected data is crawled over 24 days, with 41.7 million user profiles, 1.47 billion relations, 4262 trending topics, and 106 million tweets. The study introduces a directed graph model to give a basic informative overview of Twitter, studies the distribution of followers/followees, and analyzes how the number of followers or followees affects the number of tweets. Additionally, in order to show how Twitter acts as a social medium and how top users influence other users, the study ranks the users by number of followers, PageRank, and retweets. The rankings by number of followers and PageRank are almost the same, and the top users in the rankings are either celebrities or news media accounts. The study also analyzes the trending topics in Twitter and compares them with other media.
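As a rough illustration of two of the ranking criteria used in the Twitter study [3], the sketch below builds a small following graph and ranks users by follower count and by PageRank. The toy edge list, the function names, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration; none of them is taken from [3].

```python
from collections import defaultdict


def follower_counts(edges):
    """Count followers per user; an edge (a, b) means a follows b."""
    counts = defaultdict(int)
    for follower, followee in edges:
        counts[follower] += 0   # make sure users with no followers appear
        counts[followee] += 1
    return dict(counts)


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a directed following graph.

    damping and iterations are illustrative defaults, not values from [3].
    """
    nodes = sorted({user for edge in edges for user in edge})
    following = defaultdict(list)
    for follower, followee in edges:
        following[follower].append(followee)
    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in nodes if not following[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n
                    for u in nodes}
        for u in nodes:
            for v in following[u]:
                new_rank[v] += damping * rank[u] / len(following[u])
        rank = new_rank
    return rank


def top_users(scores, k=3):
    """Return the k users with the highest score (ties broken by name)."""
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


# Hypothetical toy following graph: (a, b) means a follows b.
EDGES = [("ann", "news"), ("bob", "news"), ("cat", "news"),
         ("news", "star"), ("ann", "star"), ("star", "news")]

if __name__ == "__main__":
    print(top_users(follower_counts(EDGES)))
    print(top_users(pagerank(EDGES)))
```

On this toy graph the two rankings put the same accounts at the top, which mirrors, in miniature only, the observation that follower-count and PageRank rankings largely agree.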


The majority (over 85 percent) of trending topics in Twitter are found to be headline or persistent news in nature, which reveals Twitter's live broadcasting nature and confirms Twitter's role as a news medium.

Graph Sampling
A fast increase in the number of users makes the size of social graphs larger and larger, which presents researchers with a big challenge when performing any analysis with limited computation and storage capability. Graph sampling techniques are used to get a smaller but representative snapshot of social graphs, which preserves properties such as degree distribution. As shown in [4], the sampling results of Breadth-First Sampling (BFS) and Random Walk (RW) are biased toward high-degree vertices, although both have been widely used in social graph analysis. The Metropolis-Hastings RW (MHRW) and a Re-Weighted RW (RWRW) are proposed and proved to perform uniformly in sampling Facebook. The article also introduces online convergence diagnostics to assess sample quality during the sampling process. Frontier Sampling (FS) [5], which leverages multidimensional RW, is proposed to achieve lower estimation errors than RW, especially in the presence of disconnected or loosely connected graphs. Ribeiro et al. [5] show that FS is more suitable for estimating the tail of degree distribution than random vertex sampling. Moreover, FS can be made fully distributed without any coordination costs.

FUTURE WORK
The dynamic feature is an important aspect to deeply understand an OSN's user behavior. Much of the existing work tries to investigate an OSN in a relatively static way, by collecting or studying a static snapshot dataset. However, the growth of OSNs is extremely rapid. Every day new users join OSNs, while existing users make new friends or end social connections, join or leave groups, and so on. Considering this dynamic can extract more inherent information than studying static data, not only revealing the situation at a certain time but also predicting some future activities. Also, studying different time intervals and time granularities would lead to more interesting findings. There are several challenges for performing dynamic analysis. One is fast data collection and timely processing, where an unbiased and efficient graph sampling algorithm can play an important role. Also, collecting dynamic data raises challenges for information storage; therefore, the temporal and spatial dependence between different data items can be utilized for better compression.

TRAFFIC ACTIVITY

MOTIVATION AND CHALLENGES
Different kinds of social graphs can reveal how users connect and interact with each other. However, due to the limited information that the graph can represent, various types of users' activities cannot be characterized (e.g., time duration of browsing a profile). Network operators can observe such information easily and thus better interpret how users use OSNs. Furthermore, ISPs have a strong incentive to better understand how the traffic pattern between end users and OSN sites will evolve, and to take optimization actions according to the distribution and activities of OSN users. In this section, we review OSN user behavior study from the perspective of network traffic analysis.

EXISTING SOLUTIONS AND DISCUSSION

Traffic Monitoring
Besides crawling, people can also study OSNs by monitoring the corresponding network traffic. Benevenuto et al. [6] analyze the user behavior of OSNs based on detailed clickstream data obtained from a social network aggregator, as illustrated in Fig. 1.

Figure 1. Data collection through a social network aggregator. (Figure labels: 1. User login; 2. Authentication to all sites; 3. OSN activity; Data collection.)

In [6], the clickstream data was collected over 12 days with HTTP sessions of 37,024 users who accessed popular social networks. The study defines and analyzes the OSN session characteristics:
• The frequency of accessing OSNs
• Total time spent on OSNs
• Session duration of OSNs
Through the clickstream data, user activities are also identified. Forty-one types of user activities are classified into nine groups, and the popularity of different activities and the traffic bytes are analyzed.
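The three session characteristics listed above can be computed from a clickstream once requests are grouped into sessions. The following is a minimal sketch under stated assumptions: the event format (user, timestamp in seconds), the function names, and the 20-minute inactivity timeout used to close a session are hypothetical choices for illustration and are not the definitions used in [6].

```python
from collections import defaultdict


def split_sessions(events, timeout=1200):
    """Group (user, timestamp) events into per-user sessions.

    A new session starts when the gap to the previous request exceeds
    `timeout` seconds (an assumed threshold, not taken from [6]).
    Returns {user: [(start, end), ...]}.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        start = end = stamps[0]
        spans = []
        for ts in stamps[1:]:
            if ts - end > timeout:
                spans.append((start, end))   # close the current session
                start = ts
            end = ts
        spans.append((start, end))
        sessions[user] = spans
    return sessions


def session_stats(sessions):
    """Per-user access frequency, total time online, and mean session duration."""
    stats = {}
    for user, spans in sessions.items():
        durations = [end - start for start, end in spans]
        stats[user] = {
            "num_sessions": len(spans),
            "total_time": sum(durations),
            "mean_duration": sum(durations) / len(durations),
        }
    return stats


# Hypothetical clickstream: (user, request time in seconds).
EVENTS = [("u1", 0), ("u1", 60), ("u1", 5000), ("u1", 5300), ("u2", 10)]

if __name__ == "__main__":
    print(session_stats(split_sessions(EVENTS)))
```

A single-request session has zero duration in this sketch, because the clickstream alone does not show how long the user stayed on the last page; a real analysis would need an explicit convention for that case.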


Interestingly, it is found that silent or latent interactions such as browsing account for more than 90 percent of user activities. The authors also show how user activities differ across OSNs, and characterize how users transit from one activity to another using a first-order Markov chain.

Schneider et al. [7] also study clickstream data, but they focus on feature popularity, session characteristics, and the dynamics within OSN sessions. The distribution of HTTP request-response pairs reveals the popularity of different features. The popularity of features can differ among users from different areas and of different OSNs. It can also differ by the time spent by the users. Besides, the distribution of transmission bytes per OSN session is given, which helps the ISPs learn the traffic pattern of different OSNs. Photo features account for most traffic bytes of OSNs. The study also shows the duration of sessions and the number of subsessions within a session. Moreover, it reveals the dynamics within OSN sessions. It is found that most users access web sites other than OSNs during OSN sessions for more than 1 min. That is, users can be inactive when accessing the OSNs.

Clickstream data contributes a lot to the user behavior study of OSNs. However, it can be incomplete, which restricts its usage and performance. First, clickstream data is limited by the collection duration, and the behavior of users who are inactive during that period is not monitored. Moreover, the data is restricted by the monitoring locations. That is, only the behavior of users using certain monitored ISPs is captured.

Locality of Interest
Facebook is heavily dependent on centralized U.S. data centers to provide consistent service to users all over the world. Therefore, users outside the United States experience slow response time. Also, a lot of unnecessary traffic is generated on the Internet backbone. Wittie et al. [8] investigate the detailed causes of these two problems and identify mitigation opportunities. It is found that OSN state is amenable to partitioning, and its fine-grained distribution and processing can significantly improve performance without loss in service consistency. Based on simulations of reconstructed Facebook traffic over measured Internet paths, it is shown that user requests can be processed 79 percent faster and use 91 percent less bandwidth. Therefore, the partitioning of OSN state is an attractive scaling strategy for OSN service providers.

Navigation Characteristics
Nowadays, OSNs represent a significant portion of web traffic, comparable with search engines. Dunn et al. [9] try to understand the similarities and differences in the web sites users visit through OSNs vs. through search engines. Using web traffic logs from 17,000 digital subscriber line (DSL) subscribers of a Tier 1 ISP in the United States, it is found that OSN visitors are less likely to navigate to external web sites. But when they visit external web sites, OSN users will spend more time at those web sites compared to search engine users. Also, OSNs direct visitors to a narrower subset of the web than search engines. While web sites related to games and video are more commonly visited from OSNs, shopping and reference sites are common for search engines. Finally, OSNs send users to less popular domains more often than search engines. These findings can be useful to ISPs in network provisioning and traffic engineering.

FUTURE WORK
Most existing measurement and analysis projects are led by either academic groups or ISPs, without the active involvement of OSN service providers. Such a situation limits the insight of the study. On one hand, academic researchers usually rely on extensive crawling to obtain the data, which encounters many restrictions from the OSN providers, such as traffic control (how many messages per IP and/or per account can be fetched in one hour). Also, some users may use privacy options to make their data unavailable. Last but not least, the huge number of users makes it almost impossible to get a timely snapshot, so data consistency cannot be guaranteed. On the other hand, although an ISP is able to capture and analyze all its traffic to/from an OSN site through traffic monitoring, it can only get a partial view of the whole site; that is, only users who get access to OSNs through a specific ISP's infrastructure can be observed. As we have discussed, user behavior study can be beneficial for OSN providers themselves. We envision that OSN providers can collaborate with academia and industrial researchers in order to understand user behavior in an insightful way. This can enhance the user experience interactively and quickly. Also, this will save operational costs for OSN providers.

MOBILE SOCIAL BEHAVIOR

MOTIVATION AND CHALLENGE
Nowadays, due to the wide use of mobile devices, more and more web applications have been expanded to mobile platforms, as have OSN services. We believe that it is the right time to highlight the importance of mobile social networks (MSNs). In MSNs, mobile users can publish and share information based on the social connections among them. On one hand, most major OSN platforms such as Facebook, Twitter, and LinkedIn release mobile applications to allow users to access their services through mobile devices. On the other hand, more mobile-centric functions have been integrated into OSNs, such as location-based services and mobile communication. Understanding the user behavior in MSNs is very helpful for the design and implementation of MSN systems, improving the system efficiency in mobile environments or supporting better mobile-centric functions. In this section, we focus on studies of user behaviors in MSNs.

EXISTING SOLUTIONS AND DISCUSSION

Mobile Social Application
A large number of interesting and useful mobile social applications have been proposed. Social Serendipity [10] is a mobile-phone-based system that combines widely used mobile phones with the functionality of online introduction systems.


Serendipity cues informal face-to-face interactions between nearby users who do not know each other but probably should. It uses Bluetooth to sense nearby people and utilizes a centralized server to decide whether two users should be introduced to each other. The system calculates a similarity score by extracting the commonalities between two proximate users' profiles and behavioral data, and sums them according to user-defined weights. If the score is higher than the threshold set by both users, the system will inform them that someone nearby might be interested in them. For instance, internal collaboration in large companies can be facilitated by Serendipity for introducing people who are working on similar projects. It is emphasized that privacy issues are important and fundamental in Serendipity, and privacy-protecting tools should be designed carefully.

Geographical Prediction in OSN
Geography and social relationships are inextricably intertwined. As people spend more time online, data regarding these two dimensions are becoming increasingly precise, which allows reliable models of their interaction to be built. In [11], the study of user-contributed address and association data from Facebook shows that the addition of social information improves the accuracy of predicting physical location. First, friendship as a function of distance and rank is analyzed. It is found that at medium to long-range distances, the probability of friendship is roughly proportional to the inverse of distance. However, at shorter ranges, distance has little influence. Then a maximum likelihood approach is presented to predict the physical location of a user, given the known location of her friends. This method predicts the physical location of 69.1 percent of the users with 16 or more located friends to within 25 mi, compared to only 57.2 percent using IP-based methods.

Friendship and Mobility in LBSN
Although human movement and mobility patterns have a high degree of freedom and variation, they also exhibit structural patterns due to geographic and social constraints. Using cell phone location data, as well as data from two online location-based social networks, Cho et al. [12] aim to understand the basic laws that govern human motion and dynamics. It is found that humans experience a combination of strong short-range spatially and temporally periodic movement that is not impacted by the social network structure, while long-distance travel is more influenced by the social network ties. Furthermore, it is shown that social relationships can explain about 10 to 30 percent of all human movement, while periodic behavior explains 50 to 70 percent. Based on these findings, a model of human mobility is proposed that combines periodic short-range movements with travel due to the social network structure and gives an order of magnitude better performance than previous models.

Social-Based Routing in PSNs
Widely used smart devices with networking capability form novel networks, such as the pocket switched network (PSN). Due to the mobility of devices, PSNs are intermittently connected, and effective routing protocols are essential in such networks. Previous methods relied on building and updating routing tables to deal with dynamic conditions. However, the social structure and the interaction of users of smart devices have a great influence on the performance of routing protocols. BUBBLE Rap [13] is a social-based forwarding method for PSNs. Two social and structural metrics, centrality and community, are used to effectively enhance delivery performance. As shown in Fig. 2, BUBBLE Rap first uses a centrality metric to spread out the messages (i.e., sending messages to more popular nodes), and then uses a community metric to identify the destination community and focus the messages to the destination. The evaluation shows that BUBBLE Rap has a similar delivery ratio, but much lower resource utilization than flooding, control flooding, and other social-based forwarding schemes.

Figure 2. Illustration of the BUBBLE Rap algorithm. (Figure labels: Source; Sub community; Ranking; Globe community; Destination.)

Content Distribution in MSN
Ioannidis et al. [14] study the dissemination of dynamic content, such as news and traffic information, over an MSN. In this application, mobile users subscribe to a dynamic-content distribution service offered by their service provider. To improve coverage and increase capacity, it is assumed that users share any content updates they receive with other users they meet. The study in [14] determines how the service provider can allocate its bandwidth optimally to make the content at users as "fresh" as possible. Moreover, a condition is specified under which the system is highly scalable: even if the total bandwidth dedicated by the service provider remains fixed, the expected content age at each user grows slowly (as log(n)) with the number of users n.

FUTURE WORK
There are several fundamental issues that require continuous exploration in the research related to user behavior in MSNs.


These issues include incentive mechanisms, identity management, trust, reputation and privacy, energy efficiency, methods for social network metrics estimation and community detection, content distribution and sharing protocols, and precise geographic and semantic localization techniques. A comprehensive summary related to applications, architectures, and protocol design issues for MSNs can be found in [15].

Furthermore, we believe social data delivery and social applications in mobile environments raise challenges in several layers of the Internet protocol stack. Let us list three examples here. First, we need a better transport layer protocol to handle packet loss caused by the wireless environment and host mobility. Second, to efficiently deliver popular content desired by multiple MSN users, we need to deploy social-aware proxies in the network infrastructure to eliminate duplicate transmission. The deployment of those proxies needs to carefully consider social connections, users' geolocations, and the topology of the underlying wired/wireless Internet. Third, context-aware services will become very useful in MSNs. Such services will let users express their demand for social activities in cyberspace in a human-readable fashion, thus making social interaction among mobile users easier. All three of these examples need lots of work in data analysis, modeling, and prototyping.

MALICIOUS BEHAVIOR

MOTIVATION AND CHALLENGES
The usage of OSNs introduces numerous security and privacy threats. For instance, as a user needs to interact with other users through an OSN service provider, the user's activities and uploaded data can be tracked and stored by the OSN service provider. These data (photos, articles, public posts, private messages, etc.) may be leaked to a third party without the user's explicit authorization, even when the user regards some of these as confidential. Moreover, Sybil attacks are very common in OSNs, as a user can register multiple fake accounts maliciously. These fake accounts can perform various malicious activities including spamming, obtaining private contact lists, misleading crowd-sourcing results, and so on. Besides those, Gao et al. [16] list several other attacks such as re-identification and de-anonymization of anonymized OSN data, fetching personal data through untrusted third-party applications, cross-site profile cloning, social spamming, and phishing. Due to space limitation, this survey mainly focuses on malicious behavior in OSNs, including spam and Sybil attacks.

EXISTING SOLUTIONS AND DISCUSSION

Social Spam
OSNs are popular collaboration tools for millions of users and their friends. Unfortunately, they also become effective tools for executing spam campaigns and spreading malware. Intuitively, a user is more likely to respond to a message from a friend than from a stranger; thus, social spamming is a more effective distribution mechanism than traditional email. Gao et al. [17] study a large dataset composed of over 187 million wall messages among 3.5 million Facebook users. The system detected 200,000 malicious wall posts with embedded URLs originating from more than 57,000 accounts. It is shown that more than 70 percent of all malicious wall posts advertise phishing sites. It is also found that more than 97 percent of these accounts are compromised accounts rather than "fake" accounts created solely for the purpose of spamming. Finally, spamming dominates actual wall post activity in the early morning hours, when normal users are asleep.

Lumezanu et al. [18] perform a joint analysis of spam in email and social networks. Spam data from Yahoo's web-based email service and Twitter are used to characterize the publishing behavior and effectiveness of spam advertised across both platforms. It is shown that email spammers that also advertise on Twitter tend to send more email spam than those advertising exclusively through email. Furthermore, sending spam on both email and Twitter has better exposure than spamming exclusively with email: spam domains appearing on both platforms are looked up by an order of magnitude more networks than domains using just one platform.

Social-Graph-Based Sybil Defense
Sybil attacks are a fundamental problem in peer-to-peer and other distributed systems. In a Sybil attack, a malicious attacker creates multiple fake identities to influence the working of systems that depend on open membership, such as recommendation and delivery systems. Recently, a number of social network-based schemes, such as SybilGuard, SybilLimit, SybilInfer, and SumUp, have been proposed to mitigate Sybil attacks. Viswanath et al. [19] develop a deep understanding of these approaches. Their analysis shows that existing Sybil defense schemes, which can be viewed as graph partitioning algorithms, work by identifying local communities (i.e., clusters of nodes more tightly knit than the rest of the graph) around a trusted node. Therefore, the substantial amount of prior research on general community detection algorithms can be used to design effective and novel Sybil defense schemes.

Usually, binary Sybil/non-Sybil classifiers have high false positives; thus, manual inspection needs to be involved in the decision process for suspending an account. SybilRank [20] aims to efficiently derive a Sybil-likelihood ranking, so that only the most suspicious accounts need to be inspected manually. It is based on efficiently computable early-terminated RWs and is suitable for parallel implementation on a framework such as MapReduce, uncovering Sybils in OSNs with millions of accounts. SybilRank is deployed and tested in the operation center of Tuenti, which is the largest OSN in Spain with 11 million users. Almost 100 and 90 percent of the 50K and 200K accounts, respectively, that SybilRank regards as the most suspicious are indeed fake. In contrast, the hit rate of the current user-report-based approach is only 5 percent. Thus, SybilRank represents a significant step toward practical Sybil defense.

FUTURE WORK
Because of the abundance of available personal information, OSNs suffer from the vital problem of privacy breach.


three primary parties in the OSN: service providers, malicious users, and third-party applications. A decentralized OSN is a potential architecture for keeping sensitive information from leaking to service providers and third-party applications. However, providing incentives that encourage users to switch to a decentralized OSN is challenging, especially for users who do not care much about their privacy and security. Furthermore, the recipients of shared information should be controlled by the users themselves. Instead of sharing information based only on the virtual links in OSNs, real-life relationships between users should also be taken into account. Finally, Sybil defense is still a hot topic, and more solid work is expected in this area. We foresee that semantic information extracted from user profiles and social behavior can be used for Sybil detection, and should be utilized collaboratively with existing schemes.

CONCLUSION

In this survey, we study user behavior in OSNs from four different perspectives: connection and interaction, traffic activity, mobile social behavior, and malicious behavior. We review the existing representative schemes and also provide potential future directions. We envision that this research line will enhance the user experience from various aspects, as well as satisfy different players, including the infrastructure providers, service providers, and end users. We believe that further research on user behavior in OSNs will generate more interesting research problems and exciting solutions in this area.

REFERENCES

[1] C. Wilson et al., “User Interactions in Social Networks and Their Implications,” Proc. EuroSys, 2009.
[2] J. Jiang et al., “Understanding Latent Interactions in Online Social Networks,” Proc. IMC, 2010.
[3] H. Kwak et al., “What Is Twitter, a Social Network or a News Media?,” Proc. WWW, 2010.
[4] M. Gjoka et al., “Practical Recommendations on Crawling Online Social Networks,” IEEE Trans. Commun. Special Issue on Measurement of Internet Topologies, vol. 29, no. 9, Oct. 2011.
[5] B. Ribeiro and D. Towsley, “Estimating and Sampling Graphs with Multidimensional Random Walks,” Proc. IMC, 2010.
[6] F. Benevenuto et al., “Characterizing User Behavior in Online Social Networks,” Proc. IMC, 2009.
[7] F. Schneider et al., “Understanding Online Social Network Usage from a Network Perspective,” Proc. IMC, 2009.
[8] M. Wittie et al., “Exploiting Locality of Interest in Online Social Networks,” Proc. CoNext, 2010.
[9] C. W. Dunn et al., “Navigation Characteristics of Online Social Networks and Search Engines Users,” Proc. WOSN, 2012.
[10] N. Eagle and A. Pentland, “Social Serendipity: Mobilizing Social ,” IEEE Pervasive Computing, vol. 4, no. 2, 2005, pp. 28–34.
[11] L. Backstrom, E. Sun, and C. Marlow, “Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity,” Proc. WWW, 2010.
[12] E. Cho, S. Myers, and J. Leskovec, “Friendship and Mobility: User Movement in Location-Based Social Networks,” Proc. KDD, 2011.
[13] P. Hui, J. Crowcroft, and E. Yoneki, “BUBBLE Rap: Social-Based Forwarding in Delay Tolerant Networks,” IEEE Trans. Mobile Computing, vol. 10, Nov. 2011, pp. 1576–89.
[14] S. Ioannidis, A. Chaintreau, and L. Massoulie, “Optimal and Scalable Distribution of Content Updates over a Mobile Social Network,” Proc. INFOCOM, 2009.
[15] N. Kayastha et al., “Applications, Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey,” Proc. IEEE, vol. 99, no. 12, 2011, pp. 2130–58.
[16] H. Gao et al., “Security Issues in Online Social Networks,” IEEE Internet Computing, vol. 15, no. 4, 2011.
[17] H. Gao et al., “Detecting and Characterizing Social Spam Campaigns,” Proc. IMC, 2010.
[18] C. Lumezanu and N. Lumezanu, “Observing Common Spam in Tweets and Email,” Proc. IMC, 2012.
[19] B. Viswanath et al., “An Analysis of Social Network-Based Sybil Defenses,” Proc. SIGCOMM, 2010.
[20] Q. Cao et al., “Aiding the Detection of Fake Accounts in Large-Scale Online Social Services,” Proc. NSDI, 2012.

BIOGRAPHIES

LONG JIN is currently a Ph.D. student in the Department of Computer Science and Engineering, University of California, San Diego. He received his B.S. and M.S. degrees in Electronic Engineering from Tsinghua University in 2010 and 2013, respectively. He visited Carnegie Mellon University and Microsoft Research Asia in 2012. His research interests include social networks, mobile computing, and wireless networks.

YANG CHEN is a postdoctoral associate in the Department of Computer Science, Duke University. From September 2009 to March 2011, he was a research associate at the University of Goettingen, Germany. He received his B.S. and Ph.D. degrees from the Department of Electronic Engineering, Tsinghua University, in 2004 and 2009, respectively. He visited Stanford University (in 2007) and Microsoft Research Asia (2006-2008) as a visiting student. His research interests include Internet architecture and protocols, cloud computing, and online/mobile social networks.

TIANYI WANG is now pursuing his Ph.D. degree in the Department of Electronic Engineering, Tsinghua University, and his advisor is Prof. Xing Li. He received his B.S. degree in electronic engineering from Tsinghua University in 2011. His research interests include analysis of online social networks and data mining.

PAN HUI received his Ph.D. degree from the Computer Laboratory, University of Cambridge, and earned his M.Phil. and B.Eng. from the Department of Electrical and Electronic Engineering, University of Hong Kong. He is currently a faculty member of the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where he directs the System and Media Lab. He also serves as a Distinguished Scientist of Telekom Innovation Laboratories (T-labs) Germany and an adjunct professor of social computing and networking at Aalto University, Finland. Before returning to Hong Kong, he spent several years at T-labs and Intel Research Cambridge. He has published more than 100 research papers, and has several granted and pending European patents. He has founded and chaired several IEEE/ACM conferences/workshops, and served on the technical program committees of numerous international conferences and workshops including IEEE INFOCOM, SECON, MASS, GLOBECOM, WCNC, and ITC.

ATHANASIOS V. VASILAKOS is currently a professor at Kuwait University. He has served or is serving as an Editor for many technical journals, such as IEEE TNSM, IEEE TC, IEEE TSMC-PART B, IEEE TITB, ACM TAAS, and IEEE JSAC Special Issues in May 2009, and January and March 2011. He is Chairman of the Council of Computing of the European Alliances for Innovation.

150 IEEE Communications Magazine • September 2013