An Introduction
Total Page:16
File Type:pdf, Size:1020Kb
This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides Available at: http://dmml.asu.edu/smm Chapter 10 Behavior Analytics What motivates individuals to join an online group? When individuals abandon social media sites, where do they migrate to? Can we predict box office revenues for movies from tweets posted by individuals? These questions are a few of many whose answers require us to analyze or predict behaviors on social media. Individuals exhibit different behaviors in social media: as individuals or as part of a broader collective behavior. When discussing individual be- havior, our focus is on one individual. Collective behavior emerges when a population of individuals behave in a similar way with or without coordi- nation or planning. In this chapter we provide examples of individual and collective be- haviors and elaborate techniques used to analyze, model, and predict these behaviors. 10.1 Individual Behavior We read online news; comment on posts, blogs, and videos; write reviews for products; post; like; share; tweet; rate; recommend; listen to music; and watch videos, among many other daily behaviors that we exhibit on social media. What are the types of individual behavior that leave a trace on social media? We can generally categorize individual online behavior into three cate- gories (shown in Figure 10.1): 319 Figure 10.1: Individual Behavior. 1. User-User Behavior. This is the behavior individuals exhibit with re- spect to other individuals. For instance, when befriending someone, sending a message to another individual, playing games, following, inviting, blocking, subscribing, or chatting, we are demonstrating a user-user behavior. 2. User-Community Behavior. The target of this type of behavior is a community. For example, joining or leaving a community, becoming a fan of a community, or participating in its discussions are forms of user-community behavior. 3. User-Entity Behavior. The target of this behavior is an entity in social media. For instance, it includes writing a blogpost or review or uploading a photo to a social media site. As we know, link data and content data are frequently available on social media. Link data represents the interactions users have with other users, and content data is generated by users when using social media. One can 320 think of user-user behavior as users linking to other users and user-entity behavior as users generating and consuming content. Users interacting with communities is a blend of linking and content-generation behavior, in which one can simply join a community (linking), read or write content for a community (content consumption and generation), or can do a mix of both activities. Link analysis and link prediction are commonly used to analyze links, and text analysis is designed to analyze content. We use these techniques to analyze, model, and predict individual behavior. 10.1.1 Individual Behavior Analysis Individual behavior analysis aims to understand how different factors affect individual behaviors observed online. It aims to correlate those behaviors (or their intensity) with other measurable characteristics of users, sites, or contents that could have possibly resulted in those behaviors. First we discuss an example of behavior analysis on social media and demonstrate how this behavior can be analyzed. After that, we outline the process that can be followed to analyze any behavior on social media. Community Membership in Social Media Users often join different communities in social media; the act of becom- ing a community member is an example of user-community behavior. Why do users join communities? In other words, what factors affect the community-joining behavior of individuals? To analyze community-joining behavior, we can observe users who join communities and determine the factors that are common among them. Hence, we require a population of users U = u ; u ;:::; un , a commu- f 1 2 g nity C, and community membership information (i.e., users ui U who are members of C). The community need not be explicitly defined.2 For instance, one can think of individuals buying a product as a community, and people buying the product for the first time as individuals joining the community. To distinguish between users who have already joined the community and those who are now joining it, we need community mem- berships at two different times: t1 and t2, with t2 > t1. At t2, we determine users such as u who are currently members of the community, but were not members at t1. These new users form the subpopulation that is analyzed for community-joining behavior. 321 Figure 10.2: Probability of Joining a Community (with Error Bars) as a Function of the Number of Friends m Already in the Community (from [19]). To determine factors that affect community-joining behavior, we can de- sign hypotheses based on different factors that describe when community- joining behavior takes place. We can verify these hypotheses by using data available on social media. The factors used in the validated hypotheses describe the behavior under study most accurately. One such hypothesis is that individuals are inclined toward an activity when their friends are engaged in the same activity. Thus, if the hypothesis is valid, a factor that plays a role in users joining a community is the number of their friends who are already members of the community. In data mining terms, this translates to using the number of friends of an individual in a community as a feature to predict whether the individual joins the community (i.e., class attribute). Figure 10.2 depicts the probability of joining a community with respect to the number of friends an individual has who are already members of the community. The probability increases Diminishing as more friends are in a community,but a diminishing returns property is also Returns observed, meaning that when enough friends are inside the community, more friends have no or only marginal effects on the likelihood of the individual’s act of joining the community. 322 Figure 10.3: User Community-Joining Behavior Features (from [19]). Thus far we have defined only one feature. However, one can go beyond a single feature. Figure 10.3 lists the comprehensive features that can be used to analyze community-joining behavior. As discussed, these features may or may not affect the joining behavior; thus, a validation procedure is required to understand their effect on the joining behavior. Which one of these features is more relevant to the joining behavior? In other words, which feature can help best determine whether individuals will join or not? To answer this question, we can use any feature selection algorithm. Fea- ture selection algorithms determine features that contribute the most to the prediction of the class attribute. Alternatively, we can use a classification algorithm, such as decision tree learning, to identify the relationship be- tween features and the class attribute (i.e., joined= Yes, No ). The earlier a feature is selected in the learned tree (i.e., is closerf to the rootg of the tree), the more important to the prediction of the class attribute value. By performing decision tree learning for a large dataset of users and the features listed in Figure 10.3, one finds that not only the number of friends inside a community but also how these friends are connected to each other affect the joining probability. In particular, the denser the subgraph of friends inside a community, the higher the likelihood of a user joining the community. Let S denote the set of friends inside community C, and let ES denote the set of edges between these S friends. The maximum number of S j j S edges between these S friends is j2j . So, the edge density is φ(S) = Es= j2j . 323 Figure 10.4: Decision Tree Learned for Community-Joining Behavior (from [19]). One finds that the higher this density, the more likely that one is going to join a community. Figure 10.4 shows the first two levels of the decision tree learned for this task using features described in Figure 10.3. Higher level features are more discriminative in decision tree learning, and in our case, the most important feature is the density of edges for the friends subgraph inside the community. To analyze community-joining behavior, one can design features that are likely to be related to community joining behavior. Decision tree learning can help identify which features are more predictive than oth- ers. However, how can we evaluate if these features are designed well and whether other features are not required to accurately predict joining behavior? Since classification is used to learn the relation between features and behaviors one can always use classification evaluation metrics such as accuracy to evaluate the performance of the learned model. An accurate model translates to an accurate learning of feature-behavior association. A Behavior Analysis Methodology The analysis of community-joining behavior can be summarized via a four-step methodology for behavioral analysis. The same approach can be followed as a general guideline for analyzing other behaviors in social media. Commonly, to perform behavioral analysis, one needs the following four components: 324 1. An observable behavior. The behavior that is analyzed needs to be observable. For instance, to analyze community-joining behavior, it is necessary to be able to accurately observe the joining of individuals (and possibly their joining times). 2. Features. One needs to construct relevant data features (covariates) that may or may not affect (or be affected by) the behavior. An- thropologists and sociologists can help design these features. The intrinsic relation between these features and the behavior should be clear from the domain expert’s point of view.