Evaluating Classification Schemes for Second Screen Interactions
Total Page:16
File Type:pdf, Size:1020Kb
2015 International Conference on Computing, Networking and Communications, Social Computing and Semantic Data Mining Evaluating Classification Schemes for Second Screen Interactions Partha Mukherjee Bernard J. Jansen College of Information Science and Technology College of Information Science and Technology Pennsylvania State University Pennsylvania State University University Park, PA, USA University Park, PA, USA [email protected] [email protected] Abstract— We analyze the performance of classification quantification of information interchange within the groups. schemes on information collected from social conversation posted The quantification of information is expressed in terms of in Twitter among audiences of a popular US based TV show. In social, temporal, and second screen device features of the this research, we consider entropy as a measure of information conversation. exchange in a group conversation that is related to social, temporal, and second screen device features. The group This research is important, as degree of second screen conversations are identified by hashtags present in tweets where interaction will eventually determine the predominant group the number of members in the group interacting is at least two. conversations with high information content. This insight will We apply different classification schemes to more than 4,900 lead to the emergence of popular topics, which in turn can groups identified from 318,000 tweets. The result shows that 5-nn facilitate the personalization of TV content and advertising. algorithm outperformed the other procedures in terms of The channel owners and advertisers need to correctly identify misclassification error to identify the informed groups. the paramount groups that corroborate the show or ad-related information with higher probabilities in order to prioritize their Keywords— entropy; informed groups; misclassification; PCA; personalization efforts. Therefore, the findings assist both the second screen; social TV broadcasters and advertisers to classify the leading interacting groups with minimum error that will help them to formulate I. INTRODUCTION new strategies for TV airing, launching product ads to engage With the advent of Internet technology and the emergence more viewers, promote sales, and earn revenues. of online social networks, the social possibility of TV has greatly expanded, as the merging of technologies now allow a II. RELATED WORK number of social activities and interactions concerning TV content via social networks (e.g., Facebook, Twitter, Weibo A. Information prediction using social network etc.). The social media sites (e.g., Twitter, Facebook, Weibo) The viewers do exchange information related to the TV become increasing powerful communication medium for the show via second screen in terms of posting of tweets [1]. The users to share and interchange the feelings and information of exchange of information can happen live during show time or real world events. Becker, Namaan and Gravano [2] explored when the show is not transmitted live. The information in the the Twitter message streams to identify the real and non-real social interaction may relate to different aspects of the show time events using clustering approach over similarity of tweets. content (e.g., the actors, directors, costumes, characters, It was observed that group communication has a very high themes, etc.) or aspects of the products advertised (e.g., brand, correlation with re-tweetability of information [3]. In other sale, customer preference etc.). These attributes surrounding to research, the social network structure was used to classify and a TV show in the discussion are identified by means of the predict the popular domain of discussion or trending topics [4]. hashtags (#) before a relevant keyword or phrase in the tweets to categorize the tweets as part of the popular topics marked by B. Integration of prior work those keywords or phrases. Audiences reciprocating the tweet Though the prior literature discusses research on mentioning a particular hashtag to the commentary form a information sharing, and event prediction using social group engaing in a conversation. The group size is determined networks, there has been a limited research on information- by the number of members interact within that group. The theoretic classification of informed groups by analyzing social, greater the interaction within a group, the more informed group temporal, and second screen device features of group it will be. Highly informed groups promote diffusion of more conversations related to TV shows. Ghosh, Surachawala and information about the aspects concerning a TV show. Lerman [5] classified political campaign based retweeting In this research, we investigate the degree of information activity using time interval and user entropy as features. Yang exchange in group interactions regarding a TV show using and Counts [6] analyzed information diffusion in Twitter in second screens. The investigation involves the classification of terms of speed, scale and range using mention (‘@’) as user the groups into informed categories depending on the interaction but did not include exploration of second screen 978-1-4799-6959-3/15/$31.00 ©2015 IEEE 879 2015 International Conference on Computing, Networking and Communications, Social Computing and Semantic Data Mining device features into consideration. Loriche and Coulton [7] TABLE 1 investigate the role of social media to facilitate the second ENTROPY BASED CATEGORIZATION OF INFORMED GROUPS screen for TV, thus enabling audience interaction, but they did not examine social or temporal features of interaction. Neither Entropy Range Class Group did these prior research considered the measurement of > 0 to 1.0 0 Least Informed information during group communication via second screens. > 1.0 to 2.0 1 Lower Informed > 2.0 to 3.0 2 Informed III. RESEARCH QUESTIONS > 3.0 to 4.0 3 Better Informed In our research, we identify groups by means of hashtags > 4.0 4 Most Informed (#) that can be present anywhere in the comments posted by the TV viewers in Twitter tweets. The collection of viewers with IV. DATA COLLECTION AND RESEARCH DESIGN communications messages sharing the same hashtag forms a We select “Game of Thrones” as the TV show to evaluate group. The groups may be small or large depending on the our research questions and collect audience interactions in number of interactants within the groups. The probability of form of tweets from Twitter. Game of Thrones is a popular TV information sharing is higher if there are more members within show that is broadcast by channel HBO. The tweet collection is a group. In our study, we express the information shared within continued with a span from 5th February 2014 to 4th March a group as a function of social, temporal, and second screen 2014 and we amassed a total of 317,775 tweets. We use a PHP features. The theoretical understanding of our research is script to access the Twitter API to collect data concerning the characterized by theory of information exchange [11] that TV show by using the TV show name as the Twitter API delineates an information system as a network of query. The tweets retrieved as JSON objects are pulled into a interconnected information nodes, allowing flow of MySQL database. As Twitter is one of the most popular micro- information. The social features deal with the viewers’ patterns blogging sites, we use it as the platform for the second screen of conversation within groups and group size. The temporal based social interaction. Most micro-blogging services share features identify the volume of real time or non-real time commonalities such as: a) short text messages, b) instantaneous communication. The second screen features include the type of message delivery, and c) subscription to message updates [8]. the device people use for group communication. Information So, although we use Twitter as the platform of interaction, we sharing in active discussion via second screen by forming believe that our findings are applicable to other micro-blogging groups among viewers lead to our first research question:- applications. 1) Can group interactions using second screen concerning Once the tweets were collected, we retrieve those tweets a TV show using second screen be classified in terms of social, filtered by the presence of hashtags (#). In our research, we temporal, and second screen features? identify the potential groups by means of these hashtags. In an informed group, there should be at least two or more users We term clusters of individuals as informed groups as the interacting with each other and mentioning the same hashtag members share information about TV show in their group within tweets. A user can be a member of multiple groups. based interactions. The research question is important because From our data collection, 4973 informed groups were commercial and other organizations are increasingly keen to identified where the range of unique group members is capitalize on prediction of higher informed groups to measure between 2 to 7435. the viewing habits and reactions of the audience. After formation of the informed groups, we quantify the The first research question eventually leads to exploration expected value of the information (i.e., entropy) contained in of different classification schemes to accurately identify degree the tweets posted within a particular group. We use equation 1 of information sharing within the groups. The accuracy of to determine the total