IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016 1193 An Integrated Approach of Sensing -Oriented Activities in Online Participatory Media Yunji Liang, Xingshe Zhou, Daniel Dajun Zeng, Senior Member, IEEE, Bin Guo, Member, IEEE, Xiaolong Zheng, Member, IEEE, and Zhiwen Yu, Senior Member, IEEE

Abstract—With the embracing of mobile Internet of Things, I. INTRODUCTION participatory media is becoming a new battlefield for tobacco wars. However, how to automatically collect and analyze the ITH the development of Internet of Things (IoT), vari- large-scale user-generated data that are relevant to tobacco in W ous objects are being connected and a huge amount of participatory media is still unexplored. In this paper, we pro- data are generated each day, including the sensing data from pose an integrated approach of sensing collective activities in embedded sensors and the user-generated data in online partic- tobacco-related participatory media. Meanwhile, we compare the temporal patterns, topological patterns, and collective emotion ipatory media such as social networking sites and discussion patterns among protobacco, antitobacco, and quit-tobacco groups. forums. All these human-centered or human-contributed data Based on the proposed framework, a prototype was implemented can reflect the digital footprints/activities of citizens and are to collect large-scale tobacco-related data sets from Facebook. opening a new paradigm of intelligent IoT. Various studies have This demonstrates that the proposed framework is feasible and been done on extracting urban and social intelligence from the reasonable. Our preliminary findings reveal that, first, tobacco- related content grows exponentially. In particular, the growth of collected data from participatory media, such as social relation- the protobacco group follows the exponential law, whereas there ship mining, social recommendation, location recommendation, is linear growth for the group. Second, the con- etc. In this paper, we address a different problem over intelligent nection between the tobacco control community and the tobacco IoT. We aim to automatically collect and analyze large-scale cessation community is tight, whereas the nodes in the protobacco tobacco-related data sets with the help of the unprecedented community are connected sparsely. Finally, the collective emotion in tobacco communities is negative. sensing capability of intelligent IoT. Many studies demonstrate that tobacco-related content on Index Terms—Facebook, Internet of Things (IoT), participatory participatory media is accumulating. For the protobacco com- media, public health, tobacco control, tobacco surveillance. munity, tobacco companies stand to greatly benefit from the marketing potential of social media without being at a signif- icant risk of being implicated in violating any laws [1], such as promotion on Facebook [2], protobacco video clips on YouTube [1], and mobile applications (“ishisha” and “Cigar Boss”) for tobacco promotion. In response to the “provoca- Manuscript received September 22, 2013; revised December 30, 2013; tion” from the protobacco community, many laws are enforced accepted January 22, 2014. Date of publication March 3, 2014; date of current version August 23, 2016. This work was supported in part by the National to regulate the promotion activities of the . Basic Research Program of China under Grant 2012CB316400; by the National In addition, many social media services (e.g., Facebook and Natural Science Foundation of China under Grants (71103180, 91124001, YouTube) have policies that outlaw the promotion of tobacco 71025001, 91024030, 61332005, 61373119, 61222209, 71272236); by the Beijing Natural Science Foundation under grant 4132072; and by the Doctorate products on their advertising networks. For YouTube, users are Foundation of Northwestern Polytechnical University. allowed to flag “inappropriate” videos. It is undeniable that Y. Liang is with the School of Computer Science and Technology, North- western Polytechnical University, Xi’an 710072, China, and also with the De- participatory media is becoming the new battlefield for tobacco partment of Management Information Systems, Eller College of Management, wars. University of Arizona, Tucson, AZ 85721-0108 USA (e-mail: liangyunji@ With the progress of tobacco wars on participatory media, gmail.com). X. Zhou, B. Guo, and Z. Yu are with the School of Computer Science the seamless monitoring of user interaction on tobacco-related and Technology, Northwestern Polytechnical University, Xi’an 710072, China content is vitally important to properly assess the current (e-mail: [email protected]; [email protected]; [email protected]). situation, the potential risks, and what kind of countermea- D. D. Zeng is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, sures has to be taken. However, the following reasons pose Beijing 100190, China, and also with the Department of Management Informa- challenges in characterizing the user interaction on tobacco- tion Systems, Eller College of Management, University of Arizona, Arizona, oriented participatory media. First, large-scale tobacco-related AZ 85721-0108 USA. X. Zheng is with the State Key Laboratory of Management and Control data sets are not available. The existing studies mainly rely on for Complex Systems, Institute of Automation, Chinese Academy of Sciences user survey, and manual data extraction based on questionnaires (CASIA), Beijing 100190, China, and also with the Dongguan Research is time- and labor-consuming. Second, most existing studies Institute of CASIA, Cloud Computing Center, Chinese Academy of Sciences, Dongguan 523808, China. are case studies, and there is a lack of quantification methods. Digital Object Identifier 10.1109/JSYST.2014.2304706 Manual content analysis remains the most common method

1937-9234 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 1194 IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016 used to analyze the social media content by the tobacco control Meanwhile, the connected sensing devices integrated the web community. This approach obviously will not scale when facing and sensing technologies together to extend the ecosystem the exploding social media “big data.” of smart objects and to extract intelligence and knowledge To address those questions, we propose an online tobacco leveraging the interaction of human and objects [4]. surveillance system (OTSS) to automatically collect tobacco- IoT could be applied to many application areas, such as related content data from participatory media and to investigate healthcare, urban planning, and real-world search. For health- the user interaction on tobacco-related participatory media. care, many sensors are utilized to enhance the sensing capa- Although a global tobacco surveillance system (GTSS) is devel- bility and to provide proactive services for users. Chen et al. oped and maintained to collect data on the tobacco use among [5] elaborated a knowledge-driven system to detect fine-grained adults and adolescents1, the data are manually extracted based daily activities with radio-frequency identification tags. For on questionnaires, which is time-consuming and extremely elder adults, a smart home prototype was implemented to costly. Different from a GTSS, an OTSS makes full use of overcome the barriers due to cognitive decline, particularly the data in the tobacco-related participatory media, including memory, hearing, and visual impairment [6], [7]. For urban structured data (such as social relationship) and unstructured planning, a number of studies have extracted citywide human data (such as textual data). Based on the multidimensional mobility patterns using the large-scale data from smart vehicles data in participatory media, many far-reaching studies could and mobile phones. The Real Time Rome project of the Mas- be conducted, including solutions to inhibit/boost the spread of sachusetts Institute of Technology, Cambrige, MA, USA, uses protobacco/antitobacco messages based on topology analysis, the aggregated data from buses and taxis to better understand the evaluation of tobacco policies based on textual data, and the urban dynamics in real time [8]. investigation about the marketing strategies in the protobacco community. B. Collective Activities From Participatory Media The main contributions of this paper are twofold. First, the framework of an OTSS is proposed, through which large-scale Participatory media can be also regarded as a social sensor data sets could be extracted from the tobacco-related participa- in IoT for the analysis of collective activities. For emergencies, tory media. Specifically, a prototype system is implemented to Twitter has been reported to support the real-time mining of capture the tobacco-related data sets from Facebook. To the best natural disasters such as earthquakes [9] and to uncover unusual of our knowledge, it is the first prototype system to automati- regional social activities, such as sports games, thunderstorm, cally collect large-scale tobacco-related data from participatory and fireworks festivals [10]. For social activities, the textual media. Second, we explore the collective interaction to find data in participatory media are utilized to predict political the differences among tobacco communities. Technically, based elections [11], [12] and to uncover the sentiment of the crowd on the large-scale data sets, we investigate the patterns of [13], [14], such as positive versus negative and objective versus collective activities including the temporal, topological, and subjective. Kramer et al. [15] proposed a model to analyze the textual patterns in tobacco communities. “Gross National Happiness” from Facebook textual data. In ad- The remainder of this paper is organized as follows. dition, participatory media is also applied for the public health Section II summarizes the existing work about IoT, participa- area. Through the analysis of the textual data on participatory tory media, and tobacco surveillance. In Section III, the system media, an influenza outbreak [16], [17] can be predicted. requirements for an OTSS are presented, and the framework For the tobacco control communities, the value of the of an OTSS is elaborated as well. Section IV presents how to tobacco-related data in participatory media is emerging. It is capture the tobacco-related data from Facebook. We investigate found that British American Tobacco employees were taking the temporal patterns, topological patterns, and content patterns advantage of social networking sites such as Facebook to pro- to reveal the collective activities in tobacco communities. Our mote the company’s products [2]. On YouTube, many users are preliminary findings are presented in Sections V and VI, re- exposed to protobacco videos ranging from product reviews to spectively. Section VII summarizes our findings and discusses fetish imagery to tobacco-related scenes [1]. However, more reasonable solutions for the tobacco control community. the size of the data set is very small, and manual content analy- Finally, we conclude this paper in Section VIII. sis remains the most common method used to analyze the social media content by the tobacco control community. To the best of II. RELATED WORK our knowledge, this is the first time to propose an integrated solution to automatically sense the large-scale tobacco-related A. Intelligent IoT content in the participatory media with the unprecedented sens- IoT refers to the emerging trend of augmenting physical ing capability of Intelligent IoT. More importantly, we conduct objects and devices with sensing, computing, and communi- a comprehensive analysis from two perspectives (the topology cation capabilities, connecting them to form a network and patterns and the textual patterns) with the introduction of social making use of the collective effects of the networked objects network analysis and text mining. [3]. The augmented sensing capability results from the informa- tion sharing among the connected devices, including wearable C. Tobacco Surveillance System sensors, sensor-enhanced mobile phones, and smart vehicles. For the tobacco control, the GTSS aims to enhance the 1http://www.cdc.gov/tobacco/global/gtss/. country capacity to design, implement, and evaluate tobacco LIANG et al.: APPROACH OF SENSING TOBACCO-ORIENTED ACTIVITIES IN ONLINE PARTICIPATORY MEDIA 1195 control interventions. The data in the GTSS is manually col- lected through questionnaires, which is time-consuming and extremely costly. More importantly, with the embracing of intelligent IoT, people interwave between online activities and offline activities. The digital footprints left in the participatory media are ignored in the GTSS. Our work differs from existing works in three aspects. First, from the point of view of data sources, the OTSS makes full use of the data in the tobacco-related participatory media, which benefits the automatic data collection with less time consumption and labor cost. On the other hand, the data in the participatory media are multidimensional, including structured data (such as social relationship) and unstructured data (such as textual data). The diversity of the data in the participatory media offers the chance to provide more insights for the tobacco control community. Second, different from the manual data collection based on the questionnaire, we present an integrated solution to automatically collect large-scale tobacco-related Fig. 1. System architecture of the tobacco surveillance system. data from the participatory media. Finally, manual content analysis remains the most common method used to analyze the 3) Data Analysis: Data analysis in the tobacco surveillance social media content by the tobacco control community. This system enables the automatic analysis of the large-scale social approach obviously will not scale when facing the exploding media data sets that are relevant to tobacco. As the user- social media “big data.” The OTSS enables the automatic anal- generated textual data and online interaction data are dominat- ysis of the large-scale social media data sets that are relevant to ing, data analysis should be conducted from two perspectives, tobacco. i.e., social network analysis and content mining. From the view of social network analysis, the topological patterns of tobacco III. FRAMEWORK OF ONLINE TOBACCO communities could be extracted based on user interaction ac- SURVEILLANCE SYSTEM tivities such as post like, post comment, and post sharing. Meanwhile, other issues such as information dissemination, A. System Requirements social influences, and communities could be investigated to 1) Content Filtering: A huge amount of content is generated reveal the distinct patterns in the tobacco-related communities. in the participatory media. The diversity of user-generated In addition to the topological patterns, we also conduct content data makes it difficult to extract the tobacco-related data. analysis for the user-generated textual data that are relevant Therefore, content filtering is a requisite component for the to tobacco. Based on the content analysis, we may reveal the tobacco surveillance system. Fortunately, keyword searches can main topics in the tobacco-communities; on the other hand, the be conducted on many social media sites, including Facebook collective emotion could be mined. and YouTube. Content filtering could be constructed based on a keyword searching engine. The key issue for the keyword- B. System Architecture based searching engine is how to establish a set of keywords. To address this problem, in this paper, we establish a set According to the system requirements, we propose the sys- of tobacco-related keywords for content filtering. The set of tem architecture of the tobacco surveillance system. As illus- keywords consists of tobacco brands, tobacco-related synthetic trated in Fig. 1, the system architecture consists of three layers, words, and common knowledge. More details about the set of i.e., sensing, data analysis, and application. keywords are elaborated in Section IV. Sensing, which is the basis of the tobacco surveillance sys- 2) Web Crawling: Web crawling obtains compressed data tem, aims to implement the automatic collection of tobacco- feeds from the content filtering and provides near real-time data related social media contents and collective activities. The updates for other components. A huge amount of information is diversity of the user-generated data makes the content filtering embedded in the participatory media. Web crawling should be necessary to enable the automatic collection of the large-scale capable of dealing with a heterogeneous data stream. On one data sets that are relevant to tobacco. The content filtering hand, user profiles should be accessed to characterize users. serves as data feeds for the web crawling. The web crawling On the other hand, collective activities are significantly im- automatically collects the tobacco-related data and provides portant for the social science. The user interaction relationship near real-time updates for other components. In this paper, embedded in the collective activities benefits the investigation we mainly focus on two kinds of data, i.e., the structured on user influence, information dissemination, and topology interaction relationship and the unstructured textual content. patterns. Meanwhile, group attitudes could be mined from the Data analysis is the core of the tobacco surveillance system user-generated textural data. The heterogeneity of data in the to uncover implicit facts and fundamental laws, and to predict participatory media poses a challenge for the design of web potential trends. More importantly, through the data analysis, crawling. we could get more insights about the tobacco-related collective 1196 IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016 activities on the participatory media for the tobacco control communities. According to the property of large-scale data sets, the data analysis could be conducted from two perspectives, i.e., social network analysis for the structured interaction data and text mining for the unstructured textual content. Application integrates a variety of programs with the sup- port of data analysis. Visualization provides a methodology to present the findings visually. Furthermore, more programs could be developed to find insights for the tobacco control communities. For example, to evaluate the tobacco policies, the textual data that are related to the tobacco policies could be collected, and the content analysis, such as the sentimental analysis, could be conducted to find the collective attitudes toward the policies. Fig. 2. Flow diagram of data collection from Facebook fan pages. OSGi Framework. Scalability is vitally important for the system due to heterogeneous data and diverse components. To address this issue, the OSGi framework is introduced to implement the prototype system. The OSGi framework is an outstanding service-oriented framework, which supports the module management and benefits the loose coupling of mod- ules [18]. Furthermore, it supports a hot plug, which means that the services could be installed or stopped without restarting the server. The OSGi framework guarantees the scalability and weightlessness of the architecture. The advantages of the OSGi framework make it reasonable for the construction of the prototype system.

IV. COLLECTION OF TOBACCO-RELATED DATA Fig. 3. Tobacco brands on Facebook.

In this paper, we present how to extract tobacco-related Due to the diversity of tobacco brands, we integrate tobacco content from Facebook. Facebook is an open platform, where brands from sources, including the official tobacco brand list2, Facebook users generate posts and interact with Facebook users tobacco-brand-related Wikipedia web pages3, tobacco review through liking, commenting, and sharing, which are the most websites4, and online tobacco shops.5 The integration of to- common interaction activities on Facebook. A Facebook fan bacco brands from heterogeneous sources facilitates the cov- page is a public profile that enables users to share their business erage of tobacco brands. Based on the integration of tobacco and products with Facebook users [2]. Meanwhile, a keyword brands, we selected the top 70 brands according to the fre- searching engine is provided, through which we could search quency of tobacco brands. Among the given 70 cigarette brands, fan pages that are related with the searching requests. 112 Facebook fan pages have been created that are named after The data collection on Facebook consists of two steps, i.e., 44 cigarette brands (See Fig. 3). offline data preparation and online data collection. The offline For the synthetic keywords, we synthesize many tobacco- data preparation aims to find tobacco-oriented fan pages using related words with tobacco and different roots (as shown in Facebook application programming interfaces (APIs) and clas- Table I). The synthetic keywords not only contain tobacco- sify the fan pages according to fan page profiles. In addition related products, including “,” “cigar,” “snuff,” “hookah,” to the basic information of fan pages such as page likes and and “pipe smoking,” but also cover keywords for tobacco con- post volume, the online data collection gathers the interaction trol and cessation. Meanwhile, some famous tobacco-related records on fan pages, including post likes and post comments fan pages on Facebook, such as “quitnet,” “BecomeAnEX,” and as well. The workflow of the data collection is illustrated “VchangeU,” are included as well. All those three sources con- in Fig. 2. sist of the tobacco-related keywords. Based on this keyword list, we automatically conduct the keyword searching on Facebook A. Offline Data Preparation through Facebook APIs. Through the keyword searches, the fan pages that are related The offline data preparation dedicates to find the tobacco- with those keywords are retrieved. Then, we check if those oriented fan pages on Facebook. For Facebook, a keyword retrieved fan pages are truly related with tobacco. All fan pages searching engine is available. Therefore, we first need to estab- lish a set of tobacco-related keywords. To maximize the cov- 2 erage of the keyword list, we manually construct the keyword http://www.revenue.ne.gov/cig/cig_alpha.html. 3http://en.wikipedia.org/wiki//List_of_cigarette_brands. list from three sources, i.e., tobacco brands, synthetic tobacco- 4http://www.cigreviews.com/find-by-brand. related terms, and existing knowledge. 5http://www.orixo.com/cigarettes/. LIANG et al.: APPROACH OF SENSING TOBACCO-ORIENTED ACTIVITIES IN ONLINE PARTICIPATORY MEDIA 1197

TABLE I protobacco fan pages succeed in attracting users to interact on RULES FOR THE SYNTHETIC TOBACCO-RELATED KEYWORDS pages and that many Facebook users are exposed to protobacco content. More importantly, the structured data embedded in the fan pages imply the user interaction relationship. With regard to the online interaction in fan pages, we mainly focus on two kinds of online activities, i.e., post likes and comments. As shown in Fig. 4, user B interacts with user A when user B likes or comments the posts issued by user A. To obtain user interaction records, we collected the post like list and the comment list through Facebook APIs. According to the user interaction records, we construct an undirect weighted inter- action graph G = {V,E}, where V = {v1,v2,...,vi,...,vn} indicates the users who interact with posts, and E = {e1,e2, ...,ei,...,em} is the set of edges that connect vi and vj with TABLE II OVERVIEW OF THE THREE GROUPS OF TOBACCO-ORIENTED FAN PAGES interaction frequency wij .

V. S TRUCTURE ANALYSIS FROM STRUCTURED DATA A. Evolution of Tobacco-Related Content on Participatory Media are manually classified into four types according to the page We disclose the evolution of post volume on Facebook in profiles or content of fan pages. Specifically, we classified three groups, respectively. As shown in Fig. 5, the tobacco- the retrieved results into four types by two coders manually, related content on Facebook has been explosively increasing i.e., 0: unrelated to tobacco; 1: tobacco promotion; 2: tobacco since 2008. In general, the year 2011 witnessed the break- control; and 3: tobacco cessation. Pearson correlation coeffi- through of three groups with more than 1000 posts per month. cients (PCCs) were measured based on the classification results The growth of tobacco-related content follows the exponential of the two coders. The coding had a high level of reliability distribution, which demonstrates that tobacco-related content (PCCs ρ =0.9646). A third coder coded the fan pages, for has been widely spread on the social media and may be ac- which there was no agreement between the first two coders. cessed by a huge number of Facebook users. The prevalence of If the third coder disagreed with each of the first two coders, tobacco-related content indicates that the importance of online those fan pages are excluded. Finally, the tobacco-related fan social marketing for tobacco-related products and services is pages are regarded as the initial data for the following online emerging. data collection. In total, we got 2149 tobacco-related fan pages On the other hand, the post volume for tobacco promotion (708 for tobacco promotion, 684 for tobacco control, and 757 outperforms the other two groups with a higher growth rate. for tobacco cessation). Specifically, 557 of the protobacco fan As illustrated in Fig. 5, the growth of the protobacco and ∗ pages are about tobacco brands (such as New Port, Camel, quit-tobacco groups follows exponential distribution y(t)=a b(t−t ) Black Devil, etc.), tobacco-related products, such as smoking e 0 , whereas for the tobacco control group, it follows linear ∗ − pipe and tobacco promotion information (such as duty-free distribution y(t)=a (t t0)+b. In particular, the explosive and cheap online tobacco shops), and protest for growth happened in July 2011 with the rapid expansion of the tobacco control. One hundred fifty-one of the protobacco fan gap between the protobacco and antitobacco groups. The gap pages are concerned about electronic cigarettes. between the protobacco and tobacco control groups indicates that tobacco control is losing ground to tobacco promotion in B. Online Data Collection the participatory media and that we are faced with a tremendous challenge for tobacco control. Therefore, more effort should be A huge amount of information is embedded in fan pages. made for tobacco control online. The online data collection captures the following three types of data: 1) fan page features, including page like, post volume, B. Topological Patterns of Tobacco Communities comment volume, and post like; 2) unstructured data, i.e., the user-generated text, including post creator, create time, posts, We analyze the interaction among the tobacco communities. and comments, where create time is utilized for the analysis of If fan page A likes, posts, or comments on fan page B, an the temporal patterns, and posts indicates the topics of posts; interaction between the fan pages occurs. Based on the inter- and 3) structured data, i.e., the social relationship embedded in action among fan pages, we construct the interaction network, the online interaction activities. as shown in Fig. 6, where each node represents a fan page. The We summarize the fan page features in Table II. It is obvious nodes in red are affiliated with the protobacco community; the that the fan pages in the protobacco group overwhelm the other nodes in green, with tobacco control; and the nodes in yellow, two groups in page like and post like. This indicates that the with tobacco cessation. 1198 IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016

Fig. 4. Extraction of the interaction matrix based on interaction records. As illustrated in Fig. 6, the fan pages for tobacco control and tobacco cessation are highly connected. On the contrary, the protobacco fan pages are less connected. The tight connection between the antitobacco and quit-tobacco fan pages benefits the information sharing and facilitates the promotion of cessation services and tobacco control campaign. For example, the node marked with Q22 shared posts from BecomeAnEX to encourage people to quit smoking with the assistance of Quit Tea, which is a blend of herbs and spices that helps replace the habit of smoking with drinking herbal tea.6 The sparse connection between protobacco fan pages may result from the competition in the tobacco industry and the common benefits of fan pages. In the set of keywords, many tobacco brands, such as “Camel,” “Lucky Strike,” and “Pall Mall” are introduced. We found that most nodes named after tobacco brands do not connect with Fig. 5. Temporal patterns of post volume for the three tobacco-oriented other tobacco brands, which may result from the competi- groups. tion among tobacco rivals. On the other hand, the connection among protobacco nodes may rely on the common benefits. For example, the neighbors of node P3 (standing for Fantasia Hookah) are linked with tobacco seed P5, hookah accessories and hookah pipes P4, and distribution broker P2 connecting to local vendors P25.

VI. CONTENT ANALYSIS FROM TEXTUAL DATA A. Collective Topics in Tobacco Communities In order to find the distribution of topics in the tobacco- oriented groups, we compare the topics between the protobacco and antitobacco groups. In this paper, the posts from each group are extracted as a corpus, respectively. Then, a topic model is employed to each corpus to reveal the topic distribution for each group. In machine learning and natural-language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. We utilize latent Dirichlet allocation [19] to extract the keywords for each group. Fig. 6. Interaction network among the tobacco communities. (Note: Isolated nodes are not presented). 6https://www.facebook.com/QuitTea/info. LIANG et al.: APPROACH OF SENSING TOBACCO-ORIENTED ACTIVITIES IN ONLINE PARTICIPATORY MEDIA 1199

Fig. 7. Comparison of topics between the protobacco group and the antitobacco group. The topic distributions of the protobacco and antitobacco groups are shown in Fig. 7. The position of keywords in Fig. 7 represents the rate of word frequencies in two corpuses, as shown in (1), where fp(wj ) and fc(wj ) indicate the word frequencies of keyword wj in the protobacco corpus and the antitobacco corpus, respectively. The font size indicates the word frequency of keyword wj in the dominating corpus. max (f (w ),f (w )) Position = p j c j . (1) min (fp(wj ),fc(wj ))

As shown in Fig. 7, in the tobacco promotion corpus, many tobacco-related products, such as hookah, snuff, cigar, pipe, rolling, and ecigarette, are presented. Meanwhile, many to- bacco brands, such as del, kasbah, and shisha, are uncovered as well. Furthermore, many keywords, including shipping, order, deal, stock, discount, sale, coupon, and price, are related with business activities. What is worse is that a large number of Fig. 8. Sentiment analysis of the textual content in the tobacco-related positive words, such as delicious, lucky, favorite, and sexy, communities. are used in the protobacco corpus. According to the keywords extracted from the protobacco corpus, we could conclude that B. Collective Emotion in Tobacco Communities many tobacco-derived products, including cigar, hookah, and In addition to the topics expressed in the posts, we also study e-cigarette, are widely discussed in the social media. On the collective emotions through the sentiment analysis according other hand, the frequent emergence of business-related key- to the method introduced in [20]. We processed the text of each words may shed light on the undergoing online promotion post using SentiWordNet [21], which is a lexical resource for activities for tobacco. the extraction of emotional content from a short text. We use For the antitobacco corpus, it is obvious that it focuses SentiWordNet’s output to classify each post as positivity (ep > more on being tobacco free and on tobacco quitting. According 0), objectivity (ep =0), and negativity (ep < 0). For each fan to cancer, lung, and die, intuitively, we would realize that page, we aggregate the emotions expressed in the set of posts many of the health issues resulting from are of a fan page f by calculating the ratios of positive posts Pf = | | | | emphasized to raise public awareness for tobacco control. In ( p∈f ep > 0)/ f , negative posts Nf =( p∈f ep < 0)/ f , | | addition, many words in law, including legislation, regulation, and neutral posts Uf =( p∈f ep =0)/ f . These three values vote, and ban, are involved in the antitobacco corpus, which Pf , Nf , Uf allow us to map each fan page in a ternary plot. We implies that more regulations are discussed for tobacco control. plot each fan page as a point in the ternary plot, with a distance In particular, tobacco control among the youth is a hot issue. to the vertices that is inversely proportional to the three origin Obviously, many issues, ranging from the bad health effects of points. tobacco, the legislation for tobacco control, to tobacco issues As shown in Fig. 8, the emotion in the tobacco-related textual for the youth, are involved in the antitobacco corpus. content is negative both in the protobacco community and the 1200 IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016 antitobacco community. For the whole corpus, we calculate the and Lucky Strike, on Facebook. In their study, there were 44 dominating emotion of fan pages with the maximum among Facebook fan pages related with Lucky Strike, with 28 309 fans. Pf , Nf , and Uf . In the experimental results, 82.98% of fan For Dunhill, it was six pages with 1903 fans. Compared with pages are negative, which demonstrates that the emotion in the their findings, there are 116 765 fans on two pages for Lucky tobacco communities is negative. Strike and 79 215 fans on five pages for Dunhill, respectively. Although the Facebook fan pages could be deleted by their VII. EXPERIMENTAL FINDINGS AND DISCUSSION creators or administrators, the volume of fans who like those two tobacco brands grew rapidly. A. Preliminary Findings Furthermore, the volume of protobacco content in the partic- The OTSS benefits the automatic collection and analysis of ipatory media grows explosively. Many social media services the large-scale tobacco-related data sets from the online partic- have policies that outlaw the promotion of tobacco products on ipatory media. In this section, we summarize our preliminary their advertising networks. However, those policies do not work findings based on the analysis of structured and unstructured well as they require seamless monitoring manually. The lack data in the large-scale data sets. of consistent regulation about tobacco promotion, directly or a) Prevalence of Tobacco Brands: In the set of tobacco- indirectly, on social media means that protobacco information related keywords, the top 70 tobacco brands are selected is likely to be accessed and shared by anyone, anywhere, no from four heterogeneous sources, including the official tobacco matter what age. To address the aforementioned questions, brand list, Wikipedia web pages, tobacco review websites, first, new regulations of tobacco control on social media are and online tobacco shops. Among the 70 tobacco brands, necessitated; on the other hand, new technologies or mecha- 44 cigarette brands (62.86%) have created 112 fan gages on nisms should be introduced for the tobacco surveillance on the Facebook, with 430 909 page likes. The prevalence of tobacco participatory media. brands on Facebook demonstrates that the commercial potential From a legal perspective, we are still at a preliminary stage of participatory media is utilized by the tobacco industry for for tobacco control on participatory media. Although there is tobacco promotion. a tobacco advertising ban for participatory media, it mainly b) Explosive Growth of Tobacco-Related Content: The focuses on the advertising that appears as a click-through adver- volume of tobacco-related content in Facebook has followed an tisement that is displayed on the sidebar of website pages. With exponential growth since 2008. In particular, the growth of the the explosive growth of user-generated protobacco content, protobacco group follows the exponential law, whereas it is the we should pay more attention to the content with potential linear growth for the tobacco control group. The widening gap effects for tobacco promotion, as prosmoking content online, between the protobacco group and the tobacco control group regardless of whether it is commercial or personal in origin, demonstrates that tobacco control is losing ground to tobacco could equally influence users [22]. promotion in the participatory media. We are faced with a From a technical view, the OTSS is necessitated for the formidable challenge for tobacco control on the participatory automatic collection and analysis of content that is relevant to media. tobacco. Technically, the progress of text mining will benefit the c) Connection Gap Among Tobacco Communities: From uncovering of the potential protobacco content that is generated the perspective of network analysis, the nodes in the tobacco by users. However, the user interaction patterns and the influ- control community and the tobacco cessation community are ence of protobacco fan pages are still unexplored. To explore highly connected, whereas the connection in the protobacco the user interaction in the protobacco communities, we could community is sparse. The connection gap among the tobacco analyze the dynamics as a complex network [23]; to evaluate communities may be a result of the competition among the the influence of protobacco content on social media, we aim tobacco rivals and common benefits. to measure the social influence in the process of information d) Negative Emotion in Tobacco Communities: From the dissemination [24]. On one hand, we may provide quantified perspective of content analysis, the topics in the protobacco proofs for the regulations or laws against tobacco promotion community cover many tobacco-related products (hookah, on social media; on the other hand, new methodologies could cigar, and ecigarette), tobacco brands (del, kasbah, and shisha), be established to minimize the social influence and delay the and business-related keywords, such as coupon, discount, and information dissemination process for tobacco promotion. stock. On the other hand, the collective emotion in the tobacco communities is negative. VIII. CONCLUSION Tobacco wars are heating up in participatory media. An B. Discussion OTSS is necessitated to collect user interaction records and an- Our preliminary results demonstrate the prevalence of to- alyze user attitudes toward tobacco policies and user influence, bacco brands on Facebook. Although it is difficult to quantify etc. In this paper, we have proposed an integrated approach to how the protobacco data impact users’ attitude toward tobacco sensing the collective activities in tobacco-related participatory products, we are informed that the presence of tobacco brands media. Through the proposed approach, we can capture the on social media is recognized by the tobacco industries or large-scale data sets of collective activities toward tobacco- protobacco agencies due to the online word of mouth. Freeman related content. Meanwhile, we investigate the temporal pat- and Chapman [2] investigated two tobacco brands, i.e., Dunhill terns of tobacco content and reveal the topic patterns in tobacco LIANG et al.: APPROACH OF SENSING TOBACCO-ORIENTED ACTIVITIES IN ONLINE PARTICIPATORY MEDIA 1201

communities. In addition, the preliminary experimental re- [20] D. Garcia, F. Mendez, U. Serdlt, and F. Schweitzer, “Political polarization sults demonstrate that the tobacco-related content exponentially and popularity in online participatory media: An integrated approach,” in Proc. 1st Ed. workshop Politics, Elections Data, Maui, HI, USA, 2012, grows and that the collective emotion in tobacco communities pp. 3–10. is negative. Our results, on one hand, demonstrate the tobacco [21] A. Esuli and F. Sebastiani, “Sentiwordnet: A publicly available lexical brand prevalence on social media and that protobacco content is resource for opinion mining,” in Proc. 5th Conf. Lang. Res. Eval., Genoa, , 2006, pp. 417–422. exponentially growing with a huge number of people involved. [22] B. Freeman, “New media and tobacco control,” Tob. Control, vol. 21, On the other hand, our analysis about topological patterns and no. 2, pp. 139–144, Mar. 2012. interaction patterns benefits the methodologies to retard the [23] X. Zheng, Y. Zhong, D. Zeng, and F. Wang, “Social influence and spread dynamics in social networks,” Front Comput. Sci., vol. 6, no. 5, pp. 611– protobacco activities and boost the antitobacco campaigns. The 620, Oct. 2012. proposed system could be applied to monitor the development [24] D. Zeng, H. Chen, R. Lusch, and S. Li, “Social media analytics and of tobacco-related “ecosystems” and provide proofs to evaluate intelligence,” IEEE Intell. Syst., vol. 25, no. 6, pp. 13–16, Nov./Dec. 2010. the efficiency of tobacco control policies.

REFERENCES [1] L. Elkin, G. Thomson, and N. Wilson, “Connecting world youth with tobacco brands: YouTube and the Internet policy vacuum on Web 2.0,” Tob. Control, vol. 19, no. 5, pp. 361–366, Oct. 2010. Yunji Liang received his B.S. and M.S. degrees in [2] B. Freeman and S. Chapman, “British American Tobacco on Facebook: computer science and technology from Northwestern Undermining article 13 of the global World Health Organization Frame- Polytechnical University, Xi’an, China. He is cur- work Convention on tobacco control,” Tob. Control, vol. 19, no. 3, pp. e1– rently working toward the Ph.D. degree in the School e9, Jun. 2010. of Computer Science and Technology, Northwestern [3] B. Guo, D. Zhang, Z. Yu, Y. Liang, Z. Wang, and X. Zhou, “From the Polytechnical University, Xi’an, China. Internet of things to embedded intelligence,” World Wide Web, vol. 16, Since 2012, he has been working as a Visiting no. 4, pp. 399–420, Jul. 2013. Scholar with the Department of Management Infor- [4] D. Guinard and T. Vlad, “Towards the web of things: Web mashups for mation Systems, Eller College of Management, Uni- embedded devices,” in Proc. 18th Int. Conf. World Wide Web,Madrid, versity of Arizona, Tucson, AZ, USA. His current , 2009, pp. 1–8. research interests include pervasive computing, so- [5] L. Chen, J. Hoey, C. Nugent, D. Cook, and Z. Yu, “Sensor-based activity cial computing, and intelligent information system. recognition,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 6, pp. 790–808, Nov. 2012. [6] Y. Liang, X. Zhou, Z. Yu, H. Wang, and B. Guo, “A context-aware multimedia service scheduling framework in smart homes,” EURASIP J. Wireless Commun. Netw., vol. 67, pp. 1–15, 2012. [7] L. Tang, X. Zhou, Z. Yu, Y. Liang, D. Zhang, and H. Ni, “MHS: A multimedia system for improving medication adherence in elderly care,” IEEE Syst. J., vol. 5, no. 4, pp. 506–517, Dec. 2011. [8] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, “Real-time Xingshe Zhou received the M.E. degree in computer urban monitoring using cell phones: A case study in Rome,” IEEE Trans. science and technology from Northwestern Poly- Intell. Transp. Syst., vol. 12, no. 1, pp. 141–151, Mar. 2011. technical University, Xi’an, China, in 1983. [9] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes Twitter users: He is currently a Professor with the School of Real-time event detection by social sensors,” in Proc. 19th Int. Conf. Computer Science and Technology, Northwestern World Wide Web, Raleigh, NC, USA, 2010, pp. 851–860. Polytechnical University. He is the author of more [10] R. Lee, S. Wakamiya, and K. Sumiya, “Discovery of unusual regional than 100 papers in his areas of interest, including social activities using geo-tagged microblogs,” World Wide Web, vol. 14, embedded computing, distributed computing, and no. 4, pp. 321–349, Jul. 2011. pervasive computing. [11] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Predicting Mr. Zhou is currently an Executive Director of the Elections with Twitter: What 140 Characters Reveal about Political Sen- China Computer Federation and a Program Commit- timent,” in Proc. 4th Int. AAAI Conf. Weblogs Social Media, Washington, tee Member of the National Natural Science Foundation of China. DC, USA, 2010, pp. 178–185. [12] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the 2010 Swedish election campaign,” New Media Soc., vol. 14, no. 5, pp. 729–747, Aug. 2012. [13] O. Kucuktunc, B. B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu, “A large-scale sentiment analysis for Yahoo! Answers,” in Proc. 5th ACM Int. Conf. Web Search Data Mining, Seattle, WA, USA, 2012, pp. 633–642. [14] G. Paltoglou and M. Thelwall, “Twitter, MySpace, Digg: Unsupervised Daniel Dajun Zeng (M’04–SM’09) received the sentiment analysis in social media,” ACM Trans. Intell. Syst. Technol., B.S. degree in economics and operations research vol. 3, no. 4, p. 66, Sep. 2012. from the University of Science and Technology of [15] A. D. I. Kramer, “An unobtrusive behavioral model of ‘Gross National China, Hefei, China, and the M.S. and Ph.D. degrees Happiness’,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., in industrial administration from Carnegie Mellon Atlanta, GA, USA, 2010, pp. 287–290. University, Pittsburgh, PA, USA. [16] B. H. Scott, T. W. Kesler, G. G. Christophe, W. H. Joshua, and He is a Professor with the Department of Man- B. D. Michae, “Right time, right place health communication on twit- agement Information Systems, Eller College of Man- ter: Value and accuracy of location information,” J. Med. Internet Res., agement, University of Arizona, Tucson, AZ, USA, vol. 14, no. 6, p. e156, Nov. 2012. and with the State Key Laboratory of Intelligent [17] V. Lampos and N. Cristianini, “Nowcasting events from the social web Control and Management for Complex Systems, In- with statistical learning,” ACM Trans. Intell. Syst. Technol., vol. 3, no. 4, stitute of Automation, Chinese Academy of Sciences, Beijing, China. His pp. 72:1–72:22, Sep. 2012. research interests include multiagent systems, collaborative information and [18] C. Lee, D. Nordstedt, and S. Helal, “Enabling smart spaces with OSGi,” knowledge management, recommender systems, and automated negotiation IEEE Pervasive Comput., vol. 2, no. 3, pp. 89–95, Jul.–Sep. 2003. and auction. Recently, he has been interested in tobacco control, infectious [19] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” disease informatics, social media analytics and intelligence, information and J. Mach. Learn. Res., vol. 3, no. 4/5, pp. 993–1022, May 2003. security informatics, and emergency management. 1202 IEEE SYSTEMS JOURNAL, VOL. 10, NO. 3, SEPTEMBER 2016

Bin Guo (M’07) received the Ph.D. degree in com- Zhiwen Yu (M’06–SM’11) received the Ph.D. puter science from Keio University, Tokyo, , degree in computer science and technology from in 2009. Northwestern Polytechnical University, Xi’an, During 2009–2011, he was a Postdoctoral Re- China, in 2005. searcher with Institut Telecom SudParis, Évry, He is a Professor with and the Vice-Dean of . He is currently an Associate Professor with the School of Computer Science and Technology, the School of Computer Science and Technology, Northwestern Polytechnical University. He was a Re- Northwestern Polytechnical University, Xi’an, China. search Fellow with Kyoto University, Kyoto, Japan, His research interests include ubiquitous comput- from February 2007 to January 2009; a Postdoctoral ing, mobile crowd sensing, and human–computer Researcher with Nagoya University, Nagoya, Japan, interaction. in 2006-2007; and a Visiting Researcher with the Dr. Guo has served as an Editor for the IEEE Communications Magazine and Institute for Infocomm Research (I2R), Singapore, in 2004-2005. He was an Personal and Ubiquitous Computing, and as a Guest Editor for the Association Alexander von Humboldt Fellow with the University of Mannheim, Mannheim, for Computing Machinery TRANSACTIONS ON INTELLIGENT SYSTEMS AND , from November 2009 to October 2010. He has published more TECHNOLOGY,andtheJournal of Network and Computer Applications.Heis than 100 scientific papers in refereed journals and conferences, e.g., Associ- the recipient of the Best Paper Award of the International Conference on Active ation for Computing Machinery (ACM) Computing Surveys, IEEE Pervasive Media Technology 2012, the International Conference on Grid and Pervasive Computing, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGI- Computing 2012, and the IEEE International Conference on Cyber, Physical NEERING, ACM International Joint Conference on Pervasive and Ubiquitous and Social Computing 2013. Computing (UbiComp), and the IEEE International Conference on Pervasive Computing and Communications (PerCom). His research interests include pervasive computing, context-aware systems, human-computer interaction, and Xiaolong Zheng (M’12) received his B.S. degree personalization. in Mechanical Manufacturing and Automation from China Jiliang University, Hangzhou, China, the M.S. degree in Traffic Information Engineering & Con- trol from Beijing Jiaotong University, Beijing, China and Ph.D. degree in Control Theory and Control Engineering from Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is an Associate Professor with the State Key Laboratory of Intelligent Control and Management for Complex Systems, Institute of Automation, Chi- nese Academy of Sciences (CASIA), Beijing, China. He is also with the Dongguan Research Institute of CASIA, Cloud Computing Center, Chinese Academy of Sciences, Dongguan, China. His research interests include social media analytics and intelligence, social computing, and data mining. Specif- ically, he is interested in social influence and spread dynamics, behavioral finance and network economics, tobacco control, infectious disease informatics, and emergency management.