Public Opinion Spamming:A Model for Content and Users on Sina Weibo
Total Page:16
File Type:pdf, Size:1020Kb
Session: The Reality of Social Media WebSci’18, May 27-30, 2018, Amsterdam, Netherlands Public Opinion Spamming: A Model for Content and Users on Sina Weibo Ziyu Guo Liqiang Wang∗ Yafang Wang† Shandong University Shandong University Shandong University Jinan, China Jinan, China Jinan, China Guohua Zeng Shijun Liu Gerard de Melo Chinese Academy of Social Sciences Shandong University Rutgers University – New Brunswick Beijing, China Jinan, China USA ABSTRACT 1 INTRODUCTION Microblogs serve hundreds of millions of active users, but have Social network services provide platforms for massive information also attracted large numbers of spammers. While traditional spam dissemination and sharing between hundreds of millions of users. often seeks to endorse specific products or services, nowadays there Unfortunately, they also have led to new opportunities for malicious are increasingly also paid posters intent on promoting particular users. This is particularly true of the most well-known Chinese views on hot topics and influencing public opinion. In this work, microblogging platform Sina Weibo, which reportedly has a larger we fill an important research gap by studying how to detect such base of daily active users than Twitter. Hot, trending topics on this opinion spammers and their micro-manipulation of public opinion. platform attract remarkable public interest and have substantial Our model is unsupervised and adopts a Bayesian framework to significance for business and society. As a result, it has attracted distinguish spammers from other classes of users. Experiments spammers with malicious intent. Widespread spamming threatens on a Sina Weibo hot topic dataset demonstrate the effectiveness the quality and credibility of the user-generated content on social of the proposed approach. A further diachronic analysis of the media platforms, and erodes the publicness of these platforms. Thus, collected data demonstrates that public opinion spammers have it is important to develop techniques to detect such spammers and developed sophisticated techniques and have seen success in subtly to examine their impact on the formation of public opinion. manipulating the public sentiment. User Feature User Analytics SOC VIP users feature CCS CONCEPTS ROF common users sentiment spammers DBM reaction • Information systems → Spam detection; • Human-centered del fans more... Microblogs mo computing → Social media; • Applied computing → Sociol- Feature (b) ogy; ROM Spamicity MTV CKW hot topics KEYWORDS Replies related users Feature Opinion Spam, Public Opinion, User Classification DUP microblogs user feature NUR (a) replies microblogs ACM Reference Format: NMR Ziyu Guo, Liqiang Wang, Yafang Wang, Guohua Zeng, Shijun Liu, and Ger- ard de Melo. 2018. Public Opinion Spamming: A Model for Content and Figure 1: Data capture and model. Users on Sina Weibo. In WebSci ’18: WebSci ’18 10th ACM Conference on The publicness of social media platforms has long been a major Web Science, May 27–30, 2018, Amsterdam, Netherlands. ACM, New York, concern in academia. Early research highlights the political and NY, USA, 5 pages. https://doi.org/10.1145/3201064.3201104 social polarization [13, 18], and the impact of the underlying al- gorithms [14]. More recent research focuses on the mechanisms ∗The first two authors contributed equally. and impact of “fake news” created and circulated on social media †Corresponding author: [email protected] [1, 17]. However, how opinion spammers subtly steer and manipu- late the public opinion on such platforms and what impact these micro-techniques may exert on society remains underexplored. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed There has been ample research on detecting spammers, including for profit or commercial advantage and that copies bear this notice and the full citation specifically for Sina Weibo [2, 3, 9, 15], as well as several supervised on the first page. Copyrights for components of this work owned by others than ACM learning methods [5, 12, 16] to detect instances of opinion spam. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a However, such models are largely based on a dichotomy of fake vs. fee. Request permissions from [email protected]. non-fake labels. Unsupervised methods have as well been proposed WebSci ’18, May 27–30, 2018, Amsterdam, Netherlands [4, 6, 8, 10, 11, 20]. Despite this progress, such previous work has © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5563-6/18/05...$15.00 focused on identifying spammers seeking to place ads for products https://doi.org/10.1145/3201064.3201104 or services, as well as detecting imposters, extremists and the like. 210 Session: The Reality of Social Media WebSci’18, May 27-30, 2018, Amsterdam, Netherlands WebSci ’18, May 27–30, 2018, Amsterdam, Netherlands Z. Guo, L. Wang, Y. Wang, G. Zeng, S. Liu and G. de Melo This paper instead focuses on the unique problem of public opin- divorced#” (in Mandarin) and related ones frequently emerged in ion spamming, i.e. identifying spammers that seek to influence the Top Topics lists. public opinion on hot topics. As we will explain in further detail, Thus, we collected pertinent data from Sina Weibo from 14 Au- such actors operate quite differently both from traditional ad-like gust to 15 December 2016, a period when this event attracted mas- product promotion spammers [7, 19] as well as from the kind of sive public attention. Sina Weibo contains Voting posts, as well as opinion spammers that post fake product reviews. We propose a Topic posts, in which certain keywords are marked with ##. We novel and principled model to detect public opinion spamming. The first crawled the 440 most popular microblog postings about the model is an unsupervised one that does not require labeled train- hot topic #Wang Baoqiang divorced# as seeds, as well as replies to ing data and overcomes the limitations of existing work discussed them. We then retrieved all relevant users for these comments and above. We adopt a fully Bayesian modeling approach. This setting replies. From their home pages, we then crawled their postings in allows us to model the spamicity of users as latent, while treating the same time window. After data cleaning, we chose 2,000 users other observed behavioral features as known. The spamicity here for our experiment, with data from December 2016. The posting refers to the degree to which the exhibited behavior can be regarded features are computed for a user’s postings posted from August as public opinion spamming. Our key ideas hinge on the hypothesis to December 2016. In May 2017, we re-crawled the data again to that opinion spammers differ from others on behavioral dimen- check if these users were banned or the topic-related postings were sions. This creates a separation margin between the population deleted. Finally, in August 2017, we re-crawled the data once again distributions of three naturally occurring groups: spammers, fans, to determine whether any such ban had been lifted. and regular users. The inference procedure enables us to learn the 3 MODEL distributions of these groups by means of the behavioral features. Figure 1(b) illustrates our approach. Based on pertinent social 3.1 Observed Features media data, we extract features from user profiles, postings, and Users participating in a hot topic are categorized into four different replies under hot topics. For user classification, we identify VIP subsets: regular users, fans (i.e., enthusiastic devotees or admirers users based on the official certification on the platform, while other of one of the parties), spammers (i.e., paid posters specifically seek- features are used to train our model to assess a user’s degree of ing to sway public opinion), and VIP users (i.e., those verified by spamicity and categorize them with respect to the three remaining Sina Weibo). In the following, we propose some characteristics of clusters. Subsequently, we proceed to explore how and to what abnormal behavior that may prove useful as observed features in extent spammers shape the public opinion on specific topics. our model to learn to distinguish these clusters of users. In summary, this paper makes the following contributions: User Reply Features: Replies here refer to a user’s responses (1) It proposes a novel and principled method to exploit ob- to hot topic postings. Spammers often post multiple replies that are served behavioral footprints to classify users and detect pub- duplicate or near-duplicate versions of previous replies or replies of lic opinion spammers in an unsupervised Bayesian frame- others on the same topic (DUP). The number of user replies (NUR) work, without the need for laborious manual labeling, which and number of postings that a user responded to (NMR) are two is both time-consuming and error-prone. Unlike existing important features to detect spammers, due to the more limited work, this allows both detection and analysis to occur in a time and effort spent online by regular users. single framework, providing deeper insights into the data. Posting Features: Posting features are based on all postings (2) We conduct a comprehensive set of experiments to evaluate that a user has made within a given time period, beyond just those the proposed model based on human expert judgments. pertaining to the hot topic under consideration. Regular users tend (3) On the basis of the spammer behavior detection, we conduct to express their personal opinion in original postings, while spam- a diachronic analysis of a specific case, “Wang Baoqiang’s mers tend to copy template postings for efficiency. To highlight divorce", to examine the effectiveness of the spammers’ be- their arguments, spammers also post or repost more topic post- havior in shaping public opinion.