Battling the Internet Water Army: Detection of Hidden Paid Posters

Battling the Internet Water Army: Detection of Hidden Paid Posters Cheng Chen Kui Wu Venkatesh Srinivasan Xudong Zhang Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science University of Victoria University of Victoria University of Victoria Peking University Victoria, BC, Canada Victoria, BC, Canada Victoria, BC, Canada Beijing, China Abstract—We initiate a systematic study to help distinguish on different online communities and websites. Companies are a special group of online users, called hidden paid posters, or always interested in effective strategies to attract public atten- termed “Internet water army” in China, from the legitimate tion towards their products. The idea of online paid posters ones. On the Internet, the paid posters represent a new type of online job opportunity. They get paid for posting comments is similar to word-of-mouth advertisement. If a company hires and new threads or articles on different online communities enough online users, it would be able to create hot and trending and websites for some hidden purposes, e.g., to influence the topics designed to gain popularity. Furthermore, the articles opinion of other people towards certain social events or business or comments from a group of paid posters are also likely markets. Though an interesting strategy in business marketing, to capture the attention of common users and influence their paid posters may create a significant negative effect on the online communities, since the information from paid posters is usually decision. In this way, online paid posters present a powerful not trustworthy. When two competitive companies hire paid and efficient strategy for companies. To give one example, posters to post fake news or negative comments about each other, before a new TV show is broadcast, the host company might normal online users may feel overwhelmed and find it difficult to hire paid posters to initiate many discussions on the actors or put any trust in the information they acquire from the Internet. actresses of the show. The content could be either positive or In this paper, we thoroughly investigate the behavioral pattern of online paid posters based on real-world trace data. We design negative, since the main goal is to attract attention and trigger and validate a new detection mechanism, using both non-semantic curiosity. analysis and semantic analysis, to identify potential online paid We would like to remark here that the use of paid posters posters. Our test results with real-world datasets show a very extends well beyond China. According to a recent news promising performance. report in the Guardian [9], the US military and a private Index Terms—Online Paid Posters, Behavioral Patterns, De- tection corporation are developing a specific software that can be used to post information on social media websites using fake online identifications. The objective is to speed up the distribution of I. INTRODUCTION pro-American propaganda. We believe that it would encourage According to China Internet Network Information Center other companies and organizations to take the same strategy to (CNNIC) [6], there are currently around 457 million Internet disseminate information on the Internet, leading to a serious users in China, which is approximately 35% of its total problem of spamming. population. In addition, the number of active websites in China However, the consequences of using online paid posters are is over 1:91 million. The unprecedented development of the yet to be seriously investigated. While online paid posters Internet in China has encouraged people and companies to can be used as an efficient business strategy in marketing, arXiv:1111.4297v1 [cs.SI] 18 Nov 2011 take advantage of the unique opportunities it offers. One core they can also act in some malicious ways. Since the laws issue is how to make use of the huge online human resource to and supervision mechanisms for Internet marketing are still make the information diffusion process more efficient. Among not mature in many countries, it is possible to spread wrong, the many approaches to e-marketing [4], we focus on online negative information about competitors without any penalties. paid posters used extensively in practice. For example, two competitive companies or campaigning Working as an online paid poster is a rapidly growing job parties might hire paid posters to post fake, negative news or opportunity for many online users, mainly college students information about each other. Obviously, ordinary online users and the unemployed people. These paid posters are referred may be misled, and it is painful for the website administrators to as the “Internet water army” in China because of the to differentiate paid posters from the legitimate ones. Hence, large number of people who are well organized to “flood” it is necessary to design schemes to help normal users, the Internet with purposeful comments and articles. This new administrators, or even law enforcers quickly identify potential type of occupation originates from Internet marketing, and it paid posters. has become popular with the fast expansion of the Internet. Despite the broad use of paid posters and the damage they Often hired by public relationship (PR) companies, online have already caused, it is unfortunate that there is currently no paid posters earn money by posting comments and articles systematic study to solve the problem. This is largely because 2 online paid posters mostly work “underground” and no public Example 2: On July 17, 2009, a Chinese IT company Qihu data is available to study their behavior. Our paper is the first 360, also known as 360 for short, released a free anti-virus work that tackles the challenges of detecting potential paid software and claimed that they would provide permanent anti- posters. We make the following contributions. virus service for free. This immediately made 360 a super 1) By working as a paid poster and following the in- star in anti-virus software market in China. Nevertheless, on structions given from the hiring company, we identify July 29 an article titled “Confessions from a retired employee and confirm the organizational structure of online paid of 360” appeared in different websites. This article revealed posters similar to what has been disclosed before [17]. some inside information about 360 and claimed that this 2) We collect real-world data from popular websites regard- company was secretly collecting users’ private data. The links ing a famous social event, in which we believe there are to this post on different websites quickly attracted hundreds of potentially many hidden online paid posters. thousands of views and replies. Though 360 claimed that this 3) We statistically analyze the behavioral patterns of poten- article was fabricated by its competitors, it was sufficient to tial online paid posters and identify several key features raise serious concerns about the privacy of normal users. Even that are useful in their detection. worse, in late October, similar articles became popular again 4) We integrate semantic analysis with the behavioral pat- in several online communities. 360 wondered how the articles terns of potential online paid posters to further improve could be spread so quickly to hundreds of online forums in a the accuracy of our detection. few days. It was also incredible that all these articles attracted The rest of the paper is organized as follows. We present a huge amount of replies in such a short time period. more background information and identify the organizational In 2010, 360 and Tencent, two main IT companies in China, structure of online paid posters in Section II. Section III were involved in a bigger conflict. On September 27, 360 presents our data collection method. In Section IV, we sta- claimed that Tencent secretly scans user’s hard disk when its tistically analyze non-semantic behavioral features of online instant message client, QQ, is used. It thus released a user paid posters. In Section V, we introduce a simple method privacy protector that could be used to detect hidden operations for semantic analysis that can greatly help the detection of of other software installed on the computer, especially QQ. In online paid posters. In Section VI, we introduce our detection response, Tencent decided that users could no longer use their method and evaluation results. Related work is discussed in service if the computer had 360’s software installed. This event Section VII. We conclude the paper in Section VIII. led to great controversy among the hundreds of thousands of the Internet users. They posted their comments on all kinds II. HOW DO ONLINE PAID POSTERS WORK? of online communities and news websites. Although both 360 and Tencent claimed that they did not hire online paid posters, A. Typical Cases we now have strong evidence suggesting the opposite. Some To better understand the behavior and the social impact special patterns are definitely unusual, e.g., many negative of online paid posters, we investigated several social events, comments or replies came from newly registered user IDs which are likely to be boosted by online paid posters. We but these user IDs were seldom used afterwards. This clearly introduce two typical cases to illustrate how online paid posters indicates the use of online paid posters. could be an effective marketing strategy, in either a positive Since a large amount of comments/articles regarding this or a negative manner. conflict is still available in different popular websites, we in Example 1: On July 16, 2009, someone posted a thread this paper focus on this event as the case study. with blank content and a title of “Junpeng Jia, your mother asked you to go back home for dinner!” on a Baidu Post Com- B. Organizational Structure of Online Paid Posters munity of World of Warcraft, a Chinese online community 1) Basic Overview: These days, some websites, such as for a computer game [14].

Battling the Internet Water Army: Detection of Hidden Paid Posters

Address Munging: the Practice of Disguising, Or Munging, an E-Mail Address to Prevent It Being Automatically Collected and Used

Freedom on the Net 2016

Adversarial Web Search by Carlos Castillo and Brian D

The History of Digital Spam

Image Spam Detection: Problem and Existing Solution

Understanding the Business Ofonline Pharmaceutical Affiliate Programs

Internet Security

Administrator's Guide for Synology Mailplus Server

An Empirical Assessment of the CAN SPAM Act

Producers of Disinformation – Version

Criminal Abuse of Domain Names Bulk Registration and Contact Information Access

Anti-Spam Methods