Detection and Analysis of Water Army Groups on Virtual Community
Total Page:16
File Type:pdf, Size:1020Kb
Detection and Analysis of Water Army Groups on Virtual Community Guirong Chen1,2, Wandong Cai1, Jiuming Huang3, Huijie Xu1, Rong Wang2, Hua Jiang2, Fengqin Zhang2 1 Department of Computer Science, Northwestern Polytechnical University , 710029 Xi’an, China 2 School of Information and Navigation, Air Force Engineering University of PL, 710077 Xi’an, China 3 College of Computer, National University of Defense Technology, 410073 Changsha, China {guirongchen315, rongwang66}@163.com [email protected] [email protected] [email protected] [email protected] [email protected] Abstract. Water army is prevalent in social networks and it causes harmful effect to the public opinion and security of cyberspace. This paper proposes a novel water army groups detection method which consists of 4 steps. Firstly, we break the virtual community into a series of time windows and find the suspicious periods when water army groups are active. Then we build the user cooperative networks of suspicious periods according to user’s reply behaviors and cluster them based on their Cosine similarity. After that, we prune the cooperative networks by just remaining the edges whose weight is larger than some threshold and get some suspicious user clusters. Finally, we conduct deeper analysis to the behaviors of the cluster users to determine whether they are water army groups or not. The experiment results show that our method can identify water army groups on virtual community efficiently and it has a high accuracy. Keywords: Social Networks; Detection of Water Army Groups; Empirical Analysis of Water Army Activities; Virtual Community 1 Introduction With the maturity and popularization of Web2.0 technology, kinds of interactive networks such as online forums, blogs, microblogs and other social networks have become more and more popular. According to the report of China Internet Network Information Center (CNNIC), there are around 618 million Internet users in China, which is approximately 45.8% of its total population [1]. Not only does the Internet become the platform in which people communicate with each other, but also it has been the window through which people observe and supervise the management of the social and the country. However, the inherent characteristics of Internet such as freedom, openness and anonymity and the immature of the laws and supervision mechanisms for Internet management make it possible for some malicious people to abuse it. Water army is one typical example. To attract public attention towards their products or enhance their reputation or attack their competitor’s, companies may hire many people to ‘flood’ the Internet with huge number of purposeful comments and articles. The Internet users who make money by posting comments and articles on different online communities or websites are called ‘water army’ in China. Sometimes water army can be used as an efficient way in business marketing. For example, before a new movie or TV serial is released, the host company may hire water army to submit articles, posts and comments about the movie or TV serial, the actors or actress of it on kinds of online communities to attract people’s attention and trigger their curiosity. But sometimes water army can be used in malicious ways. For example, some people or companies may hire water army to spread negative, false and concoctive information about their competitors. By initiating and taking part in discussions on Internet and submitting a great number of comments, water army can influence the opinions of other people towards some product or social event. Further more, some people or organizations who have an ulterior motive even might hire water army to slander the government and our country. In a word, water army is not only harmful to the benefits of ordinary internet users, normal companies, but also affects the social stability. Hence, detection of water army becomes one of the most important scientific issues today. However, study on detection and analysis of water army is still in the initial stage and few papers about this problem has been published. There are 3 reasons for this: (1) water army is a new phenomenon of the Internet, and studies on this problem have just begun; (2) there is no public datasets in which users have been labeled as water army or not, and it is very difficult to recognize water army accounts by manually reading the comments they submit; (3) it is hard to verify the accuracy of the detection algorithm. Despite the difficulties, researchers at home and abroad have begun to conduct studies on this problem and obtained some achievements. Chen et al. conduct empirical research on the behavior patterns of water army by analyzing the ‘3Q conflict’ real data which they get from several online communities and news websites. They get some meaningful conclusions: (1) water army prefers to submit new topics than replies; (2) the posting time intervals of water army are small; (3) water army would like to use different temporary accounts to log in virtual communities and most of them will discard the accounts when the task is completed; (4) in order to finish the task as quickly as they can, water army incline to submit duplicate or nearly duplicate comments [3]. Fan et al. analyze a real dataset of one online forum manually and find that there are many water army accounts on that online forum and they are well organized [4]. Li et al. try to use text sentiment analysis technology to identify network cheaters who are similar to water army [5]. Review spammer refers to people who submit spam reviews or spam opinions on social network or electronic commerce websites. As water army and review spammers exist similarity in many aspects such as purpose, behavior patterns, organization and so on, so the achievements on review spammer detection have important significance to water army detection. Mo et al. present an overview of spammer detection technologies and give some suggestions for possible extensions [6]. Professor Liu Bing and his research team conduct deep research on detection of review spam and spammers on electronic commerce websites [7-11]. Paper [7] takes advantage of text classification and pattern matching technology to detect review spam. By analyzing the product or brand rating behaviors of different users, paper [8] finds some abnormal rating patterns and detects some review spammers via mining abnormal rating behaviors. Paper [9] presents a novel review graph to describe the relationship of all the reviews, reviewers and stores, and proposes an iterative computation model to identify suspicious reviewers on the electronic commerce website. In view of the fact that spammers on electronic commerce websites usually appear in groups or twists, paper [10] constructs a dataset including spammer groups manually, and analyzes the spammers’ behaviors on individual and population levels, and then proposes a new spammer group detection algorithm. The research findings of Professor Liu Bing and his team have great implications for the paper. Spam and spammer detection problem has also attracted the attention of other researchers at home and abroad. Husn et al. conduct empirical analysis on the behavior characteristics of spam botnets [12]. Brendel et al. construct a social network of online users and detect spammers on that network through computing and analyzing the network properties [15]. Inspired by the PageRank algorithm, zhou et al. propose a novel ranking algorithm which can distinguish malicious users from normal ones. Like paper [8-9], this algorithm is also based on the rating results of different users. Xu et al. analyze the collaborative behaviors of spammers [15], Lu et al. propose a spam and spammer detection algorithm using graph models [16]. Lin et al. present a new method to identify spam using social network analysis technology [17]. To cheat online, some internet users may use fake identities to create the illusion of support to some product or people pretending to be a totally different person. The different identities of the same person online is called ‘sock puppet’. Recently, the problem of sock puppet detection attracts researchers’ attention. Bu et al. present a novel sock puppet detection algorithm which combines link analysis and authorship identification techniques [18]. Based on the fact that sock puppets appear in pairs, Zheng et al. propose a simple method which detects sock puppets by analyzing the relationship of users [19]. Jiang et al. consider water army accounts as outliers and use the outlier detection algorithm to detect them [20]. In a word, the problem of malicious user detection has attracted wide attention [21]. Although researchers have carried on beneficial exploration on the problem of water army detection, there are still many problems to be solved. The findings of Chen et al. can detect water army whose purpose is to spread some specific information wildly [3], but they can’t identify other type of water army. The method proposed by Li et al. can find water army whose comments have intense and consistent emotion or attitude[5], but most of the time, in order to create hot online topics, attract people’s attention, trigger their curious, water army might release many comments whose emotion are ambiguous or contradictory with each other. The existing methods can’t detect this type of water army. The research findings about spammer and sock puppet detection are of great significance to the study of water army detection, while the methods can’t be used to detect water army accounts on online forums directly. There are 3 reasons. Firstly, there is no rating mechanism on forums, so we can’t identify water army accounts via analyzing users’ rating behaviors. Secondly, the contents submitted by forum users are very complex, very difficult to describe, that the detection algorithm based on text analysis can’t get a good accuracy when they are used to detect water army accounts on forums.