Troll Detection: A Comparative Study in Detecting Troll Farms on Twitter Using Cluster Analysis


DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2016

Troll Detection
A comparative study in detecting troll farms on Twitter using cluster analysis

FELIX DE SILVA
MARTIN ENGELIN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Trolldetektion: En jämförande studie i att upptäcka trollfarmar på Twitter med hjälp av klusteralgoritmer

Degree project in computer science, DD143X
Supervisor: Dilian Gurov
Examiner: Örjan Ekeberg
CSC, KTH, 2016-05-11

Abstract

The purpose of this research is to test whether clustering algorithms can be used to detect troll farms in social networks. Troll farms are professional organizations that spread disinformation online via fake personas. The research involves a comparative study of two different clustering algorithms and a dataset of Twitter users and posts that includes a fabricated troll farm. By comparing the results and the implementations of the K-means and DBSCAN algorithms, we have concluded that cluster analysis can be used to detect troll farms and that DBSCAN is better suited to this particular problem than K-means.

Sammanfattning (Swedish summary)

The goal of this report is to test whether clustering algorithms can be used to identify troll farms on social media. Troll farms are professional organizations that spread disinformation online using fake identities. This report is a comparative study of two different clustering algorithms and a dataset of Twitter users and tweets that includes a fabricated troll farm. By comparing the results and implementations of the K-means and DBSCAN algorithms, we conclude that clustering algorithms can be used to identify troll farms and that DBSCAN is better suited to this problem than K-means.

Contents

1 Introduction
  1.1 Problem definition
  1.2 Scope and constraints
2 Background
  2.1 Twitter
    2.1.1 Twitter REST API
  2.2 IFTTT
  2.3 Troll
    2.3.1 History
    2.3.2 Characteristics
  2.4 Cluster Analysis
    2.4.1 What is a cluster?
    2.4.2 Similarity between data points
    2.4.3 Hierarchical Clustering
    2.4.4 Partitive Clustering
    2.4.5 Model-Based Clustering
    2.4.6 Density-based Clustering
3 Method
  3.1 Generate trolls
  3.2 Construct the cluster algorithms
  3.3 Collect Twitter data
  3.4 Generate results
    3.4.1 DBSCAN
    3.4.2 K-means
  3.5 Method reasoning
4 Results
  4.1 Twitter Data
  4.2 Algorithms
    4.2.1 K-means
    4.2.2 DBSCAN
5 Discussion
  5.1 Future research
  5.2 Method discussion
6 Conclusion
7 Appendix
  7.1 Twitter data
  7.2 K-means results - multiple runs
  7.3 DBSCAN results

1 Introduction

For millions of people around the world, social media sites are an integrated part of daily life. There are hundreds of different social media sites supporting a wide range of practices and interests [5]. Social networks such as Facebook and Twitter have become a source of news and a platform for political and moral debate for many users. Stories with varying degrees of truthfulness are spread, and little source criticism is applied by ordinary people as well as journalists. [10]

The act of spreading disinformation on social media has developed from being the work of bored youths to being commercialized by organisations and political blocs in the form of troll farms. A troll farm is an organization whose sole purpose is to affect public opinion by means of social media. A practical implementation of a system or software that can identify troll farms could be used to stop them and thereby avoid the spread of disinformation.
Such an implementation would be of interest to the politicians, media, social networks and organizations that are targeted, since it could be used to clear their names.

1.1 Problem definition

The aim of this project is to investigate ways to detect troll farms on Twitter using clustering algorithms and the Twitter API. The approach will be to study clustering algorithms, apply them to a database of tweets, and analyze the results. Clustering algorithms are highly dependent on the cluster structure, and there is therefore no single algorithm that works on every problem instance. The goal is to study the Twitter REST API and determine what kind of clustering algorithm is most appropriate for clustering Twitter users. The research will also cover different kinds of clustering models and appropriate algorithms for them, in order to find out whether there are any comparative advantages or disadvantages between different clustering models. The problem statement is therefore:

• Which clustering model is the most appropriate for clustering Twitter data in search of troll farms?

1.2 Scope and constraints

For this project we will research what types of clustering models exist and, based on that research, choose two models to implement and analyze. Our search will be based on the activity of users rather than the content of their statuses. The main parameters to be analyzed are:

• Time-of-day activity
• Rate of tweets

Future research on this topic can analyze more models as well as other search parameters. A more detailed description of the search parameters can be found in section 3.

2 Background

2.1 Twitter

Twitter is a social network platform based on one-to-many communication which enables users to post messages of 140 or fewer characters. These posts, called Tweets, can include plain text, links or other media hosted on different web servers.
This simple design enables several different uses of Twitter, making it a sort of mashup of text, email, IM, news forum, microblog and social network. [13]

Twttr, the original name for Twitter, was created in 2006. In the beginning it was a text-message-based communication tool for groups. Text messages (SMS) are for historical reasons limited to 160 characters, which resulted in Twitter's character limit of 140: 20 characters for the username and 140 characters for the message. The first real success for the platform came at SXSW Interactive in 2007, where it won the SXSW Web Award in the Blog category. At this point smartphones had not yet become widespread, and the user base of phones that could only send texts was large. This was an important reason for Twitter's success: anyone could engage in social media without a computer. Today Twitter has evolved into a web-based product with simple but smart APIs. [13]

2.1.1 Twitter REST API

API stands for Application Programming Interface: a set of routines, protocols, and tools for building software applications, and it is what allows an application to share its data with the rest of the world. Like a website, an API is accessed through URL requests, but instead of returning web pages it returns structured data. The Twitter API was originally divided into two REST APIs and a Streaming API. REpresentational State Transfer (REST) is an architectural style that ensures that data is stateless, layered and well defined. This increases scalability and flexibility as well as ease of development. [9]

The two REST APIs existed for historical reasons. A company called Summize, Inc. provided search capability for Twitter data. When Summize was later acquired by Twitter, it proved difficult to fully integrate Twitter Search and its API into the Twitter codebase. It took several years, but today they are both integrated into a single REST API.
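To make the idea of "URL requests that return structured data" concrete, the following is a minimal stdlib-only sketch of consuming a JSON response of the kind a REST endpoint such as statuses/user_timeline returns. The payload here is a hard-coded sample that only mimics the shape of such a response (fields abbreviated); no live request or OAuth handshake is shown.

```python
import json

# Parse a sample JSON payload shaped like a (heavily abbreviated)
# Twitter REST API timeline response. A real client would obtain this
# string as the body of an authenticated HTTP GET request.
sample_response = """
[
  {"created_at": "Wed May 11 09:15:32 +0000 2016",
   "text": "First tweet",
   "user": {"screen_name": "alice"}},
  {"created_at": "Wed May 11 17:40:01 +0000 2016",
   "text": "Second tweet",
   "user": {"screen_name": "alice"}}
]
"""

tweets = json.loads(sample_response)
for tweet in tweets:
    # Structured data: each tweet is a dictionary, not a web page.
    print(tweet["user"]["screen_name"], tweet["created_at"], tweet["text"])
```

Because the response is structured rather than rendered HTML, the fields needed for clustering (timestamps, authorship) can be read off directly.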
[13] The REST API uses OAuth to identify Twitter users and applications. [8]

2.2 IFTTT

IFTTT, or "If This Then That", is a web service that connects and aggregates many other web apps into one platform and then performs a specific action when certain criteria are met. IFTTT gives users creative control over apps and products by letting them create recipes that perform these actions. A recipe simply connects apps and web services together to create an action that is performed on the condition that some criterion has been met. There are two types of recipes, IF-recipes and DO-recipes, which trigger their actions in different manners. [7]

• IF-recipes run automatically in the background and perform their action when the recipe's IF-condition has been fulfilled.
• DO-recipes run their action when manually executed.

2.3 Troll

An Internet Troll does not in any way resemble the original mythical creature from old Scandinavian folklore. An Internet Troll (henceforth referred to as Troll) is a person who interrupts, harasses or tries to impose his or her own opinions on others. [16] In the early days, trolls were mostly considered a small nuisance on online forums. Since then the Internet has grown, and the problem of trolling with it. From being an activity primarily performed by bored individuals, it has evolved to be industrialized by states and terrorist organizations. These professional groups are sometimes called troll farms. [2] Trolls and their activities, trolling, can exist in any kind of social media. Twitter has become a popular platform for trolling activity due to its role as a news forum as well as the fact that anyone can create multiple accounts.

2.3.1 History

Troll farms and their activities are often criminal, or at least morally questionable, and because of this their history is not completely clear. Professional troll farms have been known to exist in Russia since at least 2008, but possibly even before that, and probably all around the world.
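The two activity parameters listed in section 1.2 (time-of-day activity and rate of tweets) can be turned into per-user feature vectors and clustered. The following is a minimal pure-Python sketch under stated assumptions: the tweet data, the reduction of time-of-day activity to a mean posting hour, and the fixed K-means centroid initialisation are all illustrative inventions for this example, not the thesis's actual dataset or implementation.

```python
from collections import defaultdict

# (user, hour-of-day) pairs for tweets collected over one day.
# Hypothetical data: two coordinated "troll" accounts posting often in
# a narrow time window, and one casual user.
tweets = [
    ("alice", 9), ("alice", 20),
    ("troll1", 9), ("troll1", 10), ("troll1", 11), ("troll1", 10),
    ("troll2", 10), ("troll2", 9), ("troll2", 11), ("troll2", 10),
]

# Feature vector per user: (mean posting hour, tweets per day).
by_user = defaultdict(list)
for user, hour in tweets:
    by_user[user].append(hour)
features = {u: (sum(h) / len(h), len(h)) for u, h in by_user.items()}

def kmeans(points, centroids, iters=10):
    """Plain K-means on 2-D points with fixed initial centroids."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                        + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

users = sorted(features)
points = [features[u] for u in users]
centroids, clusters = kmeans(points, centroids=[(10.0, 1.0), (10.0, 5.0)])
```

On this toy input the two high-rate, tightly scheduled accounts end up in one cluster and the casual account in the other, which is the intuition behind using activity-based features to surface coordinated groups.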