Detecting Multipliers of Jihadism on Twitter
Total Page:16
File Type:pdf, Size:1020Kb
2015 IEEE 15th International Conference on Data Mining Workshops Detecting Multipliers of Jihadism on Twitter Lisa Kaati Enghin Omer Nico Prucha Amendra Shrestha FOI/Uppsala University Uppsala University ICSR Uppsala University Stockholm, Sweden Uppsala, Sweden London, UK Uppsala, Sweden [email protected] [email protected] [email protected] [email protected] Abstract—Detecting terrorist related content on social media and reach out via social media. Content, Arabic and non- is a problem for law enforcement agency due to the large amount Arabic is not often removed, for example, the first video of of information that is available. This work is aiming at detecting the self-proclaimed ”Caliph” Abu Bakr al-Baghdadi, published tweeps that are involved in media mujahideen - the supporters in June 2014 is still available to download as of writing.1 of jihadist groups who disseminate propaganda content online. The speech, given in al-Baghdadi’s mother tongue Arabic To do this we use a machine learning approach where we make was published in English, German, French and Russian. The use of two sets of features: data dependent features and data independent features. The data dependent features are features English version of a video by the Ansar Bayt al-Maqdis that are heavily influenced by the specific dataset while the data group showing attacks against Egyptian soldiers published independent features are independent of the dataset and can be in February 2014 is still available to download on the links used on other datasets with similar result. By using this approach published by the group to highlight the operations on the Sinai we hope that our method can be used as a baseline to classify Peninsula2 - in the meantime the group has merged with ISIS violent extremist content from different kind of sources since data while Sinai was already announced a ”province” (walaya)of dependent features from various domains can be added. a future Islamic State in the February video [4]. Take the last In our experiments we have used the AdaBoost classifier. The big al-Qaeda speech featuring Ayman al-Zawahiri declaring results shows that our approach works very well for classifying a franchise in India published September 2014 in multiple English tweeps and English tweets but the approach does not languages, including Bangladeshi, all links are still functioning perform as well on Arabic data. with the only difference to ISIS videos the download- and view count.3 I. INTRODUCTION With abundant jihadist material freely available online and Detecting and removing terrorist related content on the with an ever present crowd of sympathizers at hand, the nature Internet is an important and difficult task for law enforcement of jihadist spheres on the Internet have changed, adapted, agencies all over the world. Jihadist groups, and specifically and increased with the technical developments of the World ISIS, have been able to maintain a persistent online presence Wide Web over time. It is al-Qaeda who are the pioneers by sharing content through a broad network of ”media mu- on the front lines and in attacking western capitals, while jahideen”. The internet has been identified by senior Sunni maintaining for decades ideologically consistent materials, extremists as a ”battlefield for jihad, a place for missionary published to indoctrinate and recruit potential new members. work, a field of confronting the enemies of God” [1]. This AQ has created the very foundation of Sunni jihadist ideology was further encouraged by a ”Twitter Guide” (dalil Twitter) - and managed to ensure it’s persistent and take-down resilient posted on the Shumukh al-Islam forum which outlined reasons presence on the Internet [5]. for using Twitter as an important arena of the electronic front Within the framework of the turmoil in the wake of (ribat) [2]. Since 2011 the Syrian conflict, recognized as the the ”Arab Spring”, and especially with the conflict in Syria most ”socially mediated” in history, has developed into the growing in intensity and scope, AQ has been able to re- new focal point for jihadi media culture [3]. emerge with two linked groups, Jabhat al-Nusra (JN) and The Facilitating the Internet as the prime and most effective Islamic State of Iraq and al-Sham (ISIS) [6]. While JN has (as well as cost-effective) communication facility to lure con- pledged allegiance ( bay’a) to Ayman al-Zawahiri, ISIS under sumers into their specific interpretation or world perception the rule of Abu Bakr al-Baghdadi has morphed into the greatest is not restricted to the jihadi web. Militant and hate groups success of al-Qaeda iconography and doctrine by adhering to of all colors employ similar means to gain sympathy through the ideology without being subjected to its formal leadership. modern and pop-cultural elements. However, the quantity as As thus, the renaissance of al-Qaeda doctrine is largely based well as quality, not to neglect the multi-lingual capacity, of on its most powerful tool that allows the network to morph jihadi media departments is unmatched and unprecedented. and spread in many directions: the professional and coherent The rhetoric is inseparable from the (audio-) visual content decentralized use of the Internet and the continuous and tireless and enforces key elements while reaching out to the audience deployment of media-workers and media-dedicated brigades to get active, empower the ”Islamic community” (umma)by 1Khutba wa-salat al-jum’a fi l-jami’a al-kabir bi madinat Mosul, Mu’assasat individual response. al-Furqan. Avaliable at https://archive.org/download/KhotbaJomaa/ 2Sawlat al-ansar - ’amaliyat al-mujahidin dudd jaysh al-ridda al-masriyya, Extremist content appeals to Arabs and non-Arabs, while Fursan al-Balagh. Available at http://justpaste.it/fursan-trv-swlansar the latter group is directly approached by militant media oper- 3I’lan insha’ far’a jadid l-jama’a qa’idat al-jihad fi shiba al-qara al-hindiyya, atives who provide translations of Arabic language materials Mu’assasat al-Sahab. Available at https://archive.org/details/Indiann 978-1-4673-8493-3/15 $31.00 © 2015 IEEE 954 DOI 10.1109/ICDMW.2015.9 embedded with fighting jihadi units in the real-life battle arenas domain experts annotate the collected messages independently [6]. - this is one step that we have been able to avoid due to the nature of the dataset we use. Three classification techniques Twitter has become an important way of communicating are used: SVM, Naive Bayes and Adaboost. In their result for jihadist groups, most likely since it is easy to use and SVM outperformed the other classifiers and the results showed provide rapid updates to a large amount of users. Twitter has that by adding more feature sets (i.e., syntactic, stylistic, also been used live at the battlefield to report about injuries or content-specific, and lexicon features) the effectiveness of deaths of fighters and battle outcomes without any censorship. the classifiers for radical opinion identification was improved One of the most common approaches to stop terrorist groups significantly. A conclusion that can be drawn from this paper is from spreading propaganda and other forms of terrorist related that content-specific features are important features in opinion content is to suspend accounts when they are discovered. mining, something that can be seen in our experiments as well. Usually this approach requires that human analysts manually read and analyze an enormous amount of information on social Machine learning on ISIS related tweets is also done in media. In this work we use machine learning to automatically [10] where the authors used data from twitter to classifying detect tweeps (user accounts on Twitter) that are supporters users as potentially supporting or opposing ISIS before the of jihadist groups and disseminate propaganda content online. user explicitly write a tweet identifying his/her stance. The We also use machine learning to detect individual tweets that dataset that they use consist of a collection of Arabic tweets contains propaganda. referring to ISIS, the tweets are then classified into pro-ISIS and anti-ISIS. The classification into pro-ISIS and anti-ISIS A. Outline is done automatically using the name variants that is used to refer to the organization: the full name and the description as This paper is outlined as follows. In Section II work related ”state” (in Arabic: dawla) is associated with support, whereas to ours is described. In Section III we described our approach abbreviations (da’ish) usually indicate opposition. In our work to detect tweeps that are involved in media mujahideen. we use a network of known jihadists accounts whom we We also described the data dependent features and the data have identified and graded as such by reading and manually independent features that we use in our experiments. In Section assessing their Arabic and non-Arabic tweets. The features IV the experimental setup is described, including the datasets used in [10] are similar to the features that we use in this work: that we have used. Our experimental results are reported in bag-of-words features, including individual terms, hashtags Section V and a discussion about the results is presented in and user mentions. What can be noted is that the results Section VI. Finally, some conclusions and directions for future from [10] shows that supporters and opposers of ISIS can work is presented in SectionVII. be separated with high accuracy using features that are data dependent. II. RELATED WORK In [11] experiments are done on classifying ISIS related Analyzing terrorist related content on the Internet has individual tweets. The dataset that is used is a subset of the previously done in many variations. The approach that we dataset that we have used in this paper and it consist of use is text classification, an approach that has been used in English tweets from a network of known jihadists accounts.