
How to characterise the discourse of the far-right in digital media? Interdisciplinary approach to preventing terrorism.


Procedia Computer Science 176 (2020) 2515–2525
www.elsevier.com/locate/procedia

24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

S. Alava*, N. Chaouni, Y. Charles

University of Toulouse II Jean Jaurès, Project Leader of PRACTICIES EUROPA, 5 allée Antonio Machado, 31078 Toulouse, France, [email protected]
University of Toulouse II Jean Jaurès, 5 allée Antonio Machado, 31078 Toulouse, France, [email protected]
University of Toulouse II Jean Jaurès, 5 allée Antonio Machado, 31078 Toulouse, France, [email protected]

Abstract

The fight against extremist discourse on the Internet and on social media is paramount in countering terrorism and radical recruitment. The approach might seem simple, and elected representatives and authorities seem to want to legislate quickly on this subject. However, from a scientific point of view, things are not so simple. Characterising a discourse that is coherent, repetitive and identifiable might be easy, but radical, terrorist discourse is a very complex linguistic and sociolinguistic phenomenon. Moreover, its modes of dissemination and communication are complex. Within the framework of a French research project (ANR: Défense) we analysed the public productions of extreme right-wing groups in order to collect pieces of discourse and try to characterise them with an interdisciplinary approach (sociology, political science, linguistics, sociolinguistics, communication). A mathematical and algorithmic modelling will allow an automation of searches in order to set up a warning mechanism, with a sociological objective of identification and thus validation for digital content producers and access providers.

© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the KES International.

Keywords: radicalisation, terrorism, hate speech, xenophobia, anti-Semitism, masculinism, conspiracy, supremacism, racism

I. Background

In just a few years, the Internet has become the central medium of the communication sphere. It has gradually become almost the only daily medium for young people aged 15 to 25. Nothing is possible without the Internet: the best and, sadly, the worst. In the UNESCO report we produced on the links between social media and radicalisation, we noted the growing importance of social networks in the propaganda and recruitment of young activists, which is the case for all extremist movements (ALAVA, S. et al. 2017). The connection between terrorist groups and the media has a long history (HUGHES, F-B, 2011). It emerged in the late 19th century with the anarchist attacks and reached its apogee in France with the small newspapers and the acts committed by the Bonnot gang; terrorism comprises two phases (dynamite and the front page) which fuel one another on the terrorist scene, a fact perfectly understood by extremist groups (DEBRAY, R. 2002) and which soon made way for other terrorist temporalities such as propaganda and recruitment.

* Corresponding author. Tel.: +33 6 30 74 90 03. E-mail address: [email protected]

From 2001, in the wake of the attacks in the United States, the Internet quickly became the centre of concern for intelligence services and international organisations. The fight against terrorism involves combating the extremist theories, discourses and propaganda present on the Internet. Removing links, websites and videos with explicitly terrorist or extremist content has become a research priority, and a technological response is still lacking. This fight against radical discourse online also gave rise to another policy known as counter-discourse.

Conceived in the United States, the concept of counter-discourse stems from argumentation. Counter-discourse only works when a discourse to be fought against is identified. The American analysis of this era, which was proudly integrated into European thought, added value to the civilizational explanation of terrorism to the detriment of its political or geostrategic reading. The terrorist conveys a discourse of hatred, exclusion and rejection. Preventing terrorism means building tools to counter-argue, and thus falsify, these speeches (AUBOUSSIER, J. 2014). The logic is therefore argumentative, and reason, validity and proof are always on the side of the counter-speech, which must be valued and supported in a voluntary communication approach. Indeed, since extremist groups are flooding the Internet and social media with propaganda and recruitment speeches, institutions and NGOs are expected to produce counter-discourse without knowing very precisely what it is and, moreover, without knowing what radical speeches really are. The winning strategy is then based solely on the confrontation between discourse and counter-discourse (Plantin 1996). Controversy is therefore at the heart of counter-discourse (Kerbrat-Orecchioni 1980), but it must be based on a relevant and informed process of analysis of extremist discourse in order to better dissect it (Amossy, Burger 2012). This sociolinguistic and communicational requirement has been at the heart of preventing extremism through discourse over the last 10 years.

In 2018, in Report 14531, the Council of Europe Assembly noted the exclusive choice to focus counter-propaganda actions solely on counter-discourse. "The report concludes that counter-speeches against terrorist speeches are not enough. It is essential to develop a positive, preventive and credible alternative discourse at EU level, promoting common values and facilitating dialogue, encouraging awareness and dispelling misinformation" (Council of Europe, April 19th, 2018). In order to do so, we first need to know what terrorist discourse is and what its components are.

In the framework of the FLYER project (Artificial Intelligence for Analysing Extremist Content on the Internet), co-financed by the French National Research Agency in partnership with the Defence Innovation Agency (AID), a service with national competence attached to the Directorate General of Armaments (DGA), we set ourselves the objective of identifying and developing a database covering two types of radicalisation (Salafist jihadism and the far right).
Our project intends to characterise these discourses and develop, thanks to artificial intelligence, technological tools that will allow us to detect these discourses on the Web and thus alert user services (police, defence, intelligence, access providers, hosts). Our methodology uses models and resources created by researchers in the social sciences; these models will allow us to characterise content according to lexical, discursive and semantic criteria. The project adopts supervised adaptive learning, a hybrid paradigm enhancing learning algorithms with adaptive mechanisms capable of acquiring new descriptors, in order to recognise and follow the progression of concepts. The project thus implements advanced functionalities for a systematic exploration of cyberspace.

II. Multiple questions for a sensitive topic

2.1 How to define the ultra-right?

In order to produce relevant results, it is necessary to answer several questions that will condition the way we work. In this article we focus on one form of radicalisation that Europol, in their 2019 report, calls right-wing terrorism or extreme right-wing radicalisation, in order to distinguish it from lawful opinions, parties, public actions or movements claiming to be nationalist or right-wing extremist, but also to distinguish violent extremist actions from extremist opinions that are not illegal. The same report suggests characterising violent radical movements not only by terrorist acts but also by violent and hateful behaviour expressed in the public space or in the media. It is therefore the public sphere and the media that make it possible to identify radical extreme right groups. The scene of right-wing extremism varies considerably among EU Member States. There is a wide variety of spaces for expression and viewing modes, but the "fachosphere" (fascist, reactionary sphere) includes all these groups in their various expressions: groups of National Socialists, neo-Nazis or neo-fascists, revisionists, racist or anti-Semitic groups, skinheads and extremist hooligans, nativists or identity-based groups, paramilitary groups and xenophobic or anti-migrant groups.

Characterising the discourse of these groups faces a great methodological difficulty, since they operate in a nebulous and constantly interacting environment. We suggest focusing on the ultra-right groups as defined by Jean-Yves Camus (Camus J.Y., 2006) – groups or individuals who do not act within democratic structures or centralized organizations. They are most often in opposition to what they call the system or democracy. These groups have three characteristics that will allow us to identify them and to include or exclude them from the databases based on our research.

The ultra-right activists are groupings that operate on the fringes of political parties that accept the workings of democracy. They are active in the field (stadiums, demonstrations, autonomous actions), where they publicly display their signs, slogans and rituals. They are also very active on the Internet (DUPIN, E. 2017).

The ultra-right's repertoire of actions gives prominence to digital actions (CASTELLI GATTINARA, P. & FROJO, C. 2018) (PRUNEAU, C. 2014). We present here all the practices that have been observed by other researchers working on ultra-left groups or environmental activists (ALLARD, C. 2007).

• Slacktivism: a simple way of taking a political stand using harmless emojis that identify people's opinions (a glass of milk – supremacist). Most extremist groups use images and signs to recognise themselves and to convey their ideas on social media.
• Clicktivism: a virtual modality of action consisting in organising demonstrations by clicking "Like" or "Hate" and retweeting to support a group or to prevent it from speaking.
• Hashtivism: a way of taking a stand digitally by including a hashtag in a message. All these hashtags lead to a virtual demonstration.
• Online petition: a modality that has taken off in recent years. Barely used by ultra-right groups, it has been much used by supporters of "La Manif pour tous" (Protest for all, created to protest against gay marriage).
• Trolling: a modality that consists in participating in open forums or political blogs by voluntarily initiating polemics with the sole aim of provoking other participants and generating reactions.
• Sockpuppeting: a modality which is very common in ultra-right groups and consists in either opening fake web pages of a personality to attack, participating in social media under a false identity ("faux nez", false nose in French), or spreading false information.
• Hacktivism: a violent modality which consists in attacking with viruses, worms or other digital techniques to block, destroy or hijack targeted sites. The extreme right-wing hacker "ORBIT" made headlines in Germany by hacking into accounts of political figures and making them public.
• Alternative media or re-information: a modality widely used by ultra-right groups to create independent media or parodies of well-known media to disseminate alternative or rectified information. The "Equality and Reconciliation" site is to this day the leading re-information site in France.
• Flash mobbing or action alert: a modality which consists in using networks to quickly bring together followers of a movement to carry out an action (parody or not).
• Crowd-enabled organisation: a modality that allows a quick gathering of people or an organisation of modalities bringing activists together. This technique is used by black blocs but also by many groups of hooligans.
• Cop watching: a modality that consists in filming police without their knowledge and broadcasting these videos online. This technique is used in "yellow vest" demonstrations by activists.
• Mail-bombing: one of the first modalities of digital political action, which consists in overflowing an institution's mailbox with e-mails in order to make a claim or block the messaging service.
• Doxing: an act of hacking that consists in revealing personal data in order to harm people or denounce their actions.
• Cyber graffiti: an act of hacking that consists in modifying the source code of a site in order to tag political slogans or make the site inaccessible.
• Phone-zap: an action that consists in flooding the switchboard of a person or institution with calls in order to block it and make people talk about it.
• Meme: a digital modality that consists in disseminating humorous hateful content on the Internet using photomontages, GIFs, short videos, recurring jokes and secret language on forums and community sites in order to provoke a viral dissemination of information.

2.2 What is discourse?

In order to identify and collect elements of ultra-right discourse, it is important not only to characterise these groups, but also to know the elements that give structure to a discourse. The French term "discours" (Eng. speech) in everyday language refers to an oral production delivered in front of an assembly or an audience; in linguistics, however, this term encompasses all written or oral productions of a person. The word discourse becomes very broad and seems similar to the definition in the "Littré" dictionary, which encompasses all the words used by a person.

These definitions are not useful for us, because it is not a question of knowing and characterising all the productions of an ultra-right group, but rather of identifying the types of discourse that have as their objective, or an impact on, the radicalisation of an individual. In this sense, we follow the analysis of Vincent D. (2005), for whom "language productions form a coherent whole that can only be interpreted through the superimposition of multiple layers of analysis; a laminate made according to modes of production that are both repetitive and unique at the same time, each interaction being seen as a structured and structuring social activity". For us, it is not a question of identifying right-wing extremist "discourse" but of identifying right-wing extremist "discourses", distinguishing them not only by their explicit or implicit contents but also by their strategic aims, their rhetorical characteristics and their typologies. Identifying, collecting and characterising right-wing extremist discourse in cyberspace therefore requires a threefold identification (contents, typologies, rhetoric).

2.3 What is the "fachosphere"?

The involvement of extremist groups in cyberspace has existed since Web 1.0, with the first far-right political sites of the British National Party (BNP) (JACKSON, P. 2020). Each stage of the development of the Internet (wikis, social networks, alternative media, citizen journalism, mainstream media openness, participatory media) was accompanied by the development of a structured, significant, active presence of ultra-right political groups and an increasingly visible space called the "fachosphere". The term "fachosphere" is criticised and other names are sometimes preferred (reactionary sphere, "reinfosphere" or "patriosphère" – "homeland sphere") (Parliamentary Report, 2020), but everyone agrees that there is a space on the Internet for communication, propaganda, action, interaction, recruitment and coordination of extreme right-wing groups. This space is not limited to the tools of expression of established political groups, whether or not they accept the democratic process; it is also inhabited by informal or non-formal groups, isolated individuals or not, public or anonymous persons, who broadcast the discourses that are our subject of interest.

III. Methodological aspects

This article presents a specific work of interdisciplinary cooperation that should be amplified and generalised in work related to the phenomena of violence and to security. This research is based on two simultaneous projects: A) the PRACTICIES project – European project H2020 Security – which allowed us to build a database of jihadist discourses (20,000 sentences) and to develop methodologies for constructing the database and principles of sentence labelling; B) the ongoing FLYER ANR project, in which we are widening the field of radicalisation to the groups of the French ultra-right.

The primary purpose of FLYER is to develop artificial intelligence methods to analyse extremist content, messages and conversations on the Internet for French-speaking communities. The analysed content will come from indoctrination sources, but also from discussion threads on digital platforms. FLYER aims to implement methods for the extensive characterisation of online content in order to produce a rich description at the lexical, terminological and semantic level. The characterisation will highlight domain-specific concepts (radicalisation, right-wing extremism, violence, threat) and subjective commitment (support, rejection, preference, disagreement) and go beyond the limitations of keyword descriptions, in order to provide an in-depth understanding of extremist content online by highlighting characteristics of propaganda, incitement to violence or subjective commitment (support, disagreement, rejection) in relation to a specific topic or event. A second objective is to develop innovative methods for supervised adaptive learning. This is a hybrid paradigm of artificial intelligence: learning algorithms are augmented with adaptive mechanisms enabling them to acquire new descriptors, in order to cope with developments in the field.
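The paper does not specify the adaptive mechanism itself. As an illustration only, the sketch below shows one plausible way of "acquiring new descriptors": starting from seed terms validated by analysts, nearest neighbours in an embedding space are promoted to candidate descriptors for human review. The toy vectors, the threshold and the function names are assumptions, not the project's implementation.

```python
# Illustrative sketch only: expand a seed lexicon with embedding neighbours
# so that analysts can validate new candidate descriptors.
# The toy 3-d vectors are placeholders; a real system would load pretrained
# French word embeddings instead.
import math

TOY_EMBEDDINGS = {                 # hypothetical vectors, for illustration
    "invasion":     (0.9, 0.1, 0.2),
    "submersion":   (0.8, 0.2, 0.3),
    "remplacement": (0.7, 0.3, 0.1),
    "football":     (0.1, 0.9, 0.8),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_descriptors(seeds, vocab=TOY_EMBEDDINGS, threshold=0.85):
    """Return candidate descriptors close to any seed term (for analyst review)."""
    candidates = set()
    for seed in seeds:
        if seed not in vocab:
            continue
        for word, vec in vocab.items():
            if word not in seeds and cosine(vocab[seed], vec) >= threshold:
                candidates.add(word)
    return sorted(candidates)

print(expand_descriptors({"invasion"}))   # -> ['remplacement', 'submersion']
```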

Our collective project faces three technological obstacles:
- The first obstacle consists in the definition and characterisation of the important concepts to be taken into account when analysing extremist content online, which are linked to a constellation of notions such as extremism, terrorism, online radicalisation, violent speech or online hate.
- A second obstacle concerns the characterisation of digital content (messages, conversations) from the lexical, discursive and semantic perspective, in order to provide an extensive understanding of extremist content online.
- The third technical obstacle stems from the dynamics of extremist content. From a content analysis perspective, extremist content on the Internet is often associated with weak signals, with rapid rise and variable noise levels. The main challenge is to develop models capable of capturing and describing conversations and messages conveying extremist ideas, while adapting to their constant development.


In this article we present a method for building a database targeted at automation and detection. Conducting research in the "fachosphere" presents several methodological characteristics that are important to note. First of all, there is a strong ramification between sites, blogs, contents, authors and ideology propagators, and it is therefore difficult to clarify the links between groups. It is possible to characterise and identify this ramification on the basis of events common to several groups, identical publications shared on different sites, names of authors or propagators found on different sites, or links to partner sites. Similarities are also found in common symbols (the fleur-de-lis for the royalists, the symbol of Marianne, the colours of the French flag...). Our research has also shown that some authors and propagators are initiating several groups; they thus appear through publications on several sites. Moreover, there is a desire to conceal the content, which sometimes makes it difficult to identify the ideological positioning and characterisation of a website: white supremacy, anti-capitalism and anti-globalisation, xenophobia, opposition to immigration, anti-Zionism.

On the other hand, we note identical forms at several structural points. Indeed, on these sites we can see a disciplined discourse concerning political parties and movements, but we can also almost always observe a questioning of the media (disinformation) and a constant reference to conspiracy theories, characterised by a will to revise History and to question it (revisionism). Some of the sites have in common the fight against the "Islamisation of France", with the revival of Muslim concepts that are extensively analysed, a very profound Islamophobia and mockery of Muslim beliefs.

When we analyse the sites, we note the great importance given to claims. Whether they are "showcase sites", blogs or press reviews passing on general information, there is a willingness to make demands, very often tinged with political opposition to the government in power. Numerous national symbols are highlighted (the colours of the flag, the symbol of Marianne, images of the revolution...) and the history of France, especially the Algerian war, is a recurrent subject, with a nostalgia for colonialism.

Audience figures are key elements on these sites. Sometimes they take the shape of a counter at the top of the page adding up the "unique" users of the site, and sometimes the figures are mentioned directly in the text content. The audience serves as a guarantee of the legitimacy of the site and its publications; the same goes for the date of the site's launch.

We find similarities in the expressions used on different sites, as well as the use of expressions that reveal a line of argument that is often of a racist nature: "native French", "non-European population", "non-native". Some groups clearly state a close link between migration or religion and criminality (sexual abuse, harassment, theft, etc.). Finally, it is often very difficult to identify the authors (lack of signature, use of pseudonyms). The more extremist and violent the content is, the more difficult it is to trace the identity of its authors and propagators.

IV. Methodology for discourse characterisation

The process of building a far-right database is underway. It will follow the methodology already tested in the European project PRACTICIES that we led. This methodology allowed us to build a database of Salafist jihadist discourses, which then fed a data characterisation tool thanks to a software architecture built from tools previously developed by the Gradiant company (Spain). In this section we present the methodology, which is based on a disciplinary interaction between sociologists able to decipher (label) contents as radical or non-radical, linguists able to characterise the discourses and help us find the data repositories, and computer scientists specialised in artificial intelligence able to build a learning system that learns from manually built databases and a crawler able to find and characterise new data. The objective of the cooperation is to build an interface for the identification of discourses allowing the detection of these radical discourses, and thus to better prevent their spread and the recruitment that they trigger.

4.1 Description of the tools and technical logic (technology from the PRACTICIES project, project #740072)

The Database Tagger system is divided into three main components, explained below:
- Data persistence layer (Database).
- RESTful Web Service (Server).
- Front-end application (Client).

Figure 1: System Architecture

The Database Tagger is a computer application used to automate and facilitate the process of labelling a dataset. The content of these datasets can be text or images tagged with the requested tags associated with the analysis tools. The datasets required for the development of the text analysis module can be created within this tool. In addition, it will allow these datasets to be expanded in the future by easily adding and tagging more texts, thereby contributing to improving the performance and reliability of the database and to developing the analysis tools. The persistence layer contains all information related to the Database Tagger, such as taggings, projects, users, text and image datasets and tagging results.

• MySQL was chosen as the database for the application because it is fully relational.
• The RESTful web service includes the back-end of the tagging application and the user's login.
• The web service is implemented using the Dropwizard framework and is connected to the data persistence layer using Hibernate†.
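The actual schema is implemented in Java with Hibernate and is not reproduced in the paper. As a minimal sketch only, the entities listed above (users, projects, datasets, items and tagging results) can be expressed as plain data structures; all field names here are hypothetical.

```python
# Illustrative sketch (not the project's Hibernate schema) of the entities the
# persistence layer is said to store: users, projects, datasets and tagging results.
from dataclasses import dataclass, field
from typing import List

@dataclass
class User:
    user_id: int
    name: str

@dataclass
class DatasetItem:
    item_id: int
    content: str                 # a text to be tagged (could also be an image path)

@dataclass
class Dataset:
    dataset_id: int
    name: str
    items: List[DatasetItem] = field(default_factory=list)

@dataclass
class TaggingResult:
    item_id: int
    tagger_id: int               # the user who labelled the item
    tag: str                     # e.g. "radical" / "not radical"

@dataclass
class Project:
    project_id: int
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    results: List[TaggingResult] = field(default_factory=list)
```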

The database feeds a search and speech characterisation engine (a crawler). The objective of the crawler is to generate probable interactions between users, taking into account certain contexts. The crawler works with a model that is built with data extracted from the database containing manually characterised data. This database of labelled discourse must be provided during a training phase.

† The main technologies used to develop the Database Tagger are explained below.
Angular: Angular (https://angularjs.org/) is an open-source (MIT license) structural framework for dynamic web applications mainly maintained by Google. It aims to simplify both the development and the testing of web applications by providing a framework for client-side model-view-controller (MVC) architecture.
Hibernate ORM: Hibernate (http://hibernate.org/orm/) Object/Relational Mapping (ORM) is an open-source (GNU Lesser General Public License) idiomatic persistence framework for Java and relational databases. It provides a framework for mapping an object-oriented domain model to a relational database.
MySQL: MySQL (https://www.mysql.com/) is an open-source (GNU General Public License) relational database management system (RDBMS).
Dropwizard: Dropwizard (http://www.dropwizard.io) is an open-source (Apache license) Java framework for developing RESTful web services.

The crawler has the usual set-up and implementation phases. During the set-up phase, the Data Preparation Module takes the tagged discourses and stores them in a knowledge database. This step is only performed if the configuration does not refer to an existing knowledge base (for example, the first time the system is used). Once this step is completed, the system is ready to receive queries (normal implementation phase). The generation of responses takes into account the user's request and navigates efficiently through the knowledge base to find a suitable response. This operation is performed each time the user queries the database using the REST API. The knowledge database is read-only; therefore, it cannot be updated with new samples and must be rebuilt from scratch.
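The two-phase workflow can be pictured from the client side as follows. This is a hypothetical sketch: the endpoint paths, payloads and base address are assumptions for illustration, not the project's documented REST API.

```python
# Hypothetical client-side view of the two phases described above.
# Endpoint names and payload formats are illustrative assumptions.
import requests

BASE_URL = "http://localhost:8080"          # assumed address of the REST service

def build_knowledge_base(tagged_discourses):
    """Set-up phase: push manually tagged discourses once. The knowledge base is
    read-only afterwards and must be rebuilt from scratch to add new samples."""
    return requests.post(f"{BASE_URL}/knowledge-base", json=tagged_discourses, timeout=30)

def query_crawler(user_request):
    """Normal phase: each call navigates the knowledge base to build a response."""
    resp = requests.get(f"{BASE_URL}/query", params={"q": user_request}, timeout=30)
    return resp.json()

if __name__ == "__main__":
    build_knowledge_base([{"text": "example tagged discourse", "tag": "radical"}])
    print(query_crawler("example request"))
```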

4.2. Text analysis

The objective of the text analysis modules is to analyse texts in order to find radical content. The three core modules provided are: 1) a deep learning classification engine to tag posts, paragraphs or sentences as radical/not radical, 2) a clustering algorithm that finds similar content, and 3) a pattern search tool to extract patterns that might be indicative of suspicious activities. All the modules are flexible and can be adapted to different targets of radicalisation detection. This section describes the architecture of the classification core module and the interconnection of its components. Figure 2 shows the architecture of the system. The classification model receives a list of texts as input and outputs a classification tag for each processed text. The data preparation model is in charge of translating the input (received as a list of texts) into the format expected by the classification engine. The word dictionary is a list of words created by the system during training. The Model Handler is the core (written in PyTorch); it encases the classification engine and trains, evaluates and tests the system. The PT model is the collection of parameters and hyperparameters of the model. The PT model and the Word Dictionary are created during the training phase.

Figure 2 : Architecture of the classification engine

The model was trained and validated using the Wikipedia Talk Labels dataset. This dataset has two classes (Aggressive and Non-aggressive), with more samples from the Non-aggressive class (15% of the samples are tagged as Aggressive). The input must be limited to 500 words; if a text exceeds this size, it is automatically cut to the first 500 words.
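The paper does not detail the network architecture itself. The sketch below is a minimal, hypothetical PyTorch classifier that illustrates the pieces named above (a word dictionary built at training time, truncation to 500 words, a model producing a radical / not radical tag); the embedding-bag architecture and all names are assumptions, not the FLYER implementation.

```python
# Minimal sketch of the kind of classification engine described above.
import torch
import torch.nn as nn

MAX_WORDS = 500                                   # input limit stated in the paper
LABELS = ["not_radical", "radical"]

def build_dictionary(texts):
    """Word dictionary created during the training phase (index 0 = unknown word)."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split()[:MAX_WORDS]:
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Translate a raw text into the format expected by the classification engine."""
    ids = [vocab.get(w, 0) for w in text.lower().split()[:MAX_WORDS]]
    return torch.tensor(ids or [0], dtype=torch.long)

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, num_classes=len(LABELS)):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)   # averages word vectors
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        # a single text is treated as one "bag" of tokens starting at offset 0
        offsets = torch.tensor([0], dtype=torch.long)
        return self.fc(self.embedding(token_ids, offsets))

# Usage sketch: the vocabulary and weights would normally come from the training phase.
vocab = build_dictionary(["example training sentence"])
model = TextClassifier(len(vocab))
logits = model(encode("example text to tag", vocab))
print(LABELS[int(logits.argmax(dim=1))])
```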

4.3. Clustering

This module receives documents as input and outputs the cluster in which each document is included. The algorithm tries to group the documents. The architecture of the clustering algorithm is shown in Figure 3. The vectorizer transforms the original document into a vector. The similar content search finds documents potentially similar to the queried document (using the recent history). The aggregator of the content of the discussion thread then compares the query with the candidates and outputs the cluster (if the document is similar to the documents already in it) or creates a new one.


Figure 3. Architecture of the clustering engine

The limit of the text to be analysed is 400 characters. The algorithm works better with short messages, hence the limit we impose on the input. Texts bigger than this size are cut to the first 400 characters.
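To make the vectorize / search / aggregate loop concrete, the sketch below implements the same logic in a simplified form: each incoming message (truncated to 400 characters) is compared with previously seen documents and either joins the most similar existing cluster or starts a new one. The similarity measure (token-set Jaccard) and the threshold are assumptions, not the module's actual components.

```python
# Illustrative sketch of the streaming clustering logic described above.
MAX_CHARS = 400
SIM_THRESHOLD = 0.3               # assumed similarity threshold

def vectorize(text):
    """Very simple 'vectorizer': the set of lowercased tokens of the truncated text."""
    return set(text[:MAX_CHARS].lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

class StreamingClusterer:
    def __init__(self):
        self.clusters = []        # list of lists of token sets

    def assign(self, text):
        """Return the cluster id of the document, creating a new cluster if needed."""
        vec = vectorize(text)
        best_id, best_sim = None, 0.0
        for cid, members in enumerate(self.clusters):
            sim = max(jaccard(vec, m) for m in members)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= SIM_THRESHOLD:
            self.clusters[best_id].append(vec)
            return best_id
        self.clusters.append([vec])
        return len(self.clusters) - 1

clusterer = StreamingClusterer()
print(clusterer.assign("the great replacement is coming"))                    # 0 (new)
print(clusterer.assign("they talk about the great replacement again"))        # 0 (similar)
print(clusterer.assign("football results of the weekend"))                    # 1 (new)
```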

4.4. Pattern detection

This module detects patterns of suspicious language in messages. The patterns use lexical features (lemmas, PoS) and syntactic features (from a dependency parser) to identify the parts of a text that match the pattern. It also uses related terms to augment the initial set of patterns provided by the user. The architecture of the module is shown in Figure 4. The Pattern Builder takes the set of patterns; if a pattern contains lemmas, it searches for related terms using the Related Terms Search block (only if a word embedding file is provided in the configuration file).

Figure 4. Architecture of the pattern detection module

Each pattern is specified as a Json object with the following fields:
• "meta": the category of the pattern. It can be shared among different patterns.
• "name": the unique name of the pattern. If multiple patterns use the same id, an exception is raised. If patterns are created automatically using the related terms functionality, those patterns receive a unique name built from the original pattern.
• "pattern": an array of Json objects with the pattern. The Json objects follow the same format as the Spacy Rule-Matcher. An example of a pattern with the right format is:

{ "pattern": [{"LEMMA": "buy"}, {"POS": "DET", "OP": "?"}, {"DEP": "dobj", "LEMMA": "passport"}],
  "lang": "en",
  "name": "buy_passport",
  "meta": "passport" }
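As a minimal sketch of how such a pattern can be applied with spaCy's rule-based Matcher (the format the paper refers to), assuming spaCy 3.x and an installed English model (e.g. `python -m spacy download en_core_web_sm`); the example sentence is invented and the module's actual integration is not shown here.

```python
# Applying the example "buy_passport" pattern with spaCy's Matcher (sketch).
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")          # provides lemmas, PoS and the dependency parse
matcher = Matcher(nlp.vocab)

# Same pattern as in the Json object: lemma "buy", an optional determiner,
# then a direct object whose lemma is "passport".
pattern = [{"LEMMA": "buy"}, {"POS": "DET", "OP": "?"}, {"DEP": "dobj", "LEMMA": "passport"}]
matcher.add("buy_passport", [pattern])

doc = nlp("He explained where to buy a passport without any control.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], "->", doc[start:end].text)
# expected output (subject to the parser): buy_passport -> buy a passport
```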

4.5. Behaviour analysis

A module for behaviour analysis has been developed in order to automatically identify anomalous patterns in the behaviour of publishers. These abnormal behaviours can provide valuable insights into emerging radicalisation procedures of users. The module collects data regarding the publication dates of different authors, creating data series representing the publication patterns of each user over time. Aggregations of the number of publications for each user are performed both monthly and daily.

These time series are analysed and modelled with the aim of statistically characterising and detecting the following abnormal behaviours:
1) Pattern 1: a number of daily publications above a normal threshold on certain days. These anomalies represent isolated peaks of high and unusual activity;
2) Pattern 2: anomalous periods of silence, that is, significantly long time intervals in which users have not had any activity in web forums (i.e., users have not published anything);
3) Pattern 3: anomalous activity periods, that is, significantly long time intervals in which users have had significantly unusual activity (i.e., users have made several consecutive publications);
4) Pattern 4: time periods with unusual behaviour when authors follow a monthly pattern in their publications.
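The statistical models used by the module are not given in the paper. As a hedged illustration only, the pandas sketch below computes the first two patterns for one author: isolated daily peaks and unusually long silences. The thresholds (3 standard deviations, 30 days) and the function name are assumptions for the example.

```python
# Illustrative sketch of Pattern 1 (daily peaks) and Pattern 2 (long silences).
import pandas as pd

def daily_anomalies(publication_dates, n_std=3, max_silence_days=30):
    """publication_dates: iterable of datetime-like values for one author."""
    ts = pd.Series(1, index=pd.to_datetime(list(publication_dates))).sort_index()
    daily = ts.resample("D").sum()                         # publications per day

    # Pattern 1: days with activity far above the author's usual level
    threshold = daily.mean() + n_std * daily.std()
    peaks = daily[daily > threshold]

    # Pattern 2: silent intervals longer than the tolerated gap
    active_days = daily[daily > 0].index
    gaps = active_days.to_series().diff().dropna()
    silences = gaps[gaps > pd.Timedelta(days=max_silence_days)]

    return peaks, silences

peaks, silences = daily_anomalies(
    ["2020-01-01", "2020-01-01", "2020-01-02", "2020-03-15"] + ["2020-01-03"] * 20)
print(peaks)      # 2020-01-03 stands out as a publication peak
print(silences)   # the gap between 2020-01-03 and 2020-03-15 is an abnormal silence
```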

V. The digital far-right

When we want to structure a representation of the extreme right-wing groups present on the Internet, it is necessary to examine several principles that will determine whether or not a group's or party's sites will be included in this radical space of the digital far right, but also what kind of discourse we can collect on these sites. The first principle is to determine the boundary of this space and thus determine what kind of relationship these groups have with the democratic political space. Indeed, out of the 20 identified extreme right-wing political parties‡, only 8 have candidates for certain elections§. We have therefore chosen to focus on the more radical groups. The second principle is to take a temporal reference, because these groups are very mobile: they emerge and disappear fast. We have therefore chosen to work on the last 10 years in particular, to take into account the sharp rise in terrorism and the radicalisation of political discourse during this period. The third principle was not to focus only (GUESPIN, L. 1976) on alternative information sites or the sites of extreme right-wing figures known from the media, but to work on three discursive levels. Indeed, between the generic discourse conveyed by a politician, a political commentator or an intellectual, the discourse disseminated by a journalist, an intellectual or a high-profile personality, and the integrated discourse conveyed by a grassroots activist or a follower of the movements, there are many variations useful for understanding these discourses. We have therefore defined three levels of discourse:

A) Generic discourses: We intend to identify on the Internet the founding discourses of extreme right-wing ideologies, in order to distinguish the discourse that creates a concept or an analysis from the discourses that then appropriate or disseminate it. For example, the notion of the great replacement, used by xenophobic, nativist and nationalist groups, finds its origins at the end of the 19th century and during the 20th century in anti-Semitic and racist writings. This theory fuelled the neo-Nazi movements. The thesis was reintroduced by Renaud Camus, a French extreme right-wing writer.


‡ Mouvement national républicain, Rassemblement national populaire, Rassemblement National, Adsav, Civitas, Debout la France, Dissidence française, Ligue du Midi, Ligue du Sud (France), Parti communautaire national-européen, Parti de la France, Les Patriotes, Souveraineté, identité et libertés.
§ Rassemblement national, Debout la France, Les Patriotes, Souveraineté, identité et libertés, Parti de la France, Mouvement national républicain, Parti nationaliste français, Rassemblement pour la France.

B) Disseminated discourses: We then intend to identify the discourses held by media relays, followers of the ideas and associated intellectuals. These discourses are targeted at making an impact, gaining support and increasing communication, and studying these variations is useful to better characterise the ideas and the rhetoric. Eric Zemmour avoids approaching the notion of the great replacement in its xenophobic or conspiratorial dimension of a programmed invasion of foreigners, but presents it rather as an invisible process that gradually makes his culture and the notion of homeland disappear. We can therefore see that while Renaud Camus speaks of a deliberate process of substitution of the French population by a non-European population, insisting on the xenophobic and racist dimension which is underpinned by a determined and determining choice of the disappearance of values, Zemmour insists on the consequences of an immigration process. He insists on the unanticipated consequences of submergence.

C) Integrated discourses: Here we intend to identify Internet users, the targets of such propaganda, contributions from the public and the appropriated words of ultra-right discourse.

The fourth principle is to differentiate between discursive genres, which are often related to the types of sites or digital spaces. Indeed, the propagators of ultra-right discourse adapt the discourse into a conversational, judicial, deliberative or epideictic genre. These genres have different effects on users and are constructed with an increasingly strategic quest for conviction.

5.1 Map of the digital ultra-right

To construct a representation of the different ultra-right groups, we did not use an analysis of the political positions or forms of political action used by the groups. Using textual analysis tools (IRAMUTEQ) and a tagged sentence base collected from the 56 analysed sites, we chose to identify the groups by their internal relations and by their oppositions, based on automated textual analysis. This work was based on a random sampling of the discourses made by the initiators, the texts of the supporters and the key concepts developed by each group on the first page of its website. After analysis we can therefore (see diagram below) differentiate 4 families allowing us to identify the thematic orientations of the ultra-right groups active on the Net.

Figure 5 : Mapping of the groups of the digital ultra-right
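The authors performed this grouping with IRAMUTEQ; purely as an illustration of the same idea (grouping sites into a small number of thematic families from their texts), the sketch below uses scikit-learn instead. The example texts are invented placeholders, and the cluster labels would still have to be interpreted by analysts, as in the four families described next.

```python
# Illustrative sketch (not IRAMUTEQ): automated thematic grouping of site texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

site_texts = [                                     # invented front-page extracts
    "defend the purity of faith and tradition",
    "our faith and identity must stay pure",
    "they lie to us, foreigners threaten our values",
    "the media hide the invasion that threatens us",
    "restore the old order, long live the king",
    "time to act now against the enemies of the nation",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(site_texts)           # one TF-IDF vector per site

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
for text, family in zip(site_texts, kmeans.labels_):
    print(family, "-", text)                       # family ids to be interpreted by analysts
```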

• Fundamentalists: The digital family of fundamentalists largely consists of groups or individuals active on networks and on the web on the basis of adherence to an original purity of faith, identity or mode of society. These groups defend an original vision of the Christian faith (fundamentalist Catholics), of identity linked to birth (nativists), of culture linked to racial origin (identitarians) or of the ancient or traditional way of life (reactionaries). For these groups, the important thing on the Internet is to defend the truth, the justice, the purity, which are today endangered by mixing, relativity and the acceptance of interculturality. Fundamentalists know where they come from and who they are, and want to take their place in a relativist, Europeanist and globalist virtual world.

• Defenders: The family of defenders consists of groups, sites or posts on social media that identify a significant threat to the original identity, core culture or personal values of the original group. Faced with these potential dangers, the groups produce a discourse of victimhood and danger. We find in this family the conspiracy theorists who denounce lies and therefore the manipulation of individuals, the xenophobes who denounce foreigners, the antifeminists who denounce threats against virility, the survivalists who denounce pollution and the programmed end of life, and the ultra-secular who denounce the dictatorship of religion.

• Nostalgics: This family consists of groups and individuals who act in reaction to a danger by valuing a past era, a territory or a life-saving ideology. Whether these groups are royalists, neo-Nazis, followers of fascism or nationalists, they express a nostalgia and an expectation of the establishment of a new order based on old protective values.

• Fighters: The family of fighters lives in the urgency of a fight they know is there and do not want to lose. Their aim is to call for action by denouncing the attacks of the scapegoats they designate or by giving a warlike interpretation to the facts that represent for them the symptoms of the urgency of the fight. Foreigners, Muslims, Arabs, Jews, women and the opposing team are, according to them, almost immediate concrete threats. It is no longer time to talk or to argue, it is time to act, and their rhetoric is one of command and planning.

VI. Bibliography

1. ALAVA, S., FRAU-MEIGS, D., HASSAN, G. (2017). Social media and radicalisation leading to violent youth extremism: UNESCO Report. Directorate of Information and Communication – UNESCO.
2. ALLARD, C. (2007). L'activisme politique contemporain : défection, expressivisme, expérimentation. Rue Descartes, n° 55, février.
3. AMOSSY, R. (2010). L'argumentation dans le discours. Paris: Colin.
4. CAMUS, J.-Y. (2006). Extremisms in France: should we be afraid of them? Toulouse: Éditions Milan.
5. CASTELLI GATTINARA, P. & FROJO, C. (2018). Quand les identitaires font la une : stratégies de mobilisation et visibilité médiatique du bloc identitaire. Revue française de science politique, vol. 68(1), 103-119.
6. DEBRAY, R. (2002). Passing to infinity. Cahiers de médiologie n° 13, La scène terroriste, coordinated by Catherine Bertho Lavenir and François-Bernard Huyghe, pp. 3-13.
7. DUPIN, E. (2017). La France identitaire : enquête sur la réaction qui vient. Éditions La Découverte.
8. GUESPIN, L. (1976). Types de discours ou fonctionnements ? Langages, n° 41, Typologie du discours politique.
9. HUGHES, F.-B. (2011). Terrorism: Violence and propaganda. Gallimard Découvertes.
10. JACKSON, P. (2020). Pioneers of World Wide Web Fascism: the British far right and Web 1.0. In: Littler, M., Lee, B. (eds) Digital Extremisms. Palgrave Studies in Cybercrime and Cybersecurity. Palgrave Macmillan, Cham.
11. KERBRAT-ORECCHIONI, C. (1980). L'énonciation de la subjectivité dans le langage. Paris: Armand Colin.
12. PRUNEAU, C. (2014). Les réseaux de groupes français d'extrême droite sur Facebook : note de recherche n° 22. Chaire de recherche du Canada en Sécurité et Technologie, Université de Montréal.
13. VINCENT, D. (2005). Conversational analysis, discourse analysis and interpretation of social discourse: the case of radio trash. Marges linguistiques, number 9, May 2005.