arXiv:2103.00242v1 [cs.CL] 27 Feb 2021 media-platforms-peak-points.html aint ohr.Ti silsrtdi h entosof definitions the in illustrated is This disinfo of harm. intention do underlying the to by Therein, mation . differ and two mis- detection. latter of The phenomena stance the namely both and consider this, we – achieve task to (NLP) Processing and needed Language Natural rumors, underlying news, core the fake f detecting including – checking, online information false detecting matically disinforma ever and the mis- of and of us tion. detection media reminding effective mainstream for flag need in red growing another both yet content, raising dubious online, of flood a nteps eae hr a enarpdgot npop- Twit- in Facebook, growth as rapid such Parler. a and platforms been Reddit media ter, has social there of decade, ularity past the In Introduction 1 sBei n h Speieta lcin swl sthe in- as an well brought as that election, pandemic COVID-19 fodemic presidential the US of the emergence and Brexit as nti ok eeaieterltosi ewe auto- between relationship the examine we work, this In ∗ 1 https://www.digitalinformationworld.com/2021/02/soc Author Contact n nls xsigwr nti ra eoedis- challenges. before future area, and learnt this lessons in review cussing We work survey. existing this analyse of and focus the view- is holistic which a point, from mis- detection and examining disinformation detection and survey stance no between relationship is the there and mining analysis, so- argumentation sentiment related as such other efforts tasks with prior media cial been detection have stance there contrast in While to task a right. as own (b) its or detecting component claims; and a fact-checked detection, previously as rumour (a) fact-checking, including: of ways, different framed in been has detection Stance intent). malicious with deliberately spread it false, disinfor- (intentionally be mation or online, false) information (unintentionally false of detection the for known also texts, as in expressed attitudes Detecting uvyo tneDtcinfrMs n iifrainIde Disinformation and Mis- for Detection Stance on Survey A tnedetection stance oci Hardalov Momchil [ lme l,2020 al., et Alam 1 a eoea motn task important an become has , oevr otoesa vnssuch events controversial Moreover, Abstract ] 4 { ihi.Ti,i un a e to led has turn, in This, it. with aa optn eerhIsiue BU oa Qatar Doha, HBKU, Institute, Research Computing Qatar oci,anv rsa.ao,isabelle preslav.nakov, arnav, momchil, 2 1 oaUiest S.KietOrdk” Bulgaria Ohridski”, Kliment “St. University Sofia , 2 ∗ , ra Arora Arnav 3 nvriyo oehgn Denmark Copenhagen, of University 1 hcSe Research CheckStep ial- 1 act- , 3 r- , - rsa Nakov Preslav ennso ifrn ei outlets media different of leanings e products new ad ic fifraincnb oefltool de- powerful idealogical issue a particular understanding be a like can of tasks of information bates variety of a piece for a wards will we approach the survey. also this is in This adopt intent). thi harmful whether with of (regardless prob- done information same false the of of spread treati part lem: thus as factuality, disinformation on th and focused Thus, misinformation has prove. to work hard of very majority also vast is it important, very is harm rumours. or theories conspiracy intentionally ated also and context, audio/visual manipulated nls .. npltcldebates political different in from studied e.g., been has detection angles, stance of task The o fact-checking for Information-Disorder-Venn-Diagram.png hr ifrn ye ftx aebe tde,includ- studied, been have posts text media of social types ing different ther, hs oin yCar adefo is Draft, First tion from Wardle Claire by notions these articles in,dts ttsis rnltos rwe aiei t is satire when or translations, seriously. statistics, dates, tions, xrse ydfeetatr aebe considered, been politicians have actors as different such by expressed tnedtcin uig tal. et Zubiaga detection. stance oeto natmtdsse hc sssac sfeatures as stance fact- uses com- the which a in system as automated aid or an (to of gather misinformation), claim ponent to studying a – or towards task process false texts standalone checking of or a detection users as the of be in stances could role This a play information. task the of formulations general. in task tection sacmoeto h uorvrfiainppln,and pipeline, verification rumour the of stance K¨uc¸¨uk component discussed they Can a and where as prediction, veracity rumour on web journalists eetn n grgtn h xrse tne to- stances expressed the aggregating and Detecting hr aebe opeo eetsresrltdto related surveys recent of couple a been have There oee,teei oeitn vriwo o different how of overview existing no is there However, 2 http://firstdraftnews.org/wp-content/uploads/2018/07 s“ is [ ecysie l,2017 al., et Derczynski [ nnetoa itkssc siacrt ht cap- photo inaccurate as such mistakes unintentional aa n g 2014 Ng, and Hasan [ ,and ”, oela n a,2017 Rao, and Pomerleau } @checkstep.com [ aslwk ta. 2019 al., et Hanselowski 1 , [ disinformation 4 [ 2020 oaudrnadWee 2009 Wiebe, and Somasundaran and [ ] hree l,2018 al., et Thorne [ ieahlsi iwo h tnede- stance the on view holistic a give hrf ta. 2020 al., et Shurafa sbleAugenstein Isabelle [ uig ta. 2016a al., et Zubiaga [ ] ono n odasr 2016 Goldwasser, and Johnson ] ahrn ifrn frames different gathering , . s“ is [ arctdo deliberately or fabricated 2018a ] .Wieteitn odo to intent the While ”. ial,stances Finally, . ntification [ [ aenle l,2018 al., et Habernal ] tfnve l,2020 al., et Stefanov sr nthe on users , ] ] rregarding or , ] rsn survey a present rdetermining or 2 ] misinforma- 1 ] , n news and Fur- . 3 /Types-of- aken cre- is s ng ] ] ] e . , , Dataset Source(s) Target Context Evidence #Instances Task English Datasets Rumour Has It [Qazvinian et al., 2011] 7 Topic Tweet ) 10K Rumours PHEME [Zubiaga et al., 2016a] 7 Claim Tweet : 7.5K Rumours Emergent [Ferreira and Vlachos, 2016] nj Headline Article∗ ) 2.6K Rumours FNC-1 [Pomerleau and Rao, 2017] nj Headline Article q 75K RumourEval ’17 [Derczynski et al., 2017] 7 Implicit‡ Tweet : 7.1K Rumours FEVER [Thorne et al., 2018] ɀ Claim Facts ) 185K Fact-checking [Hanselowski et al., 2019] Snopes Claim Snippets ) 19.5K Fact-checking RumourEval ’19 [Gorrell et al., 2019] 7 \ Implicit‡ Post : 8.5K Rumours COVIDLies [Hossain et al., 2020] 7 Claim Tweet q 6.8K Misconceptions TabFact [Chen et al., 2020] ɀ Statement WikiTable ) 118K Fact-checking Non-English Datasets Arabic [Baly et al., 2018] nj Claim Document q 3K Fact-checking DAST (Danish) [Lillie et al., 2019] \ Submission Comment : 3K Rumour Croatian [Boˇsnjak and Karan, 2019] nj Title Comment q 0.9K Claim verifiability Arabic [Khouja, 2020] nj Claim Title q 3.8K Claim verification

Table 1: Key characteristics of the stance detection datasets for mis- and disinformation detection. #Instances denotes dataset size as a whole; the numbers are in thousands (K) and are rounded to the hundreds. ∗the article’s body is summarised. ‡the stance is expressed towards a topic, which is not present in the data. Sources: 7 Twitter, nj News, ɀikipedia, \ Reddit. Evidence: q Single, ) Multiple, : Thread. for determining veracity. With this survey, we aim to bridge (ii) Emotion Recognition, where the goal is to recognise emo- this gap, present some emerging trends from this space and tions such as love, anger, sadness, etc. in the text; (iii) Per- discuss the challenges ahead. spective Identification, which aims to find the point-of-view of the author (e.g., Democrat vs. Republican) and the target 2 What is Stance? is always explicit; (iv) Sarcasm Detection, where the interest is in satirical or ironic pieces of text, which are often written In order to understand the task of stance detection, we first with the intent of ridicule or mockery; (v) Sentiment Analysis, provide definitions of stance and the stance-taking process. which determines the polarity of a piece of text. Biber and Finegan (1988) define stance as the expression of a speaker’s standpoint and judgement towards a given 3 Stance and Factuality proposition. Further, Du Bois (2007) define stance as “A public act by a social actor, achieved dialogically through In this section, we discuss the different aspects of mis- overt communicative means, of simultaneously evaluating and disinformation identification, where stance detection has objects, positioning subjects (self and others), and align- been successfully applied, i.e., fake news detection, rumour ing with other subjects, with respect to any salient dimen- verification and debunking, misconception identification, and sion of the sociocultural field”, showing that the stance- fact-checking, both as a task on its own or as a component taking process is affected not only by one’s personal opin- of a pipeline. In Table 1, we provide an overview of the key ion, but also by other external factors such as cultural characteristics of the available datasets for each task. There, norms, roles in the institution of the family, etc. For the we include the source from which the data is collected, the purpose of this survey, we adopt the general definition of target towards which the stance is expressed in the provided stance detection from K¨uc¸¨uk and Can [2020]: “for an in- context. Further, we show the type of evidence: Single is a put in the form of a piece of text and a target pair, stance single document/fact, Multiple is multiple pieces of text evi- detection is a classification problem where the stance of dence, often facts or documents, Thread is a (conversational) the author of the text is sought in the form of a cate- sequence of posts or a discussion. The final column is the gory label from this set: Favor, Against, Neither. Oc- type of the target Task. casionally, the category label of Neutral is also added to the set of stance categories [Mohammad et al., 2016], 3.1 Fact-Checking as Stance Detection and the target may or may not be explicitly mentioned in As stance detection is the core task within fact-checking, the text [Augenstein et al., 2016a; Mohammad et al., 2016]. prior work has studied it in isolated, artificial task settings Note that the stance detection definitions and the label inven- – predicting the stance towards one or several documents. tories vary somewhat dependent on the target application (see Fact-Checking with One Evidence Document Section 3). Pomerleau and Rao [2017] organised the first Fake News Finally, stance detection can be distinguished from sev- Challenge3 (FNC-1) with the aim of automatically detecting eral other closely related NLP tasks: (i) Biased Language fake news. The goal was to detect the relatedness of a news Detection, where the existence of an inclination or tendency towards a particular perspective within a text is explored; 3http://www.fakenewschallenge.org/ article’s body to a headline (possibly from another news tification of mis- and disinformation, here we review its po- article), based on the stance that the former takes regarding tency to serve as a component in a larger automated pipeline. the latter. The possible categories are positive, negative, Rumors Stance detection can further be used for rumour discuss and unrelated. This is a standalone task, as it detection and debunking, where the stance of the crowd, the provides annotations only for the stance, and omits the actual media, or other sources towards a claim is used to determine “truth labels”; however, the system can be further integrated the veracity of a currently circulating story or a report of un- as a component of a fact-checking system. The motivation certain or doubtful factuality. More formally, for a pair of a behind creating a stance detection instead of a full-blown textual input and a rumour expressed as text, stance classi- fact-checking task was that with a successful stance detection fication means to determine the position of the text towards model, a human fact-checker would be able to enter a claim the rumour as a category label from the set Support, Deny, or a headline and instantly retrieve the top articles which Query, Comment. agree, disagree, or discuss the claim/headline in question. They could then look at the arguments for and against the This setup has been widely explored in the context of mi- [ ] claim, and use their human judgment and reasoning skills croblogs and . Qazvinian et al. 2011 started to assess the validity of the claim in question. Such a tool with five rumours and classified the user’s stance into five would enable human fact-checkers to be fast and effective. categories: endorse, deny, unrelated, question, neutral. This work is one of the first to demonstrate the feasibility of this Fact-Checking with Multiple Evidence Documents The task formulation; however, its limited size and the focus on FEVER [Thorne et al., 2018; Thorne et al., 2019] shared task assessing stance of single posts presented significant chal- was introduced in 2018 and extended in 2019, with the goal lenges in building real-world systems. Zubiaga et al. [2016a] of assessing the veracity of a claim based on a set of sup- took the task further by analysing how people orient to and porting statements from Wikipedia. However, claims can spread rumours on social media based on conversational be composite and can contain multiple (contradicting) state- threads. The study included rumour threads associated with ments, thus making multi-hop reasoning a required skill for nine newsworthy events, and users’ stance before and after solving the task. The authors offered claim–evidence pairs the rumours were confirmed or denied. Dungs et al. [2018] annotated into three categories: SUPPORTED, REFUTED, continued this line of research, but focused on the effective- and NO ENOUGH INFO. The last category includes claims ness of the stance to predict the veracity of the rumours. which are either too general or too specific, and thereby can- Hartmann et al. [2019] explored the flow of (dis-)information not be supported or refuted by the available information in on Twitter after the MH17 Plane Crash. Wikipedia. This kind of setup may help fact-checkers to un- Recently, RumourEval [Derczynski et al., 2017; derstand the decisions that the models made in their assess- Gorrell et al., 2019] was held as a sequence of shared ment of the veracityof a claim, or can navigatea humanto the tasks for automated claim validation. The work aimed to final judgement. The second edition (2019) of the task eval- identify and to handle rumours based on user reactions and uated how robust the models are with respect to adversarial ensuing conversations in social media. The tasks offered attacks, where the participants were tasked with building new annotations for both stance and veracity. Both the 2017 examples to “break” the existing models, and then to propose and 2019 competitions were similar in spirit: the 2019 one “fixes” in order to improve the system robustness to such at- extended the task with more tweets and also Reddit posts. tacks. This work showed the importance of modeling the discourse Hanselowski et al. [2019] presented a task constructed 4 around a story instead of drawing conclusions based on a from manually fact-checked claims on the Snopes fact- single post. checking portal. For this task, a model has to predict the Ferreira and Vlachos [2016] focused on debunking ru- stance of evidence sentences from articles written by jour- mours based on news articles as part of the Emergent5 project. nalists towards claims. In contrast to FEVER, the task does They collected a set of claims and news articles from ru- not require multi-hop reasoning. mour sites with annotations both for stance and for veracity, Chen et al. [2020] focused on verifying claims using tab- done by journalists. The goal was to leverage the stance of ular data. The TabFact dataset was generated by human a news article (summarized into a single sentence) regarding annotators who created positive and negative statements the claim as one of the componentsused to determineits over- about Wikipedia tables. Solving the task requires two dif- all veracity. A downside of this approach is the need of sum- ferent forms of reasoning in the statement: (i) linguistic, marisation in contrast to FNC-1 [Pomerleau and Rao, 2017], i.e., semantic-level understanding, and (ii) symbolic, i.e., ex- where entire news articles were used. ecution on the tables’ structure. Misconceptions Hossain et al. [2020] explored detection 3.2 Stance as a (Mis-/Dis-)information Detection of misinformation related to COVID, based on a set of known Component misconceptions listed in Wikipedia6. In particular, they eval- Fully automated systems can assist in gauging the extent, and uated the veracity of a tweet depending on whether it agrees, studying the spread, of false information being propagated disagrees,orhas no stance with respect to a subset of miscon- online. Hence, in contrast to the previously discussed appli- ceptions most relevant to it. This may allow fact-checkers to cations of stance detection – as a stand-alone system for iden- 5http://www.emergent.info/ 4https://www.snopes.com/ 6https://en.wikipedia.org/wiki/COVID-19 misinformation assess the veracity of dubious content in a convenient way by and freezing for FNC. The most important hyper-parameter evaluate the stance of a claim regarding an already checked turned out to be the learning rate, while freezing more lay- stories, known misconceptions, and facts. ers did not help. Mohtarami et al. [2018] worked on mitigat- Media profiling Another appealing aspect of ing the effects of irrelevant and noisy information on memory stance detection for fact-checking is media profiling. networks by learning a similarity matrix and a stance filtering Stefanov et al. [2020] explored the feasibility of an unsu- component applied at inference time. Moreover, they made pervised approach for identifying the political leanings of a small step towards explaining the stance of a given claim media outlets and influential people on Twitter based on by extracting meaningful snippets from evidence documents. their stance on controversial topics. They built clusters of Memory networks have also shown to be effective in a cross- users around core vocal ones based on their behaviour on lingual setting [Mohtarami et al., 2019]. Other noteworthy Twitter such as retweeting, using the procedure proposed approaches include combining bags of words with variational by Darwish et al. [2020]. This is an important step towards autoencoders [Augenstein et al., 2016b], and using bidirec- understanding media biases. tional conditional encoding when modeling the interaction The reliability of news media sources has been au- between the claim and the target [Augenstein et al., 2016a]. tomatically estimated based on their stance with respect The top-performing systems on to known manually fact-checked claims, without access FEVER [Yoneda et al., 2018; Nie et al., 2019] adopted a to gold labels for the overall medium-level factuality of state-of-the-art LSTM-based model for natural language reporting [Mukherjee and Weikum, 2015; Popat et al., 2017; inference, namely enhanced sequential inference model Popat et al., 2018]. The assumption is that reliable media (ESIM) [Chen et al., 2017]. Nieetal. [2019] proposed agree with true claims and disagree with false ones, while a neural semantic matching network, which came first for unreliable media, the situation is reversed. The trustwor- in the competition. They incorporated additional in- thiness of Web sources has also been studied from a Data formation such as page view frequency and WordNet Analytics perspective. For instance, Dong et al [2015] pro- features in addition to using pre-trained contextualized posed that a trustworthy source is one that contains very few embeddings. More recent approaches have used bi- false claims. directional attention [Li et al., 2018], a GPT language model [Malon, 2018] and graph neural networks [Zhou et al., 2019; Multiple languages Existing work for languages other than Atanasov et al., 2019]. Another notable idea is to use pre- English is scarce. All of the aforementioned research has trained language models as fact-checkers based on a masked focused only or primarily on English. Nevertheless, inter- language modelling objective [Lee et al., 2020]. est in stance detection for other languages has started to emerge. Baly et al. [2018] integrated stance detection and Threaded Stance Another setting where stance detec- fact-checking for Arabic. Khouja [2020] proposed a dataset tion can be applied to detect mis- and disinforma- for Arabic which matches the FEVER [Thorne et al., 2018] tion is in conversational threads [Zubiaga et al., 2016a; setup. Lillie et al. [2019] collected data for stance and for Derczynski et al., 2017; Gorrell et al., 2019]. In contrast to veracity from Danish Reddit threads, annotated using the the single task setup which ignores or does not provide addi- (S)upport, (D)eny, (Q)uery, (C)omment schema proposed tional context, here, important knowledge can be gained from by Zubiaga et al. [2016a]. Boˇsnjak and Karan [2019] worked the structure of the user interactions within conversational on stance detection, claim verification, and sentiment analysis threads. A common characteristic of the proposed methods of comments on articles in Croatian news. is to use tree-like structured models. Zubiaga et al. [2016b] explored Linear-Chain and Tree CRFs fed with posts en- 4 Methods coded with lexicon-based, content formatting, punctuation, Stance Detection (Single Task) Here, we discuss ap- and tweet formatting features. Kumar and Carley [2019] re- proaches for stance detection in the context of mis- and placed the CRFs with Binarised Constituency Tree LSTMs, disinformation detection without including extra tasks or and used pre-trained embeddings to encode the tweets. data. One line of research is the Fake News Chal- More recently, Tree [Ma and Gao, 2020] and Hierarchi- lenge [Pomerleau and Rao, 2017] setting. During the compe- cal [Yu et al., 2020] Transformers were proposed combining tition, the teams mostly developed the models with rich hand- both post- and thread-level representations for rumour de- crafted features such as words, word embeddings, and sen- bunking. Kochkina et al. [2017] split the conversations into timent lexica [Riedel et al., 2017; Hanselowski et al., 2018]. branches, and then modeled each branch using branched- Hanselowski et al. [2018] showed that the most impor- LSTM and hand-crafted features. Li et al. [2020] deviated tant group of features are lexical ones, followed by fea- from this structure by proposing modeling the conversations tures from topic models, while sentiment analysis did as a graph. Tianetal. [2020] showed that pre-training on not help. Ghanemetal. [2018] used lexical cue words. stance data helps build better representations for threaded More recently, Slovikovskaya and Attardi [2020] explored tweets for downstream rumour detection. Li et al. [2019a] the effectiveness of different types of transfer learning used an ensemble model for stance with features similar to with pre-trained Transformer models, significantly surpass- those in [Kochkina et al., 2017], which they further combined ing the results from previous approaches. Furthermore, with user credibility information, conversation structure, and Guderlei and Aßenmacher [2020] showed the robustness of other content-related features to predict the veracity of the ru- these models to batch size, learning rate, sequence length, mour. Zubiaga et al. [2018b] chained four different sequen- tial classifiers that showed state-of-the-art results on conver- Zubiaga et al. [2018a] considered a four-step tracking pro- sational threads. Li et al. [2019b] also leveraged user credi- cess as a pipeline for rumour resolution: (1) rumour detec- bility features in addition to other tweet-related features. Fi- tion, which, given a stream of claims, determines whether nally, the stance of a post might not be expressed directly they are worth verifying or they do not contain a rumour; towards the root of the thread, and thus the preceding posts (2) rumour tracking for finding relevant information about must be also taken into account [Gorrell et al., 2019]. the rumour using social media posts, sentence descriptions, and keywords; (3) stance classification to collect stances Multi-Dataset Learning Mixing data from different do- towards that rumour; and (4) veracity classification to ag- mains and sources can improve the robustness and the ability gregate the information from the tracking component, the of the models to generalise. However, setups which combine collected stances, and optionally other relevant information both mis- and disinformation identification with stance detec- about sources, users metadata, etc., to obtain a predicted truth tion, outlined in Section 3, vary in their annotation and label- value for the rumour. Possible methods which can be ap- ing schemes, making the combination of different datasets a plied at each step in the pipeline are discussed in more detail challenging task. in [Zubiaga et al., 2018a]. Earlier approaches focused only on the pre-training of models on multiple tasks, e.g., Fang et al. [2019] achieved state-of-the-art results on FNC-1 by fine-tuning the model on 5 Lessons Learnt and Challenges multiple tasks such as question answering, natural language Integration People question false information more and inference, etc., which are weakly related to stance detection. tend to affirm true information [Mendoza et al., 2010], and Recently, Schiller et al. [2020] proposed a stance detection thus, stance can play a vital role for verifying dubious con- benchmark to evaluate the robustness of stance models, in- tent. We argue that a tighter integration between stance and cluding some of the factuality-related datasets. They lever- fact-checking is needed. Stance can be expressed in different aged a pre-trained multi-task deep neural network (MT-DNN) forms, e.g., news articles, user posts, sentences in Wikipedia, model and continued its training on all datasets simultane- and Wiki tables, among others. All of these can guide hu- ously using multi-task learning, showing sizeable improve- man fact-checkers through the process of fact verification, ments over strong baselines trained on single datasets. and can point them to relevant evidence, which can help them 7 to make informed decisions pertaining to the factuality of a Systems Popat et al. [2018] proposed CredEye , a sys- claim. Moreover, the wisdom of the crowd can be a pow- tem for automatic credibility assessment. It takes a erful instrument in the fight against mis- and disinformation, claim as an input and analyses its credibility by consid- although it should be noted that vocal minorities can derail ering relevant articles from the Web. The underlying ap- public discourse. Nevertheless, these risks can be mitigated [ ] proach Popat et al., 2017 combines the predicted stance of by taking into account the credibility of the user or of the in- the articles regarding the claim with linguistic features to ob- formation source, which can be done automatically or with tain a credibility score. the help of human fact-checkers. Nadeem et al. [2019] developed FAKTA, an system for au- tomatic end-to-end fact-checking of claims. It retrieves rel- Explainability is an import aspect in stance detection re- evant articles from Wikipedia and selected media sources, search, especially in the context of its application to mis- which are used for verification. FAKTA uses a stance detec- and disinformation detection. Moreover, explainability is a tion model, trained in a FEVER setting, to predict the stance crucial step towards adopting fully automated fact-checking. and to obtain entailed spans. These predictions, combined In terms of previous work, FEVER 2.0 [Thorne et al., 2019] with linguistic analysis, are used to provide both document- may be viewed as a step towards obtaining such explana- and sentence-level explanations and a factuality score. tions. Specifically, there have been efforts to identify ad- Wen et al. [2018] worked on cross-lingual cross-platform versarial triggers, which offer explanations of vulnerabili- rumour verification. They included multimodal content from ties at the model level [Atanasova et al., 2020b]. However, fake and from real posts with images or videos shared on FEVER is artificially created and is limited to Wikipedia, Twitter. For this purpose, they collected supporting docu- which may not reflect real-world settings. To mitigate ments from two search engines, Google and Baidu, which this, explanation by professional journalists can be found on they then used for veracity evaluation. They considered fact-checking websites, and can be further combined with posts in two languages, English and Chinese. However, they stance detection components in an automated system. A trained the stance model on English data (FNC-1) using pre- step in this direction is [Atanasova et al., 2020a], who gen- trained multilingual sentence embeddings, and is further used erated natural language explanations for claims from Politi- cross-platform features for training a neural network model. Fact given gold evidence document summaries by journalists. [ Nguyen et al. [2020] proposed the Factual News Graph Other existing systems Popat et al., 2017; Popat et al., 2018; ] (FANG) model, which models the social context for fake Nadeem et al., 2019 offer explanations to a more limited ex- news detection. In particular, FANG uses the stance of user tent, highlighting span overlaps between the target text and comments with respect to the target news article, and in addi- the evidence documents. Overall, a more holistic and realis- tion also temporal information, user-user interactions, article- tic setting for generating explanations of how fact-checking source interactions, and source reliability information. models arrived at its prediction is still needed. Dataset Sizes Another major limitation is the size of ex- 7https://gate.d5.mpi-inf.mpg.de/credeye/ isting stance detection datasets. The vast majority of such datasets contain at most a few thousand examples. Con- ing those into a stance detection pipeline, while challenging, trasted with the related task of Natural Language Inference, paves the way towards a robust detection process. where datasets such as SNLI of more than half a million sam- ples have been collected, this is far from optimal. Never- theless, recent advances in transfer learning and the release 6 Future Trends of large pre-trained models such as ELMo, GPT*, BERT, and RoBERTa have eased the efforts to train well-performing Multimodality Spreading mis- and disinformation through systems on smaller datasets. Furthermore, GPT-3 showed multiple modalities is becoming increasingly popular. One a remarkable success in few-shot learning, but at the ex- such example of this are DeepFakes, i.e., synthetically pense of training a billion parameter model. Recent research created images or videos, in which (usually) the face of showed that smaller models can also be good few-shot learn- one person is replaced with another person’s. Another ers [Rethmeier and Augenstein, 2020]. such example are information propagation techniques such Dataset Biases An additional challenge is that existing as memetic warfare, i.e., the use of for informa- datasets also have certain potentially undesirable biases. tion warfare. Both DeepFakes and memetic warfare re- Datasets constructed from naturally occurring data in fact- quire the modelling of not only text, but also of other checking portals, such as Augenstein et al. [2019] exhibit a modalities. Some work in this area is on fake news long-tail distribution when it comes to which entities are men- detection for images [Nakamura et al., 2020], claim verifi- tioned in the claims. Moreover, due to the nature of which cation for images [Zlatkova et al., 2019], or searching for claims are deemed to be worthy for fact-checking, some en- fact-checked information to alleviate the spread of fake tities can be mentioned in claims with predominantly one news [Vo and Lee, 2020]. veracity class. This, in turn, could yield biased models and biased decisions when such models are put in practical use. Data Mixing As we previously discussed (see Section 3), Another option is to use artificially constructed datasets such task definitions and label inventories vary between different as FEVER, which are also biased as the way the claims and applications of stance detection for mis- and disinformation the evidence statements are expressed is not representative detection. This is not specific to this context, but it is a of naturally occurring claims, and it is likely that the types known challenge in stance detection [K¨uc¸¨uk and Can, 2020], of claims are not reflective of check-worthy claims either as discussed in [Schiller et al., 2020]. Nevertheless, large- [Wright and Augenstein, 2020]. As biases in datasets are un- scale studies of approaches that aim to leverage the relation- avoidable, it might instead be worth noting what the intended ships between the label inventories, or the similarity between application for them is [Waseem et al., 2020]. datasets are still to come. One promising direction is the use of label embeddings [Augenstein et al., 2018], as they offer a Granular Stance As research in stance detection has convenient way to learn the interaction between disjoint label evolved, so has the definition of the task and the label in- sets that carry semantic relations. ventories. As shown in Section 3, the labels can vary based on the use case and the setting they’re used in. Much of Weighing Stances When evaluating the factuality of a the literature adopt a variant of the Favour, Against, Neither piece of information, the nuance in the strength of the ex- labels, or an extended schema such as (S)upport, (Q)uery, pressed stance by an actor is another indicator that should be (D)eny (C)omment [Mohammad et al., 2016], however that is taken into consideration in the decision-making process. In not enough to accurately assessing stance, as it would need practice, this can be subjective, which is why annotations for ascertaining the strength with which the target is supportedor the strength of agreement are scarce. Moreover, a complex refuted. In case of neutral, detecting whether that is the case system should weigh sources based on their known biases and because of conflicting arguments or because of truly neutral trustfulness. stance is another challenge. Modelling Context Modelling the context stance is ex- Multilinguality Finally, we argue for the importance for pressed in is a particularly important, yet challenging fact-checking systems to be language-agnostic for several task. In many cases, it is important to consider the reasons: (i) the content in question may originate from a text background of the stance-taker as well as the characteris- piece in various languages; (ii) the evidence or the stance tics of the targeted object. In particular, in the context may not be expressed in the same language, in turn (iii) pos- of social media, one can provide information about the ing a challenge for fact-checkers, who might not be na- users such as their previous activity, other users they in- tive speakers of the language considered. This is an impor- teract most with, the threads in which they discuss a par- tant challenge and an emerging trend not only in stance de- ticular topic, or even their interests [Zubiaga et al., 2016a; tection, but in NLP in general. Currently, only a handful Gorrell et al., 2019; Li et al., 2019b]. The context of stance of datasets for factuality and stance cover languages other expressed through news articles is related to the fea- than English [Baly et al., 2018; Boˇsnjak and Karan, 2019; tures of the media outlets, like sources of funding, pre- Lillie et al., 2019; Khouja, 2020]. While they are a good viously known biases, or credibility [Darwish et al., 2020; start, they are small in size, and do not offer a truly cross- Stefanov et al., 2020]. When using contextual information lingual setup. Only recently, Vamvas and Sennrich [2020] in- about the object, factual information about the real world, and troduced a cross-lingual setup with three languages for stance the time of posting are all important to consider. Incorporat- in debates. 7 Conclusion [Dong et al , 2015] Dong et al . Knowledge-based trust: Es- timating the trustworthiness of web sources. VLDB, 2015. We surveyed the current state-of-the-art in stance detection for mis- and disinformation detection. We explored applica- [Du Bois, 2007] Du Bois. The stance triangle. Stancetaking tions of stance for detecting fake news, verifying and debunk- in discourse: Subjectivity, evaluation, interaction, 2007. ing rumours, identifying misconceptions, and fact-checking. [Dungs et al., 2018] S. Dungs et al. Can rumour stance alone Furthermore, we discussed existing approaches used in dif- predict veracity? In COLING, 2018. ferent aspects of the aforementioned tasks. We also identi- fied several vital challenges which need to be addressed in [Fang et al., 2019] W. Fang et al. Neural multi-task learning order to make substantial progress, such as the integration of for stance prediction. In FEVER, 2019. stance in fully- or semi-automated systems. Finally, we out- [Ferreira and Vlachos, 2016] W. Ferreira and A. Vlachos. lined promising future trends in the field of stance detection Emergent: a novel data-set for stance classification. In for mis- and disinformation detection. NAACL-HLT, 2016. [Ghanem et al., 2018] Ghanem et al. Stance detection in fake References news a combined feature representation. In FEVER, 2018. [Alam et al., 2020] F. Alam et al. Fighting the COVID-19 [Gorrell et al., 2019] G. Gorrell et al. SemEval-2019 task 7: infodemic in social media: a holistic perspective and a call RumourEval, determining rumour veracity and support for to arms. arXiv:2007.07996, 2020. rumours. In SemEval, 2019. [Atanasov et al., 2019] A. Atanasov et al. Predicting the role [Guderlei and Aßenmacher, 2020] Maike Guderlei and of political trolls in social media. In CoNLL, 2019. Matthias Aßenmacher. Evaluating unsupervised repre- [Atanasova et al., 2020a] P. Atanasova et al. Generating fact sentation learning for detecting stances of fake news. In checking explanations. In ACL, 2020. COLING, 2020. [Atanasova et al., 2020b] P. Atanasova et al. Generating [Habernal et al., 2018] I. Habernal et al. The argument rea- label cohesive and well-formed adversarial claims. In soning comprehension task: Identification and reconstruc- EMNLP, 2020. tion of implicit warrants. In NAACL-HLT, 2018. [Augenstein et al., 2016a] Augenstein et al. Stance detection [Hanselowski et al., 2018] A. Hanselowski et al. A ret- with bidirectional conditional encoding. In EMNLP, 2016. rospective analysis of the fake news challenge stance- detection task. In COLING, 2018. [Augenstein et al., 2016b] I. Augenstein et al. USFD at SemEval-2016 task 6: Any-target stance detection on [Hanselowski et al., 2019] A. Hanselowski et al. A richly Twitter with autoencoders. In SemEval, 2016. annotated corpus for different tasks in automated fact- checking. In CoNLL, 2019. [Augenstein et al., 2018] I. Augenstein et al. Multi-task [ ] learning of pairwise sequence classification tasks over dis- Hartmann et al., 2019 M. Hartmann et al. Mapping (dis- parate label spaces. In NAACL-HLT, 2018. )information flow about the MH17 plane crash. In NLP4IF, 2019. [Augenstein et al., 2019] I. Augenstein et al. MultiFC: A [ ] real-world multi-domain dataset for evidence-based fact Hasan and Ng, 2014 K. Hasan and V. Ng. Why are You checking of claims. In EMNLP-IJCNLP, 2019. Taking this Stance? Identifying and Classifying Reasons in Ideological Debates. In EMNLP, 2014. [Baly et al., 2018] Baly et al. Integrating stance detection [ ] and fact checking in a unified corpus. In NAACL-HLT, Hossain et al., 2020 T. Hossain et al. COVIDLies: Detect- 2018. ing COVID-19 misinformation on social media. In NLP- COVID19, 2020. [Biber and Finegan, 1988] Biber and Finegan. Adverbial [Johnson and Goldwasser, 2016] K. Johnson and D. Gold- stance types in english. Discourse Processes, 11(1), 1988. wasser. “All I know about politics is what I read in Twit- [Boˇsnjak and Karan, 2019] M. Boˇsnjak and M. Karan. Data ter”: Weakly supervised models for extracting politicians’ set for stance and sentiment analysis from user comments stances from Twitter. In COLING, 2016. on Croatian news. In BSNLP, 2019. [Khouja, 2020] J. Khouja. Stance prediction and claim veri- [Chen et al., 2017] Q. and Chen et al. Enhanced LSTM for fication: An Arabic perspective. In FEVER, 2020. natural language inference. In ACL, 2017. [Kochkina et al., 2017] Kochkina et al. Turing at SemEval- [Chen et al., 2020] W. Chen et al. TabFact: A large-scale 2017 task 8: Sequential approach to rumour stance classi- dataset for table-based fact verification. In ICLR, 2020. fication with branch-LSTM. In SemEval, 2017. [Darwish et al., 2020] K. Darwish et al. Unsupervised user [K¨uc¸¨uk and Can, 2020] D. K¨uc¸¨uk and F. Can. Stance detec- stance detection on twitter. In ICWSM, 2020. tion: A survey. ACM Comput. Surv., 53(1), 2020. [Derczynski et al., 2017] L. Derczynski et al. SemEval-2017 [Kumar and Carley, 2019] S. Kumar and K. Carley. Tree task 8: RumourEval: Determining rumour veracity and LSTMs with convolution units to predict stance and rumor support for rumours. In SemEval, 2017. veracity in social media conversations. In ACL, 2019. [Lee et al., 2020] N. Lee et al. Language models as fact [Rethmeier and Augenstein, 2020] N. Rethmeier and I. Au- checkers? In FEVER, 2020. genstein. Long-tail zero and few-shot learning [Li et al., 2018] S. Li et al. An end-to-end multi-task learn- via contrastive pretraining on and for small data. ing model for fact checking. In FEVER, 2018. arXiv:2010.01061, 2020. [Li et al., 2019a] Li et al. eventAI at SemEval-2019 task 7: [Riedel et al., 2017] B. Riedel et al. A simple but tough-to- Rumor detection on social media by exploiting content, beat baseline for the Fake News Challenge stance detec- user credibility and propagation information. 2019. tion task. ArXiv:1707.03264, 2017. [Li et al., 2019b] Q. Li et al. Rumor detection by exploit- [Schiller et al., 2020] B. Schiller et al. Stance detec- ing user credibility information, attention and multi-task tion benchmark: How robust is your stance detection? learning. In ACL, 2019. arXiv:2001.01565, 2020. [Li et al., 2020] J. Li et al. Exploiting microblog conversa- [Shurafa et al., 2020] C. Shurafa et al. Political framing: US tion structures to detect rumors. In COLING, 2020. COVID19 blame game. In SocInfo, 2020. [Lillie et al., 2019] Anders Edelbo Lillie et al. Joint rumour [Slovikovskaya and Attardi, 2020] Slovikovskaya and At- stance and veracity prediction. In NoDaLiDa, 2019. tardi. Transfer learning from transformers to fake news challenge stance detection (FNC-1) task. In LREC, 2020. [Ma and Gao, 2020] J. Ma and W. Gao. Debunking rumors on Twitter with tree transformer. In COLING, 2020. [Somasundaran and Wiebe, 2009] S. Somasundaran and [Malon, 2018] Christopher Malon. Team Papelo: Trans- J. Wiebe. Recognizing stances in online debates. In former networks at FEVER. In FEVER, 2018. ACL-AFNLP, 2009. [Mendoza et al., 2010] M. Mendoza et al. Twitter under cri- [Stefanov et al., 2020] P. Stefanov et al. Predicting the topi- sis: Can we trust what we RT? In SOMA, 2010. cal stance and political leaning of media using tweets. In ACL, 2020. [Mohammad et al., 2016] S. Mohammad et al. SemEval- 2016 task 6: Detecting stance in tweets. In SemEval, 2016. [Thorne et al., 2018] J. Thorne et al. FEVER: a large-scale dataset for fact extraction and VERification. In NAACL- [Mohtarami et al., 2018] M. Mohtarami et al. Automatic HLT, 2018. stance detection using end-to-end memory networks. In NAACL-HLT, 2018. [Thorne et al., 2019] J. Thorne et al. The FEVER2.0 shared task. In FEVER, 2019. [Mohtarami et al., 2019] Mohtarami et al. Contrastive lan- guage adaptation for cross-lingual stance detection. In [Tian et al., 2020] L. Tian et al. Early detection of rumours EMNLP, 2019. on twitter via stance transfer learning. In ECIR, 2020. [Mukherjee and Weikum, 2015] S. Mukherjee and [Vamvas and Sennrich, 2020] J. Vamvas and R. Sennrich. X- G. Weikum. Leveraging joint interactions for credi- Stance: A multilingual multi-target dataset for stance de- bility analysis in news communities. In CIKM, 2015. tection. In SwissText-KONVENS, 2020. [Nadeem et al., 2019] Nadeem et al. FAKTA: An automatic [Vo and Lee, 2020] N. Vo and K. Lee. Where are the facts? end-to-end fact checking system. In NAACL-HLT, 2019. searching for fact-checked information to alleviate the [Nakamura et al., 2020] K. Nakamura et al. Fakeddit: A new spread of fake news. In EMNLP, 2020. multimodal benchmark dataset for fine-grained fake news [Waseem et al., 2020] Z. Waseem et al. Disembodied ma- detection. In LREC, 2020. chine learning: On the illusion of objectivity in NLP. [Nguyen et al., 2020] V. Nguyen et al. FANG: Leveraging arXiv:2101.11974, 2020. social context for fake news detection using graph repre- [Wen et al., 2018] W. Wen et al. Cross-lingual cross- sentation. In CIKM, 2020. platform rumor verification pivoting on multimedia con- [Nie et al., 2019] Y. Nie et al. Combining fact extraction and tent. In EMNLP, 2018. verification with neural semantic matching networks. In [Wright and Augenstein, 2020] D. Wright and I. Augen- AAAI, 2019. stein. Claim check-worthiness detection as positive un- [Pomerleau and Rao, 2017] D. Pomerleau and D. Rao. Fake labelled learning. In EMNLP Findings, 2020. news challenge stage 1 (FNC-I): Stance detection, 2017. [Yoneda et al., 2018] T. Yoneda et al. UCL machine reading [Popat et al., 2017] K. Popat et al. Where the truth lies: Ex- group: Four factor framework for fact finding (HexaF). In plaining the credibility of emerging claims on the Web and FEVER, 2018. social media. In WWW, 2017. [Yu et al., 2020] J. Yu et al. Coupled hierarchical Trans- [Popat et al., 2018] K. Popat et al. CredEye: A credibil- former for stance-aware rumor verification in social media ity lens for analyzing and explaining misinformation. In conversations. In EMNLP, 2020. WWW, 2018. [Zhou et al., 2019] J. Zhou et al. GEAR: Graph-based evi- [Qazvinian et al., 2011] Qazvinian et al. Rumor has it: Iden- dence aggregating and reasoning for fact verification. In tifying misinformation in microblogs. In EMNLP, 2011. ACL, 2019. [Zlatkova et al., 2019] D. Zlatkova et al. Fact-checking meets fauxtography: Verifying claims about images. In EMNLP-IJCNLP, 2019. [Zubiaga et al., 2016a] A. Zubiaga et al. Analysing how peo- ple orient to and spread rumours in social media by look- ing at conversational threads. PLOS ONE, 11(3), 2016. [Zubiaga et al., 2016b] A. Zubiaga et al. Stance classifica- tion in rumours as a sequential task exploiting the tree structure of social media conversations. In COLING, 2016. [Zubiaga et al., 2018a] A. Zubiaga et al. Detection and reso- lution of rumoursin social media: A survey. ACM Comput. Surv., 51(2), 2018. [Zubiaga et al., 2018b] A. Zubiaga et al. Discourse-aware rumour stance classification in social media using sequen- tial classifiers. Inf. Process. Manage., 54(2), 2018.