A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles

Amy X. Zhang, MIT CSAIL, Cambridge, MA, USA ([email protected])
Aditya Ranganathan, Berkeley Institute for Data Science, Berkeley, CA, USA ([email protected])
Sarah Emlen Metz, Berkeley Institute for Data Science, Berkeley, CA, USA ([email protected])
Scott Appling, Georgia Institute of Technology, Atlanta, GA, USA ([email protected])
Connie Moon Sehat, Global Voices, London, UK ([email protected])
Norman Gilmore, Berkeley Institute for Data Science, Berkeley, CA, USA ([email protected])
Nick B. Adams, Berkeley Institute for Data Science, Berkeley, CA, USA ([email protected])
Emmanuel Vincent, Climate Feedback, University of California, Merced, Merced, CA, USA ([email protected])
Jennifer 8. Lee, Hacks/Hackers, San Francisco, CA, USA ([email protected])
Martin Robbins, Factmata, London, UK ([email protected])
Ed Bice, Meedan, San Francisco, CA, USA ([email protected])
Sandro Hawke, W3C, Cambridge, MA, USA ([email protected])
David Karger, MIT CSAIL, Cambridge, MA, USA ([email protected])
An Xiao Mina, Meedan, San Francisco, CA, USA ([email protected])

ABSTRACT

The proliferation of misinformation in online news and its amplification by platforms are a growing concern, leading to numerous efforts to improve the detection of and response to misinformation. Given the variety of approaches, collective agreement on the indicators that signify credible content could allow for greater collaboration and data-sharing across initiatives. In this paper, we present an initial set of indicators for article credibility defined by a diverse coalition of experts. These indicators originate from both within an article's text as well as from external sources or article metadata. As a proof-of-concept, we present a dataset of 40 articles of varying credibility annotated with our indicators by 6 trained annotators using specialized platforms. We discuss future steps including expanding annotation, broadening the set of indicators, and considering their use by platforms and the public, towards the development of interoperable standards for content credibility.

KEYWORDS

misinformation; fake news; information disorder; credibility; news; journalism; web standards

ACM Reference Format:
Amy X. Zhang, Aditya Ranganathan, Sarah Emlen Metz, Scott Appling, Connie Moon Sehat, Norman Gilmore, Nick B. Adams, Emmanuel Vincent, Jennifer 8. Lee, Martin Robbins, Ed Bice, Sandro Hawke, David Karger, and An Xiao Mina. 2018. A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles. In The 2018 Web Conference Companion, April 23–27, 2018, Lyon, France. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3184558.3188731

This paper is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. WWW '18 Companion, April 23–27, 2018, Lyon, France. © 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License. ACM ISBN 978-1-4503-5640-4/18/04.

1 INTRODUCTION

While the propagation of false information existed well before the internet [11], recent changes to our information ecosystem [24] have created new challenges for distinguishing misinformation from credible content. Misinformation, or information that is false or misleading, can quickly reach thousands or millions of readers, helped by inattentive or malicious sharers and algorithms optimized for engagement. Many solutions to remedy the propagation of misinformation have been proposed—from initiatives for publishers to signal their credibility (e.g., the Trust Project, https://thetrustproject.org), to technologies for automatically labeling misinformation and scoring content credibility [6, 29, 35, 42], to the engagement of professional fact-checkers or experts [13], to campaigns to improve media literacy [19] or crowdsource annotations [31]. While all these initiatives are valuable, the problem is so multifaceted that each provides only partial alleviation. Instead, a holistic approach, with reputation systems, fact-checking, media literacy campaigns, revenue models, and public feedback all contributing, could collectively work towards improving the health of the information ecosystem. To foster this cooperation, we propose a shared vocabulary for representing credibility. However, credibility is not a Boolean flag: there are many indicators, both human- and machine-generated, that can feed into an assessment of article credibility, and differing preferences for what indicators to emphasize or display. Instead of an opaque score or flag, a more transparent and customizable approach would be to allow publishers, platforms, and the public to both understand and communicate what aspects of an article contribute to its credibility and why.

In this work, we describe a set of initial indicators for article credibility, grouped into content signals, which can be determined by only considering the text or content of an article, and context signals, which can be determined through consulting external sources or article metadata. These indicators were iteratively developed through consultations with journalists, researchers, platform representatives, and others during a series of conferences, workshops, and online working sessions. While there are many signals of credibility, we focus on article indicators that do not need a domain expert but require human judgment and training. This focus differentiates our work from efforts targeting purely computational or expert-driven indicators, towards broadening participation in credibility annotation and improving media literacy. To validate the indicators and examine how they get annotated, we gathered a dataset of 40 highly shared articles focused on two topics possessing a high degree of misinformation in popular media: public health [49] and climate science [21]. These articles were each annotated with credibility indicators by 6 annotators with training in journalism and logic and reasoning. Their rich annotations help us understand the consistency of the different indicators across annotators and how well they align with domain expert evaluations of credibility. We are releasing the data publicly (Credibility Coalition: http://credibilitycoalition.org), and will host an expanded dataset in service to the research community and public.

The process outlined in this paper serves as a template for creating a standardized set of indicators for evaluating content credibility. With broad consensus, these indicators could then support an ecosystem of varied annotators and consumers of credibility data. By focusing on indicators, we leave open the question of who or what performs annotation. Indeed, the presence of a standard permits flexibility on the part of users and platforms to determine whose annotations to surface based on who they consider trustworthy. Our approach also leaves open the question of how annotations are generated, from the use of partial or full automation, to annotations by experts or publishers, to crowdsourced or friendsourced methods. However, results from our initial study suggest certain methods may be more or less fruitful for different indicators.

Aligned with the goals of groups such as the W3C Credentials Community Group (https://www.w3.org/community/credentials/), any interested party could contribute annotations using our indicators and the open standards developed during this work, while any system for displaying or sharing news could make its own decisions about how to aggregate, weight, filter, and display credibility information. For instance, systems such as web browsers, search engines, or social platforms could surface information about a news article to benefit readers, much like how nutrition labels for food and browser security labels for webpages provide context in the moment. Readers could also verify an article by building on the annotations left by others or even interrogate the indicators that a particular publisher or other party provides. Finally, the data may be helpful to researchers and industry watchdogs seeking to monitor the ecosystem as a whole.

2 RELATED WORK

In recent years, researchers have sought to better define and characterize misinformation and its place in the larger information ecosystem. Some researchers have chosen to eschew the popularized term "fake news", calling it overloaded [50]. Instead, they have opted for terms such as "information pollution" and "information disorder" [50], to focus not only on the authenticity of the content itself, but also the motivations and actions of creators, including disinformation agents [47], readers, media companies [20] and their advertising models [16], platforms, and sharers. Accordingly, our approach covers a broad range of indicators developed by experts representing a range of disciplines and industries.

An important aspect of characterizing misinformation is understanding how people perceive the credibility of information. Reviews of the credibility research literature [24, 41] describe various aspects of credibility attribution, including judgments about the credibility of a particular source or a broader platform (e.g., a blog versus social media) [43], as well as message characteristics that impact perceptions of credibility of the message or source [30]. Studies have pointed out the differences in perceived credibility that can occur based on differences in personal relevance [53], individual online usage [10], and the co-orientation of reader and writer views [25], among others. This prior work suggests that a one-size-fits-all approach or an approach that provides an opaque "credibility score" will not be able to adapt to individual needs.

However, research has also found that readers can be swayed by superficial qualities that may be manipulated, such as a user's avatar on social media [27] or the number of sources quoted in an article [48], demonstrating the need for greater media literacy. Another study found that fact-checkers correctly determine credibility using lateral searching, while non-experts fall victim to convincing logos [51]. In response, researchers have considered how interfaces could provide better context for gauging information quality, in areas such as Wikipedia [33], related articles in social media [4], and flags on disputed content [32]. By surfacing more nuanced signals about the credibility of an article, we hope to provide greater context to readers and platforms to make informed judgments.

There exists a significant amount of related work on computational models of information credibility [29, 45]. Many models focus on aspects of language that can be a signal of low credibility [37], such as hedging [29] or biased language [39]. Researchers have also studied the linguistic characteristics of deceptively written content [54] and their relation to credibility [5], as well as misleading headlines [8]. As social media is increasingly a space where misinformation is propagated, researchers have studied how rumors [3] as well as corrections [2] spread on social media. Building on this work, researchers have built models to predict the credibility of social media posts [6, 17, 22], as well as tools for investigating rumors or claims [23, 38, 40]. In addition, researchers have focused on the credibility of individual claims or assertions within text [34]. In the area of computational fact-checking, researchers evaluate the truthfulness of claims by comparing concepts within knowledge graphs [9, 52]. Though the focus of this study is article indicators, all of these signals contribute to assessments of information credibility more broadly, and this prior work suggests some credibility indicators that might be automatable in the future.

Finally, our research is related to prior work on human annotation of credibility, such as annotations of social media [6, 26] or of television and newspaper content [14]. In contrast to this earlier work, we chose to use trained annotators as opposed to a randomly sampled population or Amazon Mechanical Turk workers, and we collected annotations about specific indicators instead of just overall credibility. These decisions allowed us to capture a richer and more informed characterization of credibility.

3 TOWARDS STRUCTURED CREDIBILITY INDICATORS FOR ONLINE JOURNALISM

The need for a common vocabulary around credibility became apparent at the first MisinfoCon (https://misinfocon.com), a conference dedicated to misinformation that saw many projects to define and classify misinformation and credibility but no easy way to communicate findings and data across projects. From an initial meeting at the conference, workshops in San Francisco and New York were convened, with over 40 representatives from journalism and fact-checking groups, research labs, social and annotation platforms, web standards, and more. A broader alliance called the Credibility Coalition emerged among these participants, with weekly remote working sessions. From these sessions, participants drafted over 100 indicator suggestions, drawing on examples from existing credibility initiatives such as Climate Feedback (https://climatefeedback.org/process/) and the Trust Project. As outside input is crucial for the success of this project, representatives presented to communities such as the Mozilla Festival (MozFest, London, October 2017: https://mozillafestival.org) and the International Press Telecommunications Council meeting (IPTC, Barcelona, November 2017: https://iptc.org) to publicize the work and host workshops for gathering feedback.

Over time, the indicators coalesced into 12 major categories, including reader behavior, revenue models, publication metadata, and inbound and outbound references. From this collection, 16 indicators were chosen for annotation. We chose article-level indicators that require human annotation from trained annotators but no domain expertise. Thus, we did not consider automated indicators for this study, or ones that require significant expertise, such as domain knowledge of the subject matter, offline investigation, or data gathering requiring technical knowledge or access to proprietary data. As our current focus is articles, we chose to ignore indicators related to publishers, authors, or any multimedia content.

3.1 Content Indicators

Content indicators are those that can be determined by analyzing the title and text of the article without consulting outside sources or metadata. We present the following 8 content indicators, their definitions, and what we asked of annotators.

Title Representativeness: Article titles can be misleading or opaque about the topic, claims, or conclusions of the content. Annotators were asked to rate the representativeness of the article title. If it was found unrepresentative, they were asked to clarify how the title was unrepresentative; for instance, by being off-topic, carrying little information, or overstating or understating claims.

"Clickbait" Title: "Clickbait" is defined as "a certain kind of web content...that is designed to entice its readers into clicking an accompanying link" [35]. Annotators were asked to rate the degree to which a headline was clickbait. If annotators rated a title as clickbait, they were asked to clarify the form of clickbait in a follow-up question, such as a "listicle" or a cliffhanger.

Quotes from Outside Experts: Articles often seek outside feedback from independent experts in the field. This additional validation provides support for the conclusions drawn and reveals a level of journalistic rigor [48]. For this indicator, we asked annotators to highlight where experts were quoted in the article.

Citation of Organizations and Studies: Journalists can also cite or quote from a range of organizations or scientific studies to add context or support to the article and enhance its credibility. We asked annotators to highlight where any scientific studies or any organizations were cited, as well as indicate whether the article was primarily about a single study.

Calibration of Confidence: The use of tentative propositions in writing, often quantified, allows readers to assess claims with appropriate confidence. We asked annotators to mark whether authors used appropriate language to show confidence in their claims, and to highlight sections of an article where authors acknowledge their level of uncertainty (e.g., hedging, tentative, or assertive language).

Logical Fallacies: Logical fallacies often mislead readers, as both writer and reader fall prey to poor but tempting arguments. Indeed, studies have shown that people find them more convincing than is rational [12]. We asked our annotators to look for the straw man fallacy (presenting a counterargument as a more obviously wrong version of existing counterarguments), the false dilemma fallacy (treating an issue as binary when it is not), the slippery slope fallacy (assuming one small change will lead to a major change), the appeal to fear fallacy (exaggerating the dangers of a situation), and the naturalistic fallacy (assuming that what is natural must be good).

Tone: Readers can be misled by the emotional tone of articles. Such language is common in opinion pieces, which readers may parse as straight news. We asked our annotators to look for exaggerated claims or emotionally charged sections, especially for expressions of contempt, outrage, spite, or disgust.

Inference: Correlation and causation are often conflated, and the implications can be dramatic, for example in medical trials. There is also the more subtle conflation between singular causation ("the drunk driver caused that accident") and general causation ("drinking and driving causes accidents"). For this indicator, we asked annotators to determine what type of causality—correlation, singular causation, or general causation—was at play, and whether there is convincing evidence for the claims expressed.

3.2 Context Indicators

Context indicators require annotators to look outside of the article text and research external sources or examine the metadata surrounding the article text, such as advertising and layout. In total, there were 8 context indicators collected by annotators.

Originality: Republishing text is a common practice in online news. Reasons range from licensing agreements with a wire service such as Reuters to articles simply being stolen or reworded without attribution. We asked annotators to find whether the article was an original piece of writing or duplicated elsewhere, and if so, to check whether attribution was given.

Fact-checked: We asked annotators to determine whether the central claim of the article, if one exists, was fact-checked by an approved organization, as well as the outcome of the check. While many organizations conduct fact-checking [15], we limited our consideration to organizations vetted by a verified signatory of Poynter's International Fact-Checking Network (IFCN, https://www.poynter.org/international-fact-checking-network-fact-checkers-code-principles). Because many IFCN members utilize schema.org's ClaimReview schema, there is potential to automate this process in the future [9]; a sketch of reading such markup appears after this list of indicators.

Representative Citations: Journalists are expected to accurately represent any sources that they cite or quote, such as articles, interviews, or other external materials. As an article may have many sources, we asked annotators to check the representation of only the first three sources mentioned in the article. Annotators were asked to find the original content and rate how accurately the description in the article represented the original content.

Reputation of Citations: Without domain experts, it is difficult to systematically evaluate the validity or credibility of a cited source. However, for scientific studies, there are at least some existing public measures such as impact factor, despite their documented issues [44]. Thus, we asked annotators to find the impact factor of the publication of any scientific study cited.

Number of Ads: Most publications depend on ad content and recommendation engines as a core part of their business model. Per a recent Facebook strategy, a very high number of ads relative to content may be an indicator of a financially motivated misinformation site [1]. We asked annotators to count the number of display ads and content recommendation engines, such as Taboola or Outbrain, as well as recommended sponsored content.

Number of Social Calls: Most publications depend on social networks and viral content to drive traffic to their site. That said, a high number of exhortations to share content on social media, email the article, or join a mailing list can be an indicator of financially motivated misinformation. We asked annotators to count the number of calls to share on social media, email, or join a mailing list.

"Spammy" Ads: As well as quantity, the ads on the page may be of a "spammy" nature, such as containing disturbing or titillating imagery, celebrities, or clickbait titles. Thus, we asked annotators to rate the "spammyness" of the ads.

Placement of Ads and Social Calls: Finally, the placement of ads and social calls may be an indicator, for instance by appearing in pop-up windows, covering up article content, or distracting through additional animation and audio. We asked annotators to rate the aggressiveness of the placement of ads and social calls.
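Because ClaimReview is published as structured data, part of the fact-check lookup mentioned above could eventually be scripted. The following is a minimal, hedged sketch of how ClaimReview markup might be read from a fact-check page's JSON-LD; the regular-expression parsing and the specific fields pulled out are simplifying assumptions, and real pages can embed several JSON-LD blocks, nested graphs, or none at all.

```python
import json
import re
import urllib.request

def extract_claim_reviews(url: str):
    """Return any schema.org ClaimReview items found in a page's JSON-LD blocks."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    reviews = []
    # Pull every <script type="application/ld+json"> block; keep items typed ClaimReview.
    for block in re.findall(r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
                            html, re.DOTALL):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "ClaimReview":
                reviews.append({
                    "claim": item.get("claimReviewed"),
                    "rating": (item.get("reviewRating") or {}).get("alternateName"),
                    "fact_checker": (item.get("author") or {}).get("name"),
                })
    return reviews

# Example with a hypothetical URL:
# extract_claim_reviews("https://example.org/fact-check/some-claim")
```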
4 DATA COLLECTION

This section describes our process for gathering articles, finding annotators, and selecting platforms for credibility annotation.

4.1 Articles

We focused on the topics of climate science and public health, where misinformation is prevalent despite a high degree of stable knowledge and expert consensus. Articles were selected using BuzzSumo (http://buzzsumo.com), a service that surfaces the most shared articles on social media for any search term. Terms we searched included "climate change" and "global warming" for climate science, and "health", "vaccines", and "disease" for public health. Articles returned from the year 2017 were collected into one list and sorted by most overall shares, so as to prioritize high-impact articles with broad appeal. We removed 2 articles that were too long for annotation (3,500+ words), 2 that were primarily images, and one suspended article. Finally, the 40 most shared articles from the list were selected. In total, there were 22 articles related to public health, 10 related to climate science, 7 related to diseases, and 4 related to vaccines. The most shared article was about vaccines, by a publisher called "Earth. We Are One", and was shared 1.9 million times in 2017, according to BuzzSumo. To ensure that article content would not change or disappear during the study, articles were archived using Archive.is (http://archive.is).
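As a rough illustration of this selection step, the sorting and filtering could be scripted over an export of the search results. The sketch below assumes a hypothetical CSV of BuzzSumo results with url, total_shares, and word_count columns; the manual checks for image-only pages and the suspended article are not shown.

```python
import pandas as pd

# Hypothetical export of 2017 BuzzSumo results for all search terms, one row per article.
results = pd.read_csv("buzzsumo_results_2017.csv").drop_duplicates(subset="url")

selected = (
    results[results["word_count"] < 3500]          # drop articles too long to annotate
    .sort_values("total_shares", ascending=False)  # prioritize high-impact articles
    .head(40)                                      # keep the 40 most shared articles
)
selected.to_csv("selected_articles.csv", index=False)
```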
4.2 Annotators

Six annotators were recruited for this task, with 3 focused on content indicators and 3 marking context indicators, as content annotation requires different prior knowledge and training than context annotation. The 3 content annotators were recruited from the teaching staff of a UC Berkeley course on scientific-style reasoning called Sense and Sensibility and Science (SSS, http://sensesensibilityscience.com). The content annotators were selected because of their exemplary performance in the course and their skills in scientific critical thinking. The 3 context annotators were recruited by Meedan (https://meedan.com) from a number of journalism schools; they were either journalism students or recent graduates. Annotators were paid $150 for the entire task, or $3.75 per article. At a $15 hourly wage, this amounts to around 15 minutes spent per article, which we sought to target when devising annotations.

The average age of the annotators was 22.1, and 5 annotators were female while 1 was male. We sought to diversify our population in terms of political orientation to mitigate issues with bias. Asked about their political affiliation, 3 stated Democrat, 1 Republican, 1 Independent, and one stated none. On economic issues, 2 named themselves as very liberal, 1 moderately liberal, 1 moderate, 1 moderately conservative, and 1 very conservative. On social issues, 4 considered themselves very liberal while 2 considered themselves moderate. When asked what publications they read regularly, 4 annotators mentioned The New York Times, while 2 mentioned CNN. The remaining 22 publications were mentioned only once. Our future work will aim towards greater diversity among annotators as we grow our pool.

Finally, we collected a credibility score for each article from domain experts to serve as a "gold standard". We determined the general topic of each article, such as climate science, psychology, or public health. We then reached out on different platforms to find domain experts in those areas, such as scientists or industry practitioners, to score the article on a 5-point scale and leave notes. For articles dedicated to breaking news events, we had a journalist score them for credibility. Overall, there were 5 domain expert annotators who volunteered their time.

4.3 Annotation Platforms

The three annotators tasked with content indicators used a collaborative annotation software called TextThresher (for details, see http://www.goodlylabs.org/research/), which is also in use by the citizen-science misinformation and media literacy platform PublicEditor designed at the UC Berkeley Institute for Data Science [36]. Figure 1 shows the tool guiding contributors to answer a series of questions about article text, highlighting the portions of text that justify their answers.

Figure 1: Screenshot of the TextThresher platform used for content indicator annotation.

Because TextThresher supports the annotation of plain text files, the tool was useful for our approach to content indicator evaluation, which seeks to reduce annotator bias by removing text from its original context. Previous workshops we conducted revealed that participants' assessments were strongly influenced by the name of the publication and its layout. With TextThresher, we show annotators only the article's title, any image captions, and the main text of each article. TextThresher also guides users to label specific words and phrases within articles, which enables the precise identification of specific phenomena. Users' labels can then be displayed to news readers to improve their media literacy, and used in high-traction supervised machine learning scenarios.

The three annotators tasked with context indicators used a tool called Check (https://meedan.com/en/check/) built by Meedan, a nonprofit software company that builds digital tools for journalists. Using Check, we showed a preview of the article with a link to see the article in context. Because these indicators involved looking at the information around the article as well as conducting research on external information, it was no longer possible to obfuscate the publisher or other information. Each annotator could see all their own annotation tasks related to an article on the page and mark each as complete when done. They could also keep track of their progress and go back to articles to edit or resume annotation.

Figure 2: Screenshot of the Check platform used for context indicator annotation.

5 DATASET ANALYSIS

We next turn to analysis of the annotation data. Here, we focus on two measures: (1) how much annotators agreed with one another when identifying indicators, and (2) how much the annotators' assessments of overall article credibility agreed with domain experts' assessments. We calculate inter-rater reliability (IRR) using Krippendorff's alpha, as it can be used for more than 2 scorers and can be adapted to many different data types, including nominal, ordinal, and interval scores, all of which are present in the data we collected.

When we aggregate annotations to then correlate with domain expert scores, in the case of ordinal and interval data, we compute an average across annotators, while in the case of nominal data, we use the category most chosen, if it exists. To determine correlation, in the case of ordinal and interval indicator data, we use the Spearman rank correlation, as it allows for ordinal data and relationships need not be linear. For nominal input data, as there is no concept of correlation, we convert the categories into binary variables and perform a multiple linear regression. We report the coefficient of determination (R^2), which reports what percentage of the variance in domain expert scores is explained by the model.
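As an illustration of this pipeline for a single ordinal indicator, the sketch below computes Krippendorff's alpha and the Spearman correlation with expert scores, assuming a small raters-by-articles matrix; it relies on the krippendorff and SciPy packages, and the numbers are placeholders rather than study data.

```python
import numpy as np
import krippendorff                     # pip install krippendorff
from scipy.stats import spearmanr

# Hypothetical ordinal ratings (1-5) for one indicator: 3 annotators x 5 articles.
ratings = np.array([
    [3, 4, 2, 5, 1],
    [3, 5, 2, 4, 2],
    [2, 4, 3, 5, 1],
], dtype=float)

# Inter-rater reliability: Krippendorff's alpha with an ordinal level of measurement.
alpha = krippendorff.alpha(reliability_data=ratings, level_of_measurement="ordinal")

# Aggregate by averaging across annotators, then correlate with (also hypothetical)
# domain expert credibility scores using the Spearman rank correlation.
expert_scores = np.array([2, 4, 3, 5, 1], dtype=float)
rho, p_value = spearmanr(ratings.mean(axis=0), expert_scores)

print(f"alpha={alpha:.3f}, rho={rho:.3f}, p={p_value:.3f}")
```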
5.1 Content Indicators

In Table 1, we show both the IRR between the three annotators and the relationship to domain expert scores for each of the questions for content indicators. Some indicators had moderate to strong IRR, such as the ones involving highlighting the number of citations, quotes, or scientific studies. Other indicators with moderate reliability included clickbait, which also had high correlation to expert ratings. The correlation with expert ratings combined with high IRR suggests that these types of indicators may be a useful signal of credibility to target for further annotation.

Content Indicator | Data Type | IRR | Relation to Experts
Title Representativeness | ordinal | 0.367 | ρ=0.234
"Clickbait" Title | ordinal | 0.581 | ρ=-0.709***
Quotes from Outside Experts | interval | 0.673 | ρ=0.327
Citation of Organizations | interval | 0.283 | ρ=0.145
Citation of Studies | interval | 0.763 | ρ=0.107
Single Study Article | nominal | 0.877 | R^2=0.031
Confidence - Extent Claims Justified | ordinal | -0.093 | ρ=0.690***
Confidence - Acknowledge Uncertainty | ordinal | 0.534 | ρ=-0.247
Logical Fallacies - Straw Man | ordinal | -0.096 | ρ=-0.402*
Logical Fallacies - False Dilemma | ordinal | 0.102 | ρ=-0.303
Logical Fallacies - Slippery Slope | ordinal | 0.478 | ρ=0.374*
Logical Fallacies - Appeal to Fear | ordinal | 0.314 | ρ=-0.424*
Logical Fallacies - Naturalistic | ordinal | 0.377 | ρ=-0.533**
Tone - Emotionally Charged | ordinal | 0.098 | ρ=0.611***
Tone - Exaggerated Claims | ordinal | 0.235 | ρ=0.606***
Inference - Type of Claims | nominal | 0.154 | R^2=0.029
Inference - Convincing Evidence | ordinal | 0.540 | ρ=0.764***

Table 1: Inter-rater reliability for content indicators and their relationship to expert scores of credibility. (*p<0.05, **p<0.01, ***p<0.001)

Annotators had weaker agreement on questions related to logical fallacies, generally due to scarcity and inadequate training. While at first glance "false dilemma" showed up in 16.1% of articles and "straw man" applied to 37.7%, further analysis reveals that all these annotations were due to a single annotator. As the annotators were not given explicit definitions of the fallacies, differences in interpretation could lead to low IRR. Future annotation of logical fallacy indicators could include more training and examples.

Indicators referencing claims (Confidence–extent claims justified; Scientific Inference–types of claims) also had low IRR. Some prior evidence suggests this was expected: of the 200+ students taking the Sense & Sensibility & Science course at UC Berkeley, less than half answered questions about the type of scientific inference in a claim correctly. However, annotators for this study had high IRR and high correlation for follow-up questions (Scientific Inference–convincing evidence, etc.). This suggests that non-expert annotators may find it difficult to classify claims, but once a claim is classified, annotators can be used for further evaluation.

We notice that some indicators had a moderate to strong correlation with domain experts, such as the perception of convincing evidence, which would be expected. Other indicators had a more unexpected relationship to expert credibility. For instance, the presence of slippery slope logical fallacies actually had a weak positive correlation with expert perception of credibility. Interestingly, some of the indicators, such as "Tone–emotionally charged", that had low agreement between annotators still had strong correlation with experts, demonstrating that individual scores may have been calibrated to different levels but still moved similarly. However, normalizing scores for each annotator did not significantly alter IRR. We also found that several content indicators were auto-correlated. Some were highly correlated within an indicator, such as the two questions related to Tone (ρ=0.736, p<0.001), suggesting that the number of questions could be reduced or that annotators could have been biased in one direction across questions. Future analysis, perhaps comparing with a gold standard set of annotations, is necessary. From discussions with annotators, some did in fact say that some questions felt redundant. Several annotators also remarked that as they annotated more and more articles, their initial read of the article (before answering any of the questions) was already punctuated with a mental checklist.

5.2 Context Indicators

In Table 2, we show the IRR and correlation with domain experts for context indicators. Most indicators showed moderate to strong agreement between annotators. One major exception is the indicator asking annotators to count content recommendation boxes, which had no agreement between annotators. We found this was because some annotators counted every content recommendation article shown, while others counted an entire box of articles as a single entity. While there was weak agreement between annotators on the question of proper characterization of sources, this may be partially because we noticed annotators did not always choose to annotate the same three sources, due to disagreement on what constitutes a source. However, the question about source characterization had the strongest correlation with expert credibility.

Context Indicator | Data Type | IRR | Relation to Experts
Originality | nominal | 0.346 | R^2=0.068
Fact-checked | nominal | 0.303 | R^2=0.309*
Representative Citations | ordinal | 0.312 | ρ=0.612***
Reputation of Citations | interval | 0.852 | ρ=-0.026
Number of Ads | interval | 0.535 | ρ=-0.135
Number of Content Recommendation | interval | -0.088 | ρ=0.144
Number of Sponsored Content | interval | 0.422 | ρ=-0.196
Number of Social Calls | interval | 0.564 | ρ=0.179
Number of Mailing List or Email Calls | interval | 0.375 | ρ=0.453**
"Spammy" Ads | ordinal | 0.554 | ρ=-0.309
Placement of Ads and Social Calls | ordinal | 0.326 | ρ=-0.417*

Table 2: Inter-rater reliability for context indicators and their relationship to expert scores of credibility. (*p<0.05, **p<0.01, ***p<0.001)

When examining the advertising and social sharing indicators, it was interesting to note that counting the quantity of the different types of supplementary content was not significant except in the case of mailing lists and email. This suggests that both low and high credibility publications may be using similar monetization techniques. Likewise, if they are using similar advertising networks, they may both be serving up similarly "spammy" ads, as echoed by advertising industry experts [28]. One difference, however, is in the aggressiveness of ad placement. Future work could consider signals determined by standards set by the Coalition for Better Ads (https://www.betterads.org/standards/).

Finally, we notice a lack of correlation between the impact factor of scientific citations and credibility. In total, only 25% of articles had any annotations of impact factor. As there is no single structured database for journal impact factors, annotators went through a manual process of searching. In the future, a structured database of impact factors and other publication quality signals, such as number of citations, could make machine assessments easier.

5.3 Comparing Credibility Scores

We asked annotators to mark their overall impression of the credibility of the article on a 5-point scale both before and after each article annotation. As seen in Table 3, for both sets of annotators, the IRR dropped from before conducting annotation to after. While there were also slight differences between before and after mean scores, these were not significant. We notice that the correlation to domain expert scores increases from before annotation to after. This suggests that annotators became more aligned with domain experts after investigating our indicators.

Credibility Rating | IRR | Avg (SD) | Relation to Experts
Content Pre-Annotation | 0.695 | 2.61 (0.98) | ρ=0.630***
Content Post-Annotation | 0.665 | 2.60 (0.97) | ρ=0.748***
Context Pre-Annotation | 0.715 | 2.81 (1.18) | ρ=0.783***
Context Post-Annotation | 0.616 | 2.70 (1.16) | ρ=0.793***
Domain Experts | - | 2.29 (1.38) | -

Table 3: Inter-rater reliability for assessments of credibility by annotators before doing annotation as well as after, as well as their relationship to expert scores of credibility. (*p<0.05, **p<0.01, ***p<0.001)

Finally, we also notice differences between the different sets of annotators and the domain experts. Overall, context annotators had a stronger correlation to experts than content annotators. These differences may partially be due to the information to which the annotators had access. In the case of content, annotators did not have information about the source or the presentation of the article webpage, which can be strong indicators of credibility. As both context annotators and domain experts had access to this context, their scores may be more aligned. However, context annotators rated articles significantly higher than experts both before and after annotation (paired t-test, p<0.05), while this difference was not significant between content annotators and experts.
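A minimal sketch of the comparison above, assuming per-article mean credibility ratings for one annotator group alongside the corresponding domain expert scores (the arrays are illustrative, not the study data):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-article credibility scores on the 5-point scale.
annotator_means = np.array([3.0, 2.7, 4.3, 1.7, 3.3, 2.0])
expert_scores = np.array([2.0, 3.0, 4.0, 1.0, 3.0, 2.0])

# Paired t-test on per-article differences: do annotators rate articles
# systematically higher or lower than the domain experts?
t_stat, p_value = ttest_rel(annotator_means, expert_scores)
print(f"t={t_stat:.3f}, p={p_value:.3f}")
```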
5.4 Predicting Credibility from Indicators

To understand the predictive value provided by both sets of indicators, two backward stepwise multiple regression models were run to regress domain expert scores onto each set of indicators. As some indicators had high auto-correlation, we removed variables within the same indicator that were highly correlated with each other, leaving a single variable to represent that indicator. For the content model, R^2 for the overall model was 0.482 and adjusted R^2 was 0.448. After model convergence, two variables remained: Clickbait Title and Logical Fallacies–slippery slope. This model was found to significantly predict credibility (F = 13.972, p < 0.001). For the context model, R^2 for the overall model was 0.750 and adjusted R^2 was 0.692. After model convergence, 6 variables remained: Fact-checked–reported false, Fact-checked–reported mixed results, Number of Social Calls, Number of Mailing List Calls, and Placement of Ads and Social Calls. Together, they were also found to significantly predict credibility (F = 12.986, p < 0.001). The context regression model overall had a better fit than the content model, which, along with issues of collinearity among the content indicators, leads us to believe more work is needed to differentiate the phenomena within specific indicators in the content category.
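A minimal sketch of backward elimination along these lines, assuming aggregated indicator values in a pandas DataFrame X and expert scores in a Series y; the 0.05 retention threshold and the use of statsmodels OLS are assumptions rather than details reported here.

```python
import pandas as pd
import statsmodels.api as sm

def backward_stepwise(X: pd.DataFrame, y: pd.Series, p_threshold: float = 0.05):
    """Drop the least significant predictor until every remaining one clears the threshold."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")
        worst = pvalues.idxmax()
        if pvalues[worst] <= p_threshold:
            return model, features   # model.rsquared / model.rsquared_adj give the R^2 values
        features.remove(worst)
    return None, []

# Usage with hypothetical data: model, kept = backward_stepwise(X, y)
```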
6 DISCUSSION AND FUTURE WORK

In this work, we outline a process for defining indicators of credibility and validating them through the collection of annotation data. The ability to quickly test new indicator definitions for both reliable annotation and correlation with expert-defined credibility will be important as we continue to scale to more indicators, more articles and other content, and more diverse annotators.

From going through this process with a focused set of indicators and articles, we found several indicators that show reliability and correlation with domain expert scores of credibility, such as the presence of a clickbait title or the accurate representation of sources cited in the article. We also obtained findings that suggest certain indicators are less useful, such as the presence or spammy nature of advertising and social calls. Finally, we received feedback on the importance of indicators towards modeling expert credibility, which helps determine indicators that may be redundant or more or less predictive. This initial foray additionally allowed us to examine the distinction between content versus context indicators and the training and annotation interfaces required for accurate assessment of each. We found areas that may be out of reach for non-expert annotators, such as inferring types of claims, or that require more training or technical tools for lateral searching, such as assessing the reputation of citations. Looking forward, one can imagine different annotation strategies for different indicators based on these findings, with some fully or partially automatically captured, some annotated by experts or publishers, and some surfaced by the crowd or one's immediate trust network.

In terms of immediate future steps, we aim to scale up our work to 5,000 to 10,000 annotated articles across a range of topics, styles, and publications, and to work with researchers and web platform representatives to put this data to use towards building models of credibility that are both interpretable and robust to manipulation. In order to ensure the sustainability and inclusivity of our work as we continue to define credibility standards, we have formed the W3C Credible Web Community Group (https://www.w3.org/community/credibility/), first introduced as a session at W3C's Technical Plenary/Advisory Committee meeting (TPAC: https://www.w3.org/2017/11/TPAC/) in 2017.

Our efforts are also aimed at improving media literacy and shrinking gaps of understanding between domain expertise and public knowledge. Through our work and outreach, we aim to convey how credibility is a negotiation among communicants [25], where publishers and authors seek to convey credibility while readers and platforms seek to ascertain it. Greater, richer communication, understanding of dependencies between communicants, and tools to improve the transfer of information are necessary towards reducing the spread of misinformation. Along these lines, our work raises more long-term research questions that we aim to explore.

Indicator Resilience. Analogous to anti-spam efforts, the usefulness of automated credibility assessments may vary dramatically depending on the motivation and resources of the misinformation propagators. A nation-state actor with a geopolitical strategy may be harder to dissuade than financially motivated "fake news" creators. On the other hand, some indicators, such as the raw number of ads or fact-checks from IFCN-verified signatories, may be more difficult to manipulate. Ultimately, we believe that the ability to compare the resilience of indicators is important in the context of increasingly machine-driven information landscapes.

Journalistic Practice. Also key is that the indicators are not just a tool for detecting misinformation but also for improving the quality of information itself. For instance, recent efforts by Facebook to limit content with a high degree of clickbait suggest simple ways to improve quality [46]. Annotators looking at content indicators found logical fallacies and incorrect use of causal claims even among some highly reputable news sources. We believe there is potential for the indicators to help improve standards for mainstream journalism, whether through custom tools or as a training methodology.

Media Literacy. Recent work from the Pew Research Center shows that more than half of adults think that "training in the digital realm would help them when it comes to accessing information that can aid in making decisions" [18]. In our study, we found that annotators changed how they approached new articles as the process went on, and we also saw changes in their credibility scores after annotation that aligned better with experts. Indeed, many of our context indicators are designed to map to existing processes for fact checking, such as reading laterally [51] and going "upstream" to find the source of a claim [7]. Likewise, the ability to employ critical thinking or pick up on misleading language allows readers to reject misinformation that takes advantage of psychological biases. Going forward, we aim to develop a set of training materials so that anyone can get involved in annotation. We will also display all collected annotations on our website using Hypothes.is (https://web.hypothes.is/), a web annotation platform, for the public to be able to inspect the annotations in our dataset in the context of the articles.

Freedom of Expression. How can attempts to detect and curb misinformation online meaningfully differ from efforts to censor the internet? The weaponization of "fake news" by autocratic countries already demonstrates the risks here: in a context where political leaders aim to centralize their control over the internet, determining blanket falsehood becomes a strategy of state control. In this regard, transparency presents a double-edged sword. As described earlier, it creates incentives for innovations in manipulation by agents of disinformation. At the same time, transparency helps reveal how annotators arrived at their conclusions about these indicators. We seek to study how greater transparency in indicators, by enabling the ability to share clear processes and findings, can help strike a balance between improving the health of our information ecosystem and preserving basic principles of free speech.

7 LIMITATIONS

There are limits to the potential effects of these indicators, and understanding their applicability is important. Even if we are successful in curbing some of the psychological foundations of misinformation, such as frequency of exposure, more work is needed to fully address the many social and identity-related motivations for believing misinformation. These indicators were developed in the US and UK contexts and may not be applicable to other languages and parts of the world. As mentioned earlier, we focus on articles for this work but aim to expand to images, video, and other digital multimedia in the future. Additionally, digital initiatives will need to also consider the wider information ecosystem that includes television and talk radio.

8 CONCLUSION

In this work, we presented a set of 16 indicators of article credibility, focused on article content as well as external sources and article metadata, refined over several months by a diverse coalition of media experts. We also presented a process for gathering annotations of these credibility indicators, including platform design and annotator recruitment, as well as an initial dataset of 40 articles annotated by 6 trained annotators and scored by domain experts. From analyzing our data, we isolated indicators that are reliably annotated across articles and that correlate with domain experts. Finally, we described the broader initiative of creating a set of standards around content credibility, of which this project is a part, as well as future directions for research.

9 ACKNOWLEDGEMENTS

This paper would not be possible without the valuable support and feedback of members of the Credibility Coalition, who have joined in-person meetings, weekly calls, and daily Slack chats to generously contribute their time, effort, and thinking to this project. There are too many to thank in the space we have, and we have included acknowledgments at www.credibilitycoalition.org.

16https://www.w3.org/community/credibility/ 17W3C TPAC: https://www.w3.org/2017/11/TPAC/ 18Hypothes.is: https://web.hypothes.is/ REFERENCES media analytics. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., [1] Jason Abbruzzese. 2017. Facebook is going to do something about those terrible 71–79. ads on your website. (May 2017). http://mashable.com/2017/05/10/facebook- [23] Panagiotis Takas Metaxas, Samantha Finn, and Eni Mustafaraj. 2015. Using crackdown-bad-ads-news-feed/ twittertrails.com to investigate propagation. In Proceedings of the 18th [2] Ahmer Arif, John J Robinson, Stephanie A Stanek, Elodie S Fichet, Paul Townsend, ACM Conference Companion on Computer Supported Cooperative Work & Social Zena Worku, and Kate Starbird. 2017. A Closer Look at the Self-Correcting Computing. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., Crowd: Examining Corrections in Online Rumors. In Proceedings of the 2017 ACM 69–72. Conference on Computer Supported Cooperative Work and Social Computing. ACM, [24] Miriam J. Metzger, Andrew J. Flanagin, Keren Eyal, Daisy R. Lemus, and ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., 155–168. Robert M. Mccann. 2003. Credibility for the 21st Century: Integrating Per- [3] Ahmer Arif, Kelley Shanahan, Fang-Ju Chou, Yoanna Dosouto, Kate Starbird, spectives on Source, Message, and Media Credibility in the Contemporary and Emma S Spiro. 2016. How information snowballs: Exploring the role of Media Environment. Annals of the International Communication Association exposure in online rumor propagation. In Proceedings of the 19th ACM Conference 27, 1 (2003), 293–335. https://doi.org/10.1080/23808985.2003.11679029 on Computer-Supported Cooperative Work & Social Computing. ACM, ACM, 2 arXiv:https://doi.org/10.1080/23808985.2003.11679029 Penn Plaza, Suite 701, New York, NY 10121-0701., 466–477. [25] Hans K Meyer, Doreen Marchionni, and Esther Thorson. 2010. The journalist [4] Leticia Bode and Emily K Vraga. 2015. In related news, that was wrong: The behind the news: credibility of straight, collaborative, opinionated, and blogged correction of misinformation through related stories functionality in social media. âĂIJnewsâĂİ. American Behavioral Scientist 54, 2 (2010), 100–119. Journal of Communication 65, 4 (2015), 619–638. [26] Tanushree Mitra and Eric Gilbert. 2015. CREDBANK: A Large-Scale Social Media [5] David B. Buller and Judee K. Burgoon. 1996. Interpersonal Theory. Corpus With Associated Credibility Annotations. In Ninth International AAAI Communication Theory 6, 3 (1996), 203–242. https://doi.org/10.1111/j.1468- Conference on Web and Social Media. ACM, ACM, 2 Penn Plaza, Suite 701, New 2885.1996.tb00127.x York, NY 10121-0701., Article 10582, 10 pages. [6] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credi- [27] Meredith Ringel Morris, Scott Counts, Asta Roseway, Aaron Hoff, and Julia bility on twitter. In Proceedings of the 20th international conference on World wide Schwarz. 2012. Tweeting is believing?: understanding microblog credibility web. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., 675–684. perceptions. In Proceedings of the ACM 2012 conference on Computer Supported [7] Michael A. Caulfield. 2017. Go Upstream to the Find the Source. (Jan 2017). Cooperative Work. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., https://webliteracy.pressbooks.com/chapter/go-upstream-to-find-the-source/ 441–450. [8] Yimin Chen, Niall J Conroy, and Victoria L Rubin. 2015. 
Misleading online [28] Lucia Moses. 2016. ’The underbelly of the internet’: How content ad networks content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on fund fake news. (Nov. 2016). Retrieved January 5, 2018 from https://digiday.com/ Workshop on Multimodal Deception Detection. ACM, ACM, 2 Penn Plaza, Suite media/underbelly-internet-fake-news-gets-funded/ 701, New York, NY 10121-0701., 15–19. [29] Ryosuke Nagura, Yohei Seki, Noriko Kando, and Masaki Aono. 2006. A Method [9] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, of Rating the Credibility of News Documents on the Web. In Proceedings of the Filippo Menczer, and Alessandro Flammini. 2015. Computational Fact Checking 29th Annual International ACM SIGIR Conference on Research and Development from Knowledge Networks. PLOS ONE 10, 6 (06 2015), 1–13. https://doi.org/ in Information Retrieval (SIGIR ’06). ACM, New York, NY, USA, Article 1148316, 10.1371/journal.pone.0128193 2 pages. https://doi.org/10.1145/1148170.1148316 [10] Andrew J Flanagin and Miriam J Metzger. 2000. Perceptions of Internet infor- [30] Daniel J O’keefe. 2002. : Theory and research. Vol. 2. Sage, Los Angeles, mation credibility. Journalism & Mass Communication Quarterly 77, 3 (2000), CA. 515–540. [31] Will Oremus. 2016. Only You Can Stop the Spread of Fake News. http: [11] William B. Frakes. 1986. Information and misinformation: An investigation //www.slate.com. (December 2016). of the notions of information, misinformation, informing, and misinforming. [32] Gordon Pennycook and David G Rand. 2017. Assessing the effect of “disputed” Journal of the American Society for Information Science 37, 1 (1986), 48–49. https: warnings and source salience on perceptions of fake news accuracy. Technical //doi.org/10.1002/(SICI)1097-4571(198601)37:1<48::AID-ASI10>3.0.CO;2-3 Report. SSRN. http://dx.doi.org/10.2139/ssrn.3035384 [12] William K. Frankena. 1939. The naturalistic fallacy. Mind 48, 192, Article 4 (1939), [33] Peter Pirolli, Evelin Wollny, and Bongwon Suh. 2009. So you know you’re getting 13 pages. https://doi.org/10.1093/mind/XLVIII.192.464 the best possible information: a tool that increases Wikipedia credibility. In [13] Daniel Funke. 2017. It’s been a year since Facebook partnered with fact- Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. checkers. How’s it going? (Dec. 2017). Retrieved January 5, 2018 from https: ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., 1505–1508. //www.poynter.org/news/its-been-year-facebook-partnered-fact-checkers- [34] Kashyap Popat, Subhabrata Mukherjee, Jannik Strötgen, and Gerhard Weikum. hows-it-going 2016. Credibility assessment of textual claims on the web. In Proceedings of the [14] Cecilie Gaziano and Kristin McGrath. 1986. Measuring the concept of credibility. 25th ACM International on Conference on Information and Knowledge Management. Journalism quarterly 63, 3 (1986), 451–462. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., 2173–2178. [15] lucas graves and Tom glaisyer. 2012. The Fact-Checking Universe in Spring 2012: [35] Martin Potthast, Sebastian Köpsel, Benno Stein, and Matthias Hagen. 2016. Click- An Overview. The New America Foundation, Washington, DC, USA. https: bait detection. In European Conference on Information Retrieval. Springer, Springer //www.issuelab.org/resource/the-fact-checking-universe-in-spring-2012-an- International Publishing, Gewerbestrasse 11, 6330 Cham, Switzerland, 810–817. 
overview.html [36] Aditya Ranganathan, Daniel Kim, Nick Adams, and Saul Perlmutter et al. [16] Jennifer D. Greer. 2003. Evaluating the Credibility of Online Information: A Test 2017. Crowdsourcing Credibility: A Citizen-Science Approach to NewsLiter- of Source and Advertising Influence. Mass Communication and Society 6, 1 (2003), acy via Public Editor. Technical Report. University of Berkeley. https: //northwestern.app.box.com/s/77ekftnfp0w8ixxkivkgodqubwhaumyv 11–28. https://doi.org/10.1207/S15327825MCS06013 [17] Manish Gupta, Peixiang Zhao, and Jiawei Han. 2012. Evaluating event credibility [37] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. on twitter. In Proceedings of the 2012 SIAM International Conference on Data 2017. Truth of varying shades: Analyzing language in fake news and political Mining. SIAM, SIAM, 3600 Market Street, 6th Floor | Philadelphia, PA 19104-2688 fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in USA, 153–164. Natural Language Processing. ACL, 209 N. Eighth Street, Stroudsburg PA 18360, [18] John B. Horrigan and John Gramlich. 2017. Many Americans, especially blacks USA, 2921–2927. and Hispanics, are hungry for help as they sort through information. (Nov [38] Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, 2017). http://www.pewresearch.org/fact-tank/2017/11/29/many-americans- Alessandro Flammini, and Filippo Menczer. 2011. Truthy: mapping the spread of especially-blacks-and-hispanics-are-hungry-for-help-as-they-sort-through- astroturf in microblog streams. In Proceedings of the 20th international conference information/ companion on World wide web. ACM, ACM, 2 Penn Plaza, Suite 701, New York, [19] IREX.org. 2017. Ukrainians’ self-defense against disinformation: What we learned NY 10121-0701., 249–252. from Learn to Discern. (June 2017). Retrieved January 5, 2018 from https:// [39] Marta Reacasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. www.irex.org/insight/ukrainians-self-defense-against-disinformation-what- Linguistic Models for Analyzing and Detecting Biased Language. In 51st Annual we-learned-learn-discern Meeting of the Association for Computational Linguistics. ACL, ACL, 209 N. Eighth [20] Alice Marwick and Rebecca Lewis. 2017. and Disinformation Street, Stroudsburg PA 18360, USA, 1650–1659. Online. Report. Data & Society Research Institute. https://edoc.coe.int/en/media- [40] Paul Resnick, Samuel Carton, Souneil Park, Yuncheng Shen, and Nicole Zeffer. freedom/7495-information-disorder-toward-an-interdisciplinary-framework- 2014. Rumorlens: A system for analyzing the impact of rumors and corrections for-research-and-policy-making.html in social media. In Proc. Computational Journalism Conference. ACM, ACM, 2 [21] Aaron M. McCright and Riley E. Dunlap. 2011. THE POLITICIZATION OF Penn Plaza, Suite 701, New York, NY 10121-0701., 5. CLIMATE CHANGE AND POLARIZATION IN THE AMERICAN PUBLIC’S [41] Soo Young Rieh and David R. Danielson. 2007. Credibility: A multidisciplinary VIEWS OF GLOBAL WARMING, 2001âĂŞ2010. Sociological Quarterly 52, 2 framework. Annual Review of Information Science and Technology 41, 1 (2007), (2011), 155–194. https://doi.org/10.1111/j.1533-8525.2011.01198.x 307–364. https://doi.org/10.1002/aris.2007.1440410114 [22] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. 2010. Twitter Under [42] Christine Schmidt. 2017. This project aims to “de-flatten” digital publish- Crisis: Can we trust what we RT?. 
In Proceedings of the first workshop on social ing by matching the best content with premium ads. (Nov 2017). http: //www.niemanlab.org/2017/11/this- project- aims- to- de- flatten- digital- //www.ncbi.nlm.nih.gov/pmc/articles/PMC5738254/ publishing-by-matching-the-best-content-with-premium-ads/ [50] Claire Wardle and Hossein Derakhshan. 2017. Information disorder: Toward an in- [43] Mike Schmierbach and Anne Oeldorf-Hirsch. 2012. A little bird told me, so terdisciplinary framework for research and policy making. Coucil of Europe Report I didn’t believe it: Twitter, credibility, and issue perceptions. Communication DGI(2017)09. Council of Europe. https://edoc.coe.int/en/media-freedom/7495- Quarterly 60, 3 (2012), 317–337. information-disorder-toward-an-interdisciplinary-framework-for-research- [44] Per O Seglen. 1997. Why the impact factor of journals should not be used for and-policy-making.html evaluating research. BMJ: British Medical Journal 314, 7079 (1997), 498. [51] Sam Wineburg and Sarah McGrew. 2017. Lateral Reading: Reading Less and [45] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake Learning More When Evaluating Digital Information. Technical Report Working News Detection on Social Media: A Data Mining Perspective. SIGKDD Explor. Paper No. 2017-A1. Stanford History Education Group. http://dx.doi.org/ Newsl. 19, 1, Article 3137600 (Sept. 2017), 15 pages. https://doi.org/10.1145/ 10.2139/ssrn.3048994 3137597.3137600 [52] You Wu, Pankaj K Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2017. Com- [46] Henry Silverman and Lin Huang. 2017. News Feed FYI: Fighting Engagement putational Fact Checking through Query Perturbations. ACM Transactions on Bait on Facebook. (Dec 2017). https://newsroom.fb.com/news/2017/12/news- Database Systems (TODS) 42, 1 (2017), 4. feed-fyi-fighting-engagement-bait-on-facebook/ [53] Kenneth C.C. Yang. 2007. Factors influencing Internet usersâĂŹ perceived credi- [47] Kate Starbird. 2017. Examining the Alternative Media Ecosystem Through the bility of news-related blogs in Taiwan. Telematics and Informatics 24, 2 (2007), 69 Production of Alternative Narratives of Mass Shooting Events on Twitter.. In – 85. https://doi.org/10.1016/j.tele.2006.04.001 ICWSM. ACM, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701., 230–239. [54] Wenlin Yao, Zeyu Dai, Ruihong Huang, and James Caverlee. 2017. Online De- [48] S Shyam Sundar. 1998. Effect of source attribution on perception of online news ception Detection Refueled by Real World Data Collection. In Proceedings of the stories. Journalism & Mass Communication Quarterly 75, 1 (1998), 55–68. International Conference Recent Advances in Natural Language Processing, RANLP [49] Lauren Vogel. 2017. Viral misinformation threatens public health. Canadian 2017. INCOMA Ltd., Varna, Bulgaria, 793–802. https://doi.org/10.26615/978- Medical Association Journal 189, 50, Article E1567 (Dec 2017), 1 pages. https: 954-452-049-6102