
Fake News: A Concept Explication and Taxonomy of Online News

Maria D. Molina & S. Shyam Sundar {[email protected], [email protected]}

Media Effects Research Laboratory, Donald P. Bellisario College of Communications, PENN STATE UNIVERSITY, USA

Paper accepted for presentation in the Newspaper and Online News Division at the annual conference of the Association for Education in Journalism and Mass Communication (AEJMC), Washington, DC, August 6-9, 2018.

Abstract: The growth of fake news online has created a need for computational models to automatically detect it. For such models to be successful, it is essential to clearly define fake news and differentiate it from other forms of news. We conducted a concept explication, yielding a taxonomy of online news that identifies specific features for use by machine learning algorithms to reliably classify fake news, real news, commentary, satire, and other related types of content.

Fake news has received serious attention in a variety of fields, with scholars investigating the antecedents, characteristics and consequences of its creation and dissemination. Some are primarily interested in the nature of misinformation contained in fake news, so that we can better detect fake news and distinguish it from real news. Others focus on the susceptibility of users—why we fall for fake news and how we can protect ourselves from this vulnerability. Both are geared toward improving media literacy to protect consumers from false information.

Websites like Snopes and Politifact explicitly address the issue by verifying information in the news cycle with the help of an army of human fact-checkers. However, human fact-checking can be time consuming and subject to human foibles such as subjectivity and being limited by prior experiences (Vorhies, 2017). An alternative that has been proposed is the use of machine algorithms to facilitate fake news detection (Conroy, Rubin, & Chen, 2015; Wang, 2017). Given the enormity of the fake news problem, machine-based solutions seem inevitable for tackling the scope and speed with which fake news is created and disseminated, especially around the time of elections, disasters, crises and other developing stories. However, in order to develop reliable algorithms for detecting fake news, we have to be very disciplined in defining fake news and differentiating it from legitimate news.

With this goal in mind, we launched a concept explication (Chaffee, 1991) to uncover the different theoretical and operational definitions of fake news and its associated terms (e.g., misinformation, disinformation, satire) as described by academic research, media articles, trade journals, and other relevant sources. Through a meaning analysis, a taxonomy of online content was developed with two primary objectives. First, through this exercise, we can pinpoint the key defining characteristics of fake news. Knowing the main ingredients of fake news will facilitate the development of an algorithm for detection of this type of content. Second, we can identify other types of content that are often confused with fake news but are not fake news. A conceptual understanding of these types of content will help us better distinguish them from fake news and rule them out for machine-detection purposes.

In this paper, we will first describe the different definitions of fake news encountered through our meaning analysis and the current inconsistencies in the basic assumptions of what is and is not fake news. We will then propose a theoretical definition of fake news and situate it in a new taxonomy of online content, highlighting the characteristics that help distinguish one type of content from another. Finally, from the developed taxonomy and characteristics, this paper will derive specific features or indicators for use in a machine learning algorithm.

Evolution of Fake News: Theoretical and Operational Definitions

Although the interest in fake news spiked after the 2016 Presidential election, it is not a new phenomenon. The concept, known as “disinformation” during the World Wars and as “freak journalism” or “yellow journalism” during the Spanish-American War, can be traced back to 1896 (Campbell, 2001; Crain, 2017). Yellow journalism was also known for publishing content that lacked evidence and was factually incorrect, often for business purposes (Samuel, 2016). In his critique of yellow journalism, Yarros (1922) characterizes it as “brazen and vicious ‘faking,’ and reckless disregard of decency, proportion and taste for the sake of increased profits” (p. 410).

As if history were repeating itself, the phenomenon regained attention during the 2016 U.S. Presidential elections. However, what makes fake news unique today is the information environment we currently live in, where social media is key to the dissemination of information and we no longer receive information solely from traditional gatekeepers. Nowadays, it is not necessary to be a journalist and work for a publication to create and disseminate content online. Laypersons write, curate and disseminate information via online media. Studies show that they may even be preferred over traditional professional sources (Sundar & Nass, 2001). This is particularly troublesome given that individuals find information that agrees with prior beliefs more credible and reliable, creating an environment that exacerbates misinformation because credible information appears alongside personal opinions (Bode & Vraga, 2015).

Defining Fake News

Although we now have a seemingly simple dictionary definition of fake news as “false stories that appear to be news, spread on the Internet or using other media, usually created to influence political views or as a joke” (Fake News, 2018), determining what is and what is not fake news is rather complex. There is considerable disagreement when it comes to determining which content should be considered “fake news” and which should be excluded.

The term “fake news” was first used to describe satirical shows and publications (e.g., The Daily Show, The Onion). For creators of such content, the concept meant made-up news produced in the pursuit of entertaining others, not of informing or deceiving. Some scholars claim that satire should be left out of the “new definition of fake news” because it is “unlikely to be misconstrued as factual” and it is not created with the purpose of informing audiences (Allcott & Gentzkow, 2017, p. 214). However, others claim that it should be included because, although it is legally protected speech, it could be misconstrued as telling the truth (Klein & Wueller, 2017). For example, in 2017 a satire site run by hoaxer Christopher Blair issued an apology for making its story “too real,” after many were unable to detect its satirical nature (Funke, 2017).

The second disagreement when conceptualizing fake news is intentionality. Some scholars believe that for content to be considered fake, the content creator must have deceitful intent. For example, Allcott and Gentzkow (2017) and Conroy et al. (2015) argue that fake news should be defined as news articles that could mislead readers and are intentionally and verifiably false. This includes intentionally fabricated pieces and satire sites, but excludes unintentional reporting of mistakes, rumors, conspiracy theories, and reports that are misleading but not necessarily false (Allcott & Gentzkow, 2017; Klein & Wueller, 2017). Such a conceptualization leaves out mainstream media misreporting from scrutiny. As Mihailidis and Viotty (2017) explain, journalists face an information environment where economic, technological, and sociopolitical pressures are combined with a need to report with speed, while engaging audiences in the process. This tension creates an environment where online journalists become part of the problem of misinformation. Despite misreporting being unintentional, it is still an instance of untrue information disseminated via traditional as well as online media channels.

Finally, a third disagreement regarding fake news has to do with its conceptualization as a binary variable versus one that varies on a continuum. For example, the conceptualization of fake news as exclusively satire provides a binary differentiation between genres: content is either hard/mainstream news (based on real facts with the purpose of informing) or fake news (made-up stories with the purpose of entertaining). However, literature exploring fake news after the 2016 election argues that fake news should be seen as a continuum, because some online content has biases but does not fall squarely into fake news. As Potthast et al. (2017) explain, “hardly any piece of fake news is entirely false, and hardly any piece of real news is flawless” (p. 3). Those who conceptualize fake news as a continuum include human fact-checking sites such as Snopes and Politifact, which classify articles based on their degree of fakeness rather than as absolutely true or false. For example, Politifact classifies fake news on a 6-point scale ranging from “true” to “pants on fire,” and Snopes classifies content into twelve categories including true, false, mostly true, mostly false, mixture, misattribution, legend, and scam. Other human fact checkers include FactCheck.org, whose goal is to verify statements in transcripts and videos, and the Latin American Chequeado, which follows a classification system similar to Politifact’s.

It must be noted that the vast majority of online fact checkers are based on corroboration with a database of verified facts. While they can be quite useful in debunking fake stories about established facts, and also for training machine-learning algorithms, they cannot help us determine the veracity of new, incoming information about developing stories, as is often the case with the recent crop of fake news surrounding elections, disasters and mass shootings. Therefore, we need a more comprehensive view of fake news, one that checks not only facts, but also linguistic characteristics of the story, its source(s) and the networks involved in its online dissemination. With this in mind, we propose an original taxonomy of online content, as a precursor to identifying signature features of fake news.

Taxonomy of Online Content for Fake News Detection

In our taxonomy, we identify nine categories of online content for the purpose of algorithm-based detection of fake news: real news, fake news, polarized and sensationalist content, satire, misreporting, commentary, native advertising, citizen journalism, and professional political content. These categories are organized based on a combination of unique features derived from the various conceptual and operational definitions proposed for fake news. Such features include characteristics related to the message and its linguistic properties, its sources and intentions, structural components, and network characteristics. In the next section, we will first differentiate between real news and fake news. Then, we identify online content that is not fake news, but that could be misinterpreted by audiences as fake news. These types of online content are important to identify for the sake of building a taxonomy that has discriminant validity in ruling out content that is not fake. Once identified, we can build algorithms to label the varied forms of news that exist between the binary categories of real and fake, so that the reader can factor that in their consumption of such information or discourse. It will also serve to reduce the reactance that is known to occur when readers are told, in a blanket manner, that a piece of news which aligns well with their political beliefs is indeed false. Providing a more nuanced labeling of partisan content, without outright declaring it fake, can enhance the credibility of the algorithm and promote greater acceptance of its classification of different kinds of real and fake news and the various shades in between.

What is Real and what is Fake News?

Real news. The first category of content in our taxonomy is, of course, real news. Although difficult to define, it can be understood through the journalistic practices that surround its creation. Real news is created through journalism, defined as “the activity of gathering, assessing, creating, and presenting news and information,” and abides by principles of verification, independence and obligation to report the truth (American Press Institute, 2017, para. 1). The operational definition of this type of content is the pursuit of knowledge through verification based on reliability, truthfulness, and independence (Borden & Tew, 2007, p. 304; Digital Resource Center, 2017). This operational definition can further be deconstructed into specific features that differentiate real news from other categories of information (See Table 1).

Features of the message include factuality and truthfulness achieved through fact-checking and quote verification (Borden & Tew, 2007). Interestingly, there are already initiatives freely available for anyone to verify quotes. For example, Storyzy (2017) is an online quote verifier able to decipher if a quote is authentic. As many as 50,000 new quotes are entered daily, making it a viable resource for quote verification. Importantly, quote verification is only one part of fact-checking. An overall assessment of the claims in an article is essential. To this effect, a knowledge-based paradigm of machine detection, through information retrieval, semantic web, and linked open data, can be employed (Potthast et al., 2017). For example, an analysis can be done to “extract and index factual knowledge from the web and use the same technology to extract factual statements from a given text in question” (Potthast et al., 2017, p. 2). Other message features of real news include stylistic indicators such as adherence to Associated Press (AP) style (Frank, 2015), as well as lexical and syntactical structures (Argamon-Engelson, Koppel, & Avneri, 1998) that are characteristic of real news content.

Real news is also characterized by the use of credible sources. Source characteristics can help identify the reliability, objectivity, and veracity of real news. For example, real news typically reports directly from official sources and typically tries to present both sides of an issue or event, in an effort to achieve objectivity. Finally, real news has structural characteristics, which include its independence, or lack of affiliation with an interest group, as well as accountability.

Table 1. Features of Real News

Message and Linguistic:
- Factuality: Fact-checked. Unbiased reporting. Uses last names to cite.
- Evidence: Statistical data, research-based.
- Message Quality: AP style. Edited and proofread.
- Lexical and Syntactic: Frequent use of “today.” Past tense.

Sources and Intentions:
- Sources of the content: Verified sources. Always has quotes.
- Pedigree: Originated from a well-known site/organization. Written by actual news staff.
- Independence: Organization associated with the journalist.

Structural:
- URL: Reputable ending. URL has normal registration.
- About Us section: Will have a clear About Us. Authors and editors can be verified.
- Contact Us section: Emails are from the professional organization.

Network:
- Metadata: Metadata indicators of authenticity.

Fake news/disinformation. Fake news, on the other hand, is defined as false information that is intentionally fabricated, often taking the form of malicious stories propagating conspiracy theories. Although this type of content shares characteristics with polarized and sensationalist content (described later in this paper), where information can be characterized as highly emotional and highly partisan (Potthast et al., 2017; Allcott & Gentzkow, 2017; Howard, Kollanyi, Bradshaw, & Neudert, 2017), it differs in important features (See Table 2).

First and foremost, fake news stories are not factual and have no basis in reality, and thus are unable to be verified (Allcott & Gentzkow, 2017; Cohen, 2017). They are often malicious stories that propagate conspiracy theories (Howard et al., 2017). Furthermore, fake news differs from polarized content in structural components. For example, fake news often originates from ephemeral sites created for ad revenue purposes. In fact, many of the fake sites of the 2016 elections were later identified as having fairly recent registration domains and foreign locations (Silverman, 2016; Soares, 2017). A well-established news organization will not have created its site a couple of months ago. As Dempsey (2017) explains, fake sites can make a lot of revenue through online ads by simply driving users to their sites, so “they may not care what the content even says or how misleading the information is, as long as it is attracting eyeballs to their paid-per-click pages” (p. 6). Because the main purpose of this type of content is ad revenue, message and source characteristics are often neglected. It is not uncommon for a fake news article to have unverified quotes, emotionally charged linguistic markers, spelling and grammar mistakes, and inaccurate pictures. A quick reverse-image search, for instance, can reveal if a displayed picture occurred prior to the event it claims to be reporting about (Davis, 2016).

Finally, network features are especially salient for fake sites. Because these websites intentionally publish deceptive and incorrect information for financial gain, they rely on social media to engage audiences. As Howard et al. (2017) elucidate, “both fake news websites and political bots are crucial tools in digital propaganda attacks—they aim to influence conversations, demobilize opposition and generate false support” (p. 1). According to Potthast et al. (2017), the spread of misinformation can be assessed computationally following a context-based paradigm of detection, through social network analysis assessing the spread of a particular piece of information. Hamilton 68, a site that tracks Russian propaganda and disinformation on Twitter, is an example of network features in action. As the site describes, the dashboard monitors the activity of accounts assembled based on a three-year analysis tracking disinformation campaigns and identifying both humans who shared such content and the use of bots to “boost the signal of other accounts” (Berger, 2017, para. 10).

Table 2. Features of Fake News

Message and Linguistic:
- Factuality: Not factual. Biased reporting.
- Message Quality: Grammar, spelling or punctuation mistakes. Does not adhere to AP style. Cites first names.
- Lexical and Syntactical: Present tense verbs.
- Rhetorical elements: Discrepancies or omissions. “Spectacle” and narrative writing. Emotionally charged. Logic flaws.
- Headline: All CAPS and exclamations. Misleading and clickbait headlines.
- Sound bites: Editing soundbites to create sensationalism.
- Photos/Videos: Altered pixel structure, shadows, reflections, and perspective distortions. Use of photos out of context.

Sources and Intentions:
- Sources of the message: Unverified sources. No quotes.
- Intentionality: Intentionally false. Revenue purpose.
- Independence: Source of origin is not reputable.
- Pedigree: Originated in an obscure site or post.
- Uncommon journalistic practices: Provides a free PDF version. Asks users to send their stories for publication.

Structural:
- URL: Not reputable ending (.com.co). Recently registered URL. Designed to look like an established site.
- About Us section: Does not have information about the editor or listed owner.
- Contact Us section: Email is a “personal” address.
- Comments: Asks users to comment to access an article. Red flag if many users say it is false.

Network:
- Personalization and customization: Can reach individuals due to structure of social media.
- FB sources/shares: Often shared by mutual friends or pre-identified accounts.
- Author: Written by bots and algorithms.
- Metadata: Metadata indicators that determine deception; queries to provide cumulative deception measures.

Myths About Fake News – the Gray Area

Even though some content posted online may indeed be misleading, it is important not to confuse it with fake news or real news. The goal of developing algorithms for fake news detection is not to impinge on users’ right to express their opinions, or on journalists’ endeavors, but to stem the dissemination of false information. However, in order to achieve this goal, we need to identify the types of online content that are often misattributed as fake news or real news.

Commentary, opinion, and feature writing. The first type of content to be aware of is commentary and feature writing. Although these are typically pieces written by mainstream and traditional outlets, many often confuse them with hard news. Commentary and other similar editorial pieces are different from real news in that the journalist does not abide by the principles of opinion-free reporting typically seen in hard news stories (Digital Resource Center, 2017; Padgett, 2017). Yet opinion journalists are well within their rights to express opinions. Their job is to select facts to form an argument and adhere to the Society of Professional Journalists code of conduct when doing so (Digital Resource Center, 2017). Commentary should not be confused with an assertion, because its emphasis is on providing conclusions based on evidence, whereas an assertion (typically seen in polarized and sensationalist content) occurs when something is declared without the necessary evidentiary basis (Digital Resource Center, 2017). Nonetheless, the 24/7 news cycle exacerbates the need to fill time, and thus commentators are more often featured, blurring the lines between news and commentary (Turner, 2017).

Identifying commentary, thus, requires differentiating it from both real news and assertions (Howard et al., 2017) (See Table 3). This can be done based on opinion journalists’ adherence to the code of professional conduct of the Society of Professional Journalists (SPJ) and on the Verification, Independence and Accountability (VIA) principles of journalism. Following the SPJ and VIA principles, specific characteristics of message and structure can be identified to detect commentary. Features of structure include, at the outset, the labeling of commentary as such. For example, in professional news outlets, opinion pieces are located in the editorial or opinion section, or carry other such markers of the category of news that one is consuming. Additionally, commentators should remain free from activities that can damage their reputation and should remain independent. The latter can be assessed through structural features allowing the reader to assess the partisan associations, if any, of the commentator (Digital Resource Center, 2017). On the other hand, features of the message or linguistic structure include fact-checking and the presentation of evidence-based and expert opinions (Borden & Tew, 2007). Furthermore, the message in commentary tends to be more emotionally charged and opinion-based—features that can be identified using linguistic markers of the message (Argamon-Engelson et al., 1998; Borden & Tew, 2007).
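To make the linguistic-marker idea concrete, the short sketch below counts a couple of the commentary cues noted above and summarized in Table 3 below (frequent use of “should”-type modals, emotionally charged punctuation). It is a minimal illustration in Python, not a procedure used in any of the cited studies; the marker list and the threshold are assumptions made for the sake of the example.

import re

# Illustrative opinion markers; the cue ("should") comes from Table 3, but the
# full list and the threshold below are assumptions for this sketch.
OPINION_MODALS = {"should", "must", "ought"}

def opinion_marker_profile(text: str) -> dict:
    """Count simple lexical cues associated with commentary/opinion writing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return {
        "modal_rate": sum(t in OPINION_MODALS for t in tokens) / n,
        "exclamation_rate": text.count("!") / max(len(text), 1),
    }

def looks_like_commentary(text: str, modal_threshold: float = 0.01) -> bool:
    """Crude flag: a high rate of 'should'-type modals suggests opinion writing."""
    return opinion_marker_profile(text)["modal_rate"] >= modal_threshold

if __name__ == "__main__":
    sample = "Congress should act now, and voters must demand accountability!"
    print(opinion_marker_profile(sample), looks_like_commentary(sample))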

Table 3. Features of Commentary

Message and Linguistic:
- Factuality: Fact-checked. No misrepresentations.
- Evidence: Statistical data, research-based.
- Rhetorical elements: Emotionally charged. Narrative writing.
- Message Quality: AP style. Edited and proofread.
- Lexical and Syntactic: Frequent use of “should.”

Sources and Intentions:
- Sources of the content: Written by actual news source.
- Pedigree: Originated from a well-known site/organization.
- Independence: Organization associated with the journalist.

Structural:
- URL: Reputable ending. URL has normal registration.
- About Us section: Will have a clear About Us. Authors and editors can be verified.
- Contact Us section: Emails are from the professional organization.
- Labeling: Labeled as commentary, editorial, or analysis.

Network:
- Metadata: Metadata indicators of authenticity.

Misreporting. Related to real news and commentary is misinformation, or unintentional false reporting from professional news media organizations. Even though the intention of professional journalists is not to be deceitful, misreporting can sometimes occur. Thus, it is important for it to be included in a taxonomy of online content for the purpose of fake news detection. That being said, this content should not be confused with blatantly false reports created with the intention of being deceitful. Misreporting is an example of misinformation, “defined as false, mistaken, or misleading information,” and not disinformation, “the distribution, assertion, or dissemination of false, mistaken, or misleading information in an intentional, deliberate, or purposeful effort to mislead, deceive, or confuse” (Fetzer, 2004, p. 231).

Misreporting can be distinguished from real and completely fake news (See Table 4). The first important set of features is that of sources and intentions, evaluated based on reliability and objectivity. For instance, misreported articles will typically not report straight from sources, and some of their quotes might not be verified. Similarly, this content can be evaluated based on message and linguistic features such as its one-sided reporting and factual inaccuracies. Finally, structural elements demarcated based on principles of accountability and independence can also be identified to differentiate this content from disinformation. For instance, although the article contains biased or inaccurate reporting, the journalist shows accountability by providing his/her name and affiliation with a news organization. Similarly, the URL of the site provides information about the news organization and its independence from external groups that might benefit from the misinformation in question. And, when it is pointed out, the organization will issue a retraction or correction.

Table 4. Features of Misreporting

Message and Linguistic:
- Factuality: Not fact-checked. Uses last names to cite.
- Message Quality: AP style. Edited and proofread.
- Lexical and Syntactic: Past tense.

Sources and Intentions:
- Sources of the content: Unverified sources.
- Pedigree: Originated from a well-known site/organization. Written by actual news staff.
- Independence: Organization associated with the journalist.

Structural:
- URL: Reputable ending. URL has normal registration.
- About Us section: Will have a clear About Us. Authors and editors can be verified.
- Contact Us section: Emails are from the professional organization.

Network:
- Metadata: Metadata indicators of authenticity.

Polarizing and sensationalist content. The next type of online news is polarized and sensationalist content. Although this content is not completely fake, it is characterized by highly emotional and inflammatory content that is extremely one-sided (hyperpartisan). These assertions typically lack evidence and are based on appeals to emotion and preexisting attitudes (Allcott & Gentzkow, 2017; Digital Resource Center, 2017). Operational definitions for this type of content include the presentation of divisive, inflammatory and misleading information to manipulate the audience’s understanding of an event or issue (Howard et al., 2017). This is problematic because, when feedback about an event or object is limited, users tend to judge the quality of content based on previous experiences and prefer confirmatory reporting, as it provides psychological utility (Allcott & Gentzkow, 2017). Furthermore, assertions typically are constructed based on implied information, or self-generated deduction, which is stronger because it activates previous schemas and scripts (Rich & Zaragoza, 2016). Features to help identify such presentation of information include message characteristics like the use of emotional appeals, hyperboles, and attention-grabbing techniques (Howard et al., 2017). Similarly, this type of content has no evidence to support claims, tends to make generalizations and logical fallacies, includes ad-hominem attacks, and often displays inaccurate pictures and soundbites (Borden & Tew, 2007; Howard et al., 2017; Najmabadi, 2017; Reilly, 2012; Silverman, 2016) (See Table 5).

Another operational definition of polarized and sensationalist content is the evaluation of its statements based on “goodness of fit with a particular ideology” (Berghel, 2017, p. 3). According to Pennycook and Rand (2017), individuals fall for misinformation because they fail to think analytically when faced with it. This is particularly true with information that agrees with their prior knowledge and beliefs (Bode & Vraga, 2015; Rich & Zaragoza, 2016). Both message features and source/intent features can help recognize when content is aligned with a particular ideology. For example, in Potthast et al. (2017), researchers were able to differentiate between hyperpartisan, mainstream and satirical content based on style of writing, through “unmasking,” a style-based categorization using computational techniques. However, they were unable to differentiate between left- and right-wing styles of writing, suggesting that at least stylistically both have a lot in common. Goodness of fit can also be differentiated through features of the sources/intent. For example, sensationalist content typically includes only one-sided sources, and its headlines are clickbait style, accentuating negative content and exaggerating the article. It is important to note, however, that the use of clickbait headlines by sensationalist content does not make the headlines necessarily false. As Ecker, Lewandowsky, Chang, and Pillai (2014) explain, clickbait content can be misdirected such that the information is technically true, but “misleading in substance” (p. 324). Structural features can also be giveaways of their polarized nature. For example, often in the “about us” section these sites will speak ill of the mainstream media. Furthermore, network features can shed light on whether the content was personalized to be received by certain individuals or networks of individuals with particular partisan affiliations or views. As such, features of the network are essential when identifying polarizing content (See Table 5).

Table 5. Features of Polarized and Sensationalist Content

Message and Linguistic:
- Factuality: Not completely factual. Biased reporting. Degree of fit with a political agenda.
- Message Quality: Lacks evidence. Excessive capitalizations.
- Rhetorical elements: Generalizations. “Spectacle” and narrative writing. Emotionally charged. Ad-hominem attacks. Logic flaws.
- Headline: All CAPS and exclamations. Misleading and clickbait headlines.
- Sound bites: Editing soundbites to create sensationalism.
- Visuals: Extreme use of static and moving visuals.

Sources and Intentions:
- Sources of the message: One-sided sources.
- Independence: Source of origin is not reputable.
- Pedigree: Polarized source of origin.

Structural:
- About Us section: Speaks ill of media. Describes itself as leaning toward a political side.

Network:
- Personalization and customization: Can reach individuals due to structure of social media.
- FB sources/shares: Often shared by mutual friends or pre-identified accounts.
- Metadata: Metadata indicators that determine deception; queries to provide cumulative deception measures.

Citizen journalism. There are two different categories of citizen journalism. The first includes blogs and websites from citizens, civil societies, and organizations with content originally created by such users (Howard et al., 2017). Features to distinguish this content from other online information include message features such as not adhering to Associated Press style of reporting and verification. Additionally, its contents are more emotionally driven and subjectively reported. Lastly, there are essential structural components. For instance, the URL and About Us section are giveaways of the site being a blog, personal site, or a site specifically meant for citizen reporting.

The second category of citizen journalism refers to subsites of professional journalism sites providing a forum for citizen reporting (e.g., CNN’s iReport). These sites are often focused on first-hand eye-witness accounts of breaking news. Message features to identify this type of content are similar to those of personal sites and blogs. For example, the content is typically a video recording, and when written, it lacks journalistic style of reporting. This content can also be identified through structural features. Even though this content is technically a subsection of a professional media site, the subsection is identified as content developed by users (See Table 6).

Table 6. Features of Citizen Journalism

Message and Linguistic:
- Message Quality: Does not adhere to AP style.
- Lexical and Syntactic: Past tense.
- Modality: Often in video format.

Sources and Intentions:
- Sources of the content: Unverified sources.
- Pedigree: Originated from audience member.
- Independence: May reflect creator’s political and organizational affiliations.

Structural:
- URL: Site is a blog or personal site. Subsite of news organization labeled as audience created.
- Contact Us section: Personal emails from bloggers.

Satire. Another source of content that can be found online is satirical news, defined as an intentionally false story meant to be perceived as unrealistic, that uses journalistic style as a parody of the style, to mock issues and individuals in the news, or to disseminate a prank (Frank, 2015; Reilly, 2013). As Balmas (2012) explains, satire derives from hard news and not from reality, and it is intended to be perceived as unreal. Even though some scholars argue that it should not be included in the description of fake news, as its contents are expected to be humorous (Allcott & Gentzkow, 2017; Borden & Tew, 2007), it is important to categorize it explicitly so that automated detection of fake news can identify it for what it is, avoiding misclassifying it as either real or fake. This is especially important because news consumers are not always aware of its satirical nature, especially when they are forwarded such “news” via informal social media and messaging channels.

There are three basic sets of characteristics of satirical content (See Table 7). First, it needs the understanding of what it means to be an authentic and legitimate news practice (Baym, 2005). In other words, although satirical news may adhere to journalistic style of reporting, it does so in a humorous way rather than abiding by the principles of truthfulness and verification required for journalistic practices. This satirical style of writing can be assessed through message and linguistic features such as its adherence to journalistic style of writing. However, it differs from real news in that it is not fact-checked (message feature); it often has amateur mistakes such as spelling mistakes, grammar mistakes, and the overuse of vague expressions such as “more and more” (message feature); and it either includes fake sources or made-up quotes (feature of the source and intention). The second characteristic of satire is the type of website that is created for the purpose. Frank (2015) identifies five types of satirical websites that can be identified based on message quality, amateur mistakes, and clickbait headlines, illustrated by excessive capitalizations and use of hyperboles (message features). Finally, a third feature of satire is that it uses real news as its reference point (Balmas, 2012). Structurally, satirical content is labeled as such in the “about us” page and is often satirical in itself, as illustrated by self-proclamation. Additionally, the date of publication of these articles can have inconsistencies (Davis, 2016; Ojala, 2017; Reilly, 2013). For example, in a well-known prank, the ‘Yes Men’ group published a special edition of The New York Times depicting a utopian future (Reilly, 2013). Those who believed it did not realize it was post-dated, among many other indicators of satire.

Table 7. Features of Satire

Message and Linguistic:
- Factuality: Not fact-checked.
- Message Quality: AP style of writing, but may have amateur mistakes. Overuse of typical journalistic expressions. Grammar, spelling, and punctuation errors.
- Rhetorical Elements: Use of hyperboles. Exaggerations.
- Headline: Clickbait headline.
- Context: Humorous content.

Sources and Intentions:
- Sources of the content: Fake sources.
- Pedigree: Originated from a satire site.

Structural:
- About Us section: Labeled as satire. Self-proclamation.
- Publication date: Post-dated. Dated April 1.
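Potthast et al. (2017) are cited above as separating mainstream, hyperpartisan, and satirical writing on style alone (their “unmasking” approach). The sketch below is a much simpler stand-in for that idea, not their method: a character n-gram classifier trained on a labeled corpus. It assumes scikit-learn is installed, and the three toy training examples are placeholders for a real labeled dataset.

# A minimal stand-in for style-based categorization of mainstream vs. hyperpartisan
# vs. satirical writing (inspired by, but much simpler than, Potthast et al., 2017).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: in practice this would be thousands of labeled articles.
train_texts = [
    "Officials said on Tuesday that the committee had approved the budget.",
    "WAKE UP! The corrupt elites are DESTROYING everything you love!!!",
    "Area man heroically refuses to read terms and conditions, nation applauds.",
]
train_labels = ["mainstream", "hyperpartisan", "satire"]

# Character n-grams capture surface style (capitalization, punctuation, word forms)
# rather than topic, which is closer in spirit to a stylometric analysis.
style_clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=False),
    LogisticRegression(max_iter=1000),
)
style_clf.fit(train_texts, train_labels)

print(style_clf.predict(["BREAKING!!! You will not BELIEVE what they are hiding!"]))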

Native advertisement. Another type of content to consider is native advertisement, defined as promotional material of persuasive intent, often masked as a news article (Frank, 2015). Because this type of content takes the form of the platform it is distributed on, it is not surprising that users might take it to be true, even when it is clearly labeled as promotional. In a Stanford study, students were unable to discern between sponsored posts, real news, and biased news. According to the report, over 80% of students believed the native advertisement was real news, despite it being labeled as sponsored (Stanford History Education Group, 2016).

Specific message features that help identify this type of content include the presence of a logo from the company indicating authorship or sponsorship, verification of content through cross-checking, and the lexical components of the content (See Table 8). For example, in Argamon-Engelson et al. (1998) the authors identified that the word “you” appeared more frequently in promotional literature compared to news articles. Furthermore, features of the source include identifying the author of the content as well as the sources within the article and their relationship with the author or company of interest. For instance, even if it is written as a narrative, the native ad will promote the product throughout the story being told (Carlson, 2015). Finally, structural elements include the labeling of the content as promotional, sponsored, or paid content. For example, a native advertisement on Facebook can be identified by a “Sponsored” tag. Similarly, paid content on Instagram will be tagged as “Paid partnership” or will include hashtags like #sponsored or #ad.

Table 8. Features of Native Advertisement

Message and Linguistic:
- Message Quality: Content directly associated with the brand.
- Rhetorical Elements: Narrative writing.
- Lexical Structures: First-person statements; use of “you.”
- Context: Non-journalistic author.

Sources and Intentions:
- Sources of the content: If it has sources, they are associated with the brand.
- Independence: Author is an organization or promotional entity.

Structural:
- Labeling: Labels indicating paid promotion in social media.
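Several of the native-advertising cues in Table 8 are simple enough to check mechanically: platform labels such as “Sponsored” or “Paid partnership,” hashtags like #sponsored or #ad, and heavy second-person address (Argamon-Engelson et al., 1998). The snippet below is a rough, rule-based sketch of such a check; the specific patterns and reported rates are illustrative assumptions, not an exhaustive inventory of real platform markup.

import re

# Illustrative patterns drawn from Table 8 and the discussion above; actual
# platform labels and markup vary, so these regexes are assumptions.
SPONSORED_PATTERNS = [
    r"\bsponsored\b",
    r"\bpaid partnership\b",
    r"#sponsored\b",
    r"#ad\b",
]

def native_ad_signals(text: str) -> dict:
    """Report simple label and lexical cues associated with native advertising."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    n = max(len(tokens), 1)
    return {
        "has_sponsored_label": any(re.search(p, lowered) for p in SPONSORED_PATTERNS),
        # Argamon-Engelson et al. (1998) found "you" more frequent in promotional text.
        "second_person_rate": sum(t in {"you", "your"} for t in tokens) / n,
    }

if __name__ == "__main__":
    post = "Treat yourself: you deserve the new UltraBlend 3000. #ad #sponsored"
    print(native_ad_signals(post))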

Professional political content. Yet another type of online content is professional political content, which includes information from government, political parties, or public agencies, as well as information from experts (Howard et al., 2017). It is important to include this type of content because it reflects the agendas of particular political parties or organizations. Although not fake news per se, it does not abide by objective two-sided reporting, and thus should not be confused with real news. Professional political content has two subcategories: official content produced by a political party, candidate’s campaign, government, or public agency; and white papers or policy papers from think tanks with a political agenda.

The first subgroup can be identified through message features advancing the agenda of a candidate; for example, statements like “we will” or “the political party will” are typical to encounter. Furthermore, the content will typically lack sources, or feature only sources related to the specific group or agenda (see sources and intentions features in Table 9), and will typically be published on a URL related to the organization (see structural features in Table 9). The second subgroup, white papers and policy papers, is similar to political party content, but features of the message would differ in that it will not necessarily advance an agenda, though it might possess opinion linguistic markers (see message features in Table 9), and it will often cite academic and research sources rather than specific individuals (see sources and intentions features in Table 9).

Table 9. Features of Professional Political Content

Message and Linguistic:
- Message Quality: PR for candidate or agenda.
- Context: Non-journalistic author.

Sources and Intentions:
- Sources of the content: Lacks sources, or these are from groups related to the agenda.
- Independence: Author is an organization or promotional entity.

Structural:
- URL: Site is from a political, governmental, or another public agency.
- About Us Section: States goals of the site.

Features or Indicators of Fake News

Throughout this explication, initial features or indicators of fake news were identified based on a meaning analysis of descriptions of fake news in academic articles, trade publications, and newspapers, among others. These features are decomposed as features of the message or linguistics, features of sources and intentions, structural features, and network features, and can be useful when assessing online content based on the proposed taxonomy.

For example, through our taxonomy we have identified an array of message features. Message quality elements such as punctuation errors or spelling mistakes are alerts of the possibility of an article being false. Similarly, lexical components such as the word “should” or “you” can indicate that the content is possibly an opinion piece or native advertisement. Visual components are also included, as a difference in pixel structure or a reverse image search can assess the realness or fakeness of an image.

Features of sources and intentions include sources within the message as well as the source of creation of the content. For instance, if an article does not use quotes, or these are not verified, it is a red flag or indicator that the content might be false. Likewise, features of the source include the site of origin and pedigree. An article coming from an obscure site or social media post is more likely to be false.

Structural features can also provide means of identifying the type of content. A typical structural feature of fake sites is the URL, often mimicking that of a traditional outlet but ending in .com.co (e.g., www.abc.com.co), as well as a fairly recent date of registration. Other indicators include the “contact us” section, which provides a personal email account rather than one of a reputable company.

Finally, features of the network are related to the dissemination of articles and the structure of the technology that allows for this to occur. For example, fake sites are often shared by pre-identified accounts and our network of family and friends. Because companies like Facebook and Google keep their algorithms a trade secret, features of the network are more difficult to identify, yet such features are important for better understanding the dissemination of fake content.

The different features are meant to be indicators or red flags of fakeness; each piece of online content should be analyzed in terms of all the features to determine where it belongs in our taxonomy. Future study of fake news characteristics can expand on this list, as we search for distinguishing features for inclusion in a machine learning algorithm for fake news detection.
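As a concrete illustration of how such indicators could be operationalized, the sketch below turns a handful of the message and structural features just described (all-caps and exclamation rates, presence of quotation marks, a suspicious “.com.co”-style URL ending, and the “should”/“you” lexical cues) into a numeric vector that could be fed to any standard classifier. This is a hypothetical, partial operationalization offered for illustration only: the feature set is an assumption, and the source, network and metadata features discussed above are omitted because they require data beyond the article text and URL.

import re
from urllib.parse import urlparse

def taxonomy_feature_vector(text: str, url: str) -> list[float]:
    """Map a few of the message and structural indicators discussed above into a
    numeric feature vector (an illustrative, partial operationalization)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    n_tokens = max(len(tokens), 1)
    host = urlparse(url).netloc.lower()
    return [
        sum(t.isupper() and len(t) > 1 for t in tokens) / n_tokens,   # all-caps rate
        text.count("!") / max(len(text), 1),                          # exclamation rate
        1.0 if ('"' in text or chr(8220) in text) else 0.0,           # quotes present
        1.0 if host.endswith(".com.co") else 0.0,                     # mimicking URL ending
        sum(t.lower() in {"you", "your"} for t in tokens) / n_tokens, # second-person address
        sum(t.lower() == "should" for t in tokens) / n_tokens,        # opinion modal
    ]

# With labeled examples, vectors like these could be fed to any off-the-shelf
# classifier (e.g., logistic regression or a random forest in scikit-learn).
if __name__ == "__main__":
    print(taxonomy_feature_vector(
        "SHOCKING!!! You will not believe what they did next",
        "http://www.abc.com.co/story",
    ))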
identifying critical features that can aid reliable doi:10.1111/jcom.12166 classification of fake and real news by both Borden, S., & Tew, C. (2007). The role of machines and humans, and thereby promote journalist and the performance of journalism: media literacy as well as a more credible Ethical lessons from “fake news” (seriously). information environment. Journal of Ethics, 22(4), 300- 314. Acknowledgements Campbell, W. J. (2001). Yellow journalism: This research is supported by the U. S. National Puncturing the myths, defining the legacies. Science Foundation (NSF) Grant No. CNS- Westport, CT: Praeger Publishers. 1742702. Carlson, M. (2015). When news sites go native: Redefining the advertising–editorial divide in References response to native advertising. Journalism, Allcott, H., & Gentzkow, M. (2017). Social 16(7), 849-865. 10.1177/1464884914545441 media and fake news in the 2016 election. Chaffee, S. (1991). Explication. Newbury Park, Journal of Economic Perspectives, 31(2), CA: Sage. 211-236. doi: 10.1257/jep.31.2.211 Cohen, M. (2017). Fake news and manipulated American Press Institute. (2017). What is data, the new GDPR, and the future of journalism. Retrieved from information. Business Information Review, https://www.americanpressinstitute.org/journ 32(2), 81-85. doi: alism-essentials/what-is-journalism/ 10.1177/0266382117711328 Argamon-Engelson, S., Koppel, M., & Avneri, G. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). (1998). Style-based text categorization: What Automatic deception detection: Methods for newspaper am I reading. Proceedings of AAAI finding fake news. Proceedings of the ’98 Workshop on Text Categorization, 1-4. Association for Information Science and Balmas, M. (2012). When fake news becomes Technology, 52, 1-4. real: Combined exposure to multiple news doi:10.1002/pra2.2015.145052010082 sources and political attitudes of inefficacy, Crain, R. (2017). Fake news could make alienation, and cynicism. Communication advertising more believable. Advertising Age, Research, 41(3), 430-454. 88(3), 22. doi:10.1177/0093650212453600 Davis, W. (2016, December 5). Fake news or Baym, G. (2005). : Discursive real? How to self-check the news and get the integration and the reinvention of political facts: journalism. Political Communication, 22(3), All tech considered. NPR. Retrieved from 259-276. doi:10.1080/10584600591006492 http://www.npr.org/sections/alltechconsidere Berger, J.M. (2017). The methodology of the d/2016/12/05/503581220/fake-or-real-how- Hamilton 68 dashboard. Retrieved from to-self-check-the-news-and-get- http://securingdemocracy.gmfus.org/publicati thefacts?utm_campaign=storyshare&utm_so ons/methodology-hamilton-68-dashboard urce=facebook.com&utm_medium=social Berghel, H. (2017). Alt-news and post-truth in the Dempsey, K. (2017). What’s behind fake news “fake news” era. IEE Computer Society, 110- and what you can do about it. Information 114. Today, 31(4), 6. Digital Resource Center. (2017). Lesson 5: News Scientist, 61(4), 441. vs. opinion. Center for News Literacy. doi:10.1177/0002764217701217 Retrieved from Najmabadi, S. (2017). How can students be http://drc.centerfornewsliteracy.org/content/l taught to detect fake news? The Chronicle of esson-5-news-vs-opinion. Higher Education, 63(18), A22. Ecker, U. K. H., Lewandowsky, S., Chang, E. P., Ojala, M. (2017). Fake business news. Online & Pillai, R. (2014). The effects of subtle Searcher, 41(3), 54. misinformation in news headlines. Journals of Padgett, L. (2017). 
Filtering out fake news: It all Experimental Psychology 20(4), 323-335. doi: starts with media literacy. Information Today, http://dx.doi.org/10.1037/xap0000028 34(1), 6. Fake News (2018). In Cambridge Dictionary. Pennycook, G., & Rand, D. G. (2017). Who falls Retrieved from for fake news? The roles of analytical https://dictionary.cambridge.org/us/dictionar thinking, motivated reasoning, political y/english/fake-news ideology, and respectively. Retrieved Fetzer, J. H. (2004). Disinformation: The use of from false information. Minds and Machines, https://papers.ssrn.com/sol3/papers.cfm?abstr 14(2), 231-240. act_id=3023545 doi:10.1023/B:MIND.0000021683.28604.5b Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, Frank, R. (2015). Caveat lector: Fake news as J., & Stein, B. (2017) A stylometric inquiry folklore. Journal of American Folklore, into hyperpartisan and fake news. Retrieved 128(509), 315-332. from: https://arxiv.org/abs/1702.05638 Funke, D. (2017, November 30). A satirical fake Reilly, I. (2012). Satirical fake news and/as news site apologized for making a story too American political discourse. Journal of real. Retrieved from American Culture, 35(3), 258-275. https://www.poynter.org/news/satirical-fake- Reilly, I. (2013). From critique to mobilization: news-site-apologized- making-story-too-real The yes men and the utopian politics of Howard, P. N., Kollanyi, B., Bradshaw, S., & satirical fake news. International Journal of Neudert, L. M. (2017). Social media, news Communication, 7, 1243-1264. and political information during the US Rich, P. R., & Zaragoza, M. S. (2016). The election: Was polarizing content concentrated continued influence of implied and explicitly on swing states. The Computational stated misinformation in news reports. Propaganda Project. Retrieved from Journal of Experimental Psychology. http://comprop.oii.ox.ac.uk/wp- Learning, Memory, and Cognition, 42(1), 62- content/uploads/sites/89/2017/09/Polarizing- 74. doi:10.1037/xlm0000155 Content-and- Swing-States.pdf Samuel, A. (2016), November 29. To fix fake Kang, H., Bae, K., Zhang, S., & Sundar, S. S. news, look to yellow journalism. Retrieved (2011). Source cues in online news: Is the from https://daily.jstor.org/to-fix-fake-news- proximate source more powerful than distal look-to-yellow-journalism/ sources? Journalism & Mass Communication Silverman, C. (2016, November 16). This Quarterly, 88 (4), 719-736. analysis shows how viral fake election news Klein, D. O., & Wueller, J.R. (2017). Fake news: stories outperformed real news on Facebook. a legal perspective. Journal of Internet Law, . Retrieved from 20(10), 1-13. https://www.buzzfeed.com/craigsilverman/vi Mihailidis, P., & Viotty, S. (2017). Spreadable ral-fake-election-news-outperformed-real- spectacle in digital culture: Civic expression, news-on- fake news, and the role of media literacies in facebook?utm_term=.ilJOyKYld#.lyk3Kb9N "post-fact" society. The American Behavioral J Soares, I. (2017). The fake news machine: Inside a town gearing up for 2020. CNN. Retrieved from http://money.cnn.com/interactive/media/the- macedonia-story/ Stanford History Education Group. (2016). Evaluation of information: The cornerstone of civic online reasoning [PDF document]. Retrieved from https://sheg.stanford.edu/upload/V3LessonPl ans/Executive%20Summary%2011.21.16.pd Storyzy. (2017). About. Retrieved from: https://storyzy.com/about Sundar, S. S., & Nass, C. (2001). Conceptualizing sources in online news. Journal of Communication, 51(1), 52-72. 
doi:10.1111/j.1460-2466.2001.tb02872.x Sunstein, C.R. (2002). The law of group polarization. Journal of Political Philosophy, 10, 175– 195. http://dx.doi.org/10.1111/1467- 9760.00148. Turner, M. (2017, July 23). Commentary vs. Journalism: Are journalists biased? Retrieved from: https://www.schooljournalism.org/commenta ry-vs-journalism-are-journalists-biased/ Vorhies, W. (2017, May 1). Using algorithms to detect fake news – The state of the art. Retrieved from https://www.datasciencecentral.com/profiles/ blogs/using-algorithms-to- detect-fake-news- the-state-of-the-art? Wang, W. Y. (2017). "Liar, liar pants on fire": A new benchmark dataset for fake news detection. Retrieved from https://arxiv.org/abs/1705.00648 Yarros, V. S. (1922). Journalism, ethics, and common sense. International Journal of Ethics, 32(4), 410-419.