
The Digital Challenges to Democracy: Social Media and New Information Paradigms

Thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science in Exact Humanities by Research

by

Neelesh Agrawal 201556015 [email protected]

International Institute of Information Technology
Hyderabad - 500 032
September 2020

Copyright © Neelesh Agrawal, 2020
All Rights Reserved

International Institute of Information Technology Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled “The Digital Challenges to Democracy: Social Media and New Information Paradigms” by Neelesh Agrawal, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date                                                      Advisor: Dr Radhika Krishnan

To Mom and Pop

Acknowledgments

This thesis benefits from many valuable suggestions and inspirations. The most valuable support was from my research advisor, Dr Radhika Krishnan. Her zeal and hard-working attitude were infectious. Each time I met her and discussed ideas or progress on my research work, I would not only get her inputs, but also a lot of motivation. I am extremely grateful for her support throughout my academic life. My friends and “co-researchers” at IIIT; Navya, Lata, Sarigama, Vishnu, Siddhartha, Namit, Tanmay, and Ekansh were wonderful company to be in, keeping me entertained, inspired, and providing the (occasional) useful distractions. My co-passengers on the CHD train, including my co-advisee Devansh Manu, Kriti, Saumya, Srujai, Bera and Pravallika, were also exciting company to have. My acknowledgement also extends to some of my senior friends at IIIT; Shreedhar, Shantanu, Ameya and Ghosh, who were instrumental in influencing my approach towards academic work. A note of regard for Prof Nand Kishore Acharya, whose classes were a treasure and whose teachings have probably been the greatest influence in shaping my social science thinking. My gratitude also extends to Dr Priyanka Srivastava, who was a constant source of encouragement and guidance, and working with her enriched my research perspective. The constant emotional and moral support from my parents, my sister Doll, and my childhood friend Divyansh was crucial in keeping me sane and focused. All my accomplishments are incomplete and impossible without them.

Abstract

The social and political shifts being perpetuated by the Internet and social media are dominating contemporary discourse around politics, society and technology. The difficulties in regulating online practices, in advancing Internet literacy and moderating learning algorithms that govern online behaviour, and moreover in preventing malicious users from infiltrating the Internet have been highlighted by scholars in this domain. As the Internet and social media have become a ubiquitous and inseparable part of our lives, so have the social shifts brought on by them. The influence of modern technology on human behaviour, both personal and social, poses a great challenge to one of the most fundamental democratic principles; human free will. This thesis highlights the anti-democratic phenomena emerging out of the incessant use of the Internet in daily life, while also looking at precursors for these phenomena through history. This thesis employs formal analysis using natural language processing as well as actor-network theory to comprehend some of these phenomena. It also connects various theories proposed by thinkers like Kierkegaard, Chomsky and Foucault, and ideas expressed by researchers, authors and journalists in recent years, to sufficiently understand these phenomena. Through the discussions and analyses presented in this thesis, the dynamics of the social ills posed by the Internet can be better understood, and this can inform further research as well as policy-making aimed at combating these issues.

Contents


Abstract

1 Introduction
   1.1 Chapter Organization

2 Mass Surveillance
   2.1 Introduction
   2.2 Understanding Privacy
   2.3 Understanding Surveillance
   2.4 Trial by Data
   2.5 A Political History of Modern Mass Surveillance
   2.6 A Technological History of Modern Mass Surveillance
   2.7 Data Surveillance in India
   2.8 Pervasive Surveillance
   2.9 Conclusion

3 Information Control and Influence
   3.1 Introduction
   3.2 , Commitment and Information
   3.3 The Splinternet and Digital Tribes
   3.4 Consent as a Service
      3.4.1 Consent: Make in India
   3.5 Conclusion

4 Shaping Political Discourses
   4.1 Motivation
   4.2 Past Work and Challenges
   4.3 Data Description
   4.4 Data Extraction and Processing
   4.5 Method for Comparison
      4.5.1 TF-IDF
      4.5.2 Word2Vec, Smooth Inverse Frequency and Cosine Similarity
   4.6 Results and Discussion
      4.6.1 Topic 1: The Revocation of Special Status of Jammu and Kashmir
      4.6.2 Topic 2: Economic Slowdown in India


   4.7 Future Work

5 Online Disinformation
   5.1 Introduction
   5.2 Misinformation Groups and their Internet Behaviour
   5.3 Information Manipulation: Motivations and Methods
   5.4 Divergence Across Platforms
      5.4.1 Google Search Engine (and YouTube)
      5.4.2 Reddit
      5.4.3 Twitter
      5.4.4 Facebook
      5.4.5 WhatsApp: The Anomaly
      5.4.6 Anonymity and Transparency
   5.5 Misinformation and Online Behaviour in India
   5.6 Conclusion

6 Disinformation Trends Across Social Media Platforms: An Actor-Network Analysis
   6.1 Motivation
   6.2 The Actor-Network Theory
   6.3 Understanding Actors in ANT
   6.4 How the network evolves in ANT
   6.5 Actor-networks on social media platforms
      6.5.1 Problematization
      6.5.2 Interessement
      6.5.3 Enrolment
      6.5.4 Mobilization and Analysis
      6.5.5 Visual representation
   6.6 Discussion
   6.7 Future Work

7 Conclusion

Related Publications

Bibliography

List of Figures


4.1 Histogram of comparison for Topic 1
4.2 Word cloud comparison for news (left) and tweets (right) in topic 1
4.3 Histogram of comparison for Topic 2
4.4 Word cloud comparison for news (left) and tweets (right) in topic 2

6.1 The honeycomb of social media [64]
6.2 Visual representation for the disinformation ANT

List of Tables


4.1 Distribution of extracted news articles

Chapter 1

Introduction

“And does not tyranny spring from democracy,” asks Socrates, prompting Plato to examine in The Republic how the very nature of democracy is such that it would give way to tyranny.[88] While Plato’s critique of democracy is debatable, he did describe how a free man may tend to pursue “useless and unnecessary pleasures”, and thus only a “philosopher king” who knows what is best for everyone is the best ruler for a state. This notion has endured in the modern world too, and there has thus been an effort to “control” human free will. The intention of studying human behaviour and nature, and of controlling people’s actions and thoughts, is the bedrock of certain academic disciplines. In the early 20th century, behavioural psychology emerged as a way to essentially measure all human behaviour using external stimuli. Shoshana Zuboff identifies the radical behaviourist BF Skinner as “an early priest of instrumentarian power”, or in other words, a pioneering force in the quest to control and study human beings.[108] Eventually, in conjunction with statistical market research, this became a crucial foundation of a novel scientific approach towards advertising. However, the conjunction of statistics and behaviourism was not limited to advertising alone. The concept was essentially rooted in an effort to make all human decision-making quantifiable, or at least amenable to scientific evaluation. The applications of this were quick to reach socio-economic and political spheres. Many companies started evaluating potential employees using these methods, and banks started requiring borrowers to take personality profile tests. Data analytics on the Internet in the 21st century has further helped extend this concept, providing more significance to statistical and behavioural studies. It is perhaps no surprise then that groups such as Cambridge Analytica employed behavioural psychology methods (such as the Myers-Briggs and the Big Five personality tests), among others, to build users’ psychological profiles. The availability of large-scale data has boosted the potential of statistical and behavioural methods to be applied to ever larger domains. The manipulation of voter blocs by Cambridge Analytica is a typical example. China’s “social credit system” is another such example, where all citizens have a social ranking based on their public behaviour, which is recorded through various Internet apps on smartphones. This phenomenon is not limited to autocratic states such as China. Modern liberal democracies too have been trying to include social behaviour as a factor to evaluate financial credit ratings. This social behaviour is captured

by studying online behaviour, or “digital footprint”, on financial apps, shopping and e-commerce apps, social media and even dating apps. The control of human behaviour by discipline and observation is only one side of the coin, the other being the influence and control of information. George Orwell and Aldous Huxley present contrasting forms of this control in their respective dystopian novels, 1984 (1949) and Brave New World (1932). While the authoritarian government of 1984 entirely controls what information reaches the people and what does not, the pseudo-utopian “World State” of Brave New World suffers the problem of information overload, reducing its masses to passivity. As Neil Postman aptly puts it, “Orwell feared that the truth would be concealed from us. Huxley feared the truth would be drowned in a sea of irrelevance.”[86] The contemporary world has evolved to partake in both kinds of control. While the governments and corporations of the world massively control what mainstream news media reports, the echo chambers of social media keep most users engaged in a singular stream of thought, and away from diverse perspectives and ideas. This is not to say that the contemporary world has devolved into a dystopia, but dystopian features are emerging in our interaction with the Internet and social media. These emerging issues have huge implications for our contemporary social system, primarily founded on democratic principles. The democracies of the world are in danger due to polarized political discourse, heavy infringements on privacy, and a general attack on human free will. Human life itself is in danger with the social media-fuelled rise in disinformation, which has led to mob lynchings in India. This research identifies three main, loosely interlinked, anti-democratic phenomena that have been massively scaled up with the rise of the Internet and social media; mass surveillance, information control, and disinformation. All three have emerged out of a felt need to control people, by the governments and corporations of the world, to gain political power, ideological leverage, or simply profit.

1.1 Chapter Organization

The strategy for this thesis broadly consists of two approaches. The first approach is in presenting and connecting different studies and perspectives that have discussed the control of people through technology. The second approach is in examining some elements of these phenomena through formal analysis. Chapters 2, 3 and 5 are based on the first approach, while chapters 4 and 6 are based on the second approach. In chapter 1, the notions of controlling human behaviour and thought, and the effects of the Internet and social media in aiding this control, are introduced. The three main anti-democratic phenomena emerging through this control are also identified. In chapter 2, the first of the anti-democratic phenomena, mass surveillance, is analysed. Basic ideas in the context of surveillance are explained, and the history of modern mass surveillance is discussed. A history of Internet-enabled mass surveillance in India is also examined. Finally, the emergence of a decentralized pervasive surveillance mechanism with the rise of social media is presented and discussed.

In chapter 3, the second of the anti-democratic phenomena, information control, is examined. The criticisms of and issues with the press and mass media are examined through the writings of Søren Kierkegaard and the work of Edward Herman and Noam Chomsky. These are then extrapolated in the context of the contemporary world and the Internet and social media. The emergence of an emotional, polarizing political and social culture is identified. The control of news and information in contemporary India is also discussed. In chapter 4, political discourse on mainstream news media and social networking sites is compared using methods from natural language processing. This study attempts to analyse whether discourse on social media is influenced by news media reporting or is independent of it. In chapter 5, the third of the anti-democratic phenomena, disinformation, is discussed. The role and history of typical disinformation-perpetuating groups on the Internet are examined, along with their online behaviour. The predominant motivations and methods of these groups are also presented. The trends in disinformation across online platforms are briefly analysed, and the anomalous trend in disinformation on WhatsApp is identified. Finally, disinformation and online behaviour in India is discussed, and the emergence of WhatsApp as a central player is analysed. In chapter 6, the mechanisms enabling disinformation are analysed using the Actor-Network Theory (ANT), with the primary objective of observing the differences across social media platforms, and particularly the anomalous behaviour of WhatsApp. The concepts and usage of ANT are discussed, and then applied to study disinformation on three social media platforms; Twitter, Facebook and WhatsApp. The results are then discussed in the context of an ethnographic study conducted in four Indian states by Banaji, Bhat et al.[98] The potential of this study to benefit future ethnographic and quantitative studies is also presented. In chapter 7, a complete picture of the interactions of the three anti-democratic phenomena is presented, and steps that are being taken by various organizations to counter each of these are also discussed. The future of research in studying these phenomena is also briefly discussed.

3 Chapter 2

Mass Surveillance

2.1 Introduction

Inspection functions ceaselessly. The gaze is alert everywhere.

Michel Foucault[77]

Prerequisite to the objective of controlling human behaviour is the need to observe this behaviour. Foucault claims that one of the earliest examples of such control through observation predates the Internet and even the societal structure that exists today; in seventeenth century Europe. Foucault observes surveillance and control mechanisms manifesting through a seventeenth century French town’s administration to control plague. A system of permanent registration with centralized records, organized and segmented housing, and constant watch by guards, was established in the town to track plague-stricken individuals and effectively quarantine them. Foucault essentially envisions this as a perfect laboratory setting for the government to study social control. Each town was inspected by a “syndic”, who locked households, and made daily roll calls, tracking each inhabitant’s health. Any changes in one’s health conditions were documented in the permanent centralized record. Foucault notes that “the plague is met by order; its function is to sort out every possible confusion”.[77] This ordering ultimately led to massive control of the urban space and expanded political power drastically. Foucault observes that seventeenth century plague-ridden Europe can thus be credited for birthing modern power structures, including government surveillance. The rise of the Internet, however, has greatly escalated mass surveillance. Individuals are now generating data at an immensely large scale. Unfortunately, most Internet corporations are dependent on advertising, and advertising depends on statistical and behavioural methods using large-scale data. Hence, Internet platforms run algorithms that track or surveil users’ online data, and analyse it to generate appropriate, targeted advertisements. Internet corporations’ use of surveillance and data analysis is not limited to advertising. Today, most online sites and apps use these methods to make their products addictive to the users. This phenomenon is known as “persuasive technology”.[29] However,

these methods are amenable to socio-economic and political use as well, and since the methods are already in place, it is nearly impossible for organizations and governments not to use them for their own ends. This problem has exposed individuals to large-scale surveillance, and with the Internet reaching every aspect of life, individuals’ privacy has been completely compromised.

2.2 Understanding Privacy

Privacy is defined in the Oxford dictionary as “a state in which one is not observed or disturbed by other people”.[5] It has also been defined by Alan Westin as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others”.[21] The right to privacy is often regarded as adjunct to the right to life and liberty, and is a fundamental human right in the Universal Declaration of Human Rights by the United Nations. Privacy is often conflated with secrecy. In public discourse, and particularly in narratives promoted by the state or by corporations, it is argued that if one has nothing to hide, one has no need for privacy. Westin however argues that privacy does not imply that there is a secret to be kept or something to be hidden.[21] When one makes love, or goes to the bathroom, or even has a formal conversation, there is a felt need to not be observed or disturbed. Everyone knows what these activities entail, meaning that there are no secrets. Yet, all these activities warrant the need for privacy, and individuals have the right to not let others know details about them. Privacy, much like any other human right, should be looked at as an “always already”; as an a priori condition. A question should arise only when this right is to be yielded. As with all fundamental rights, the infringement of the right to privacy has to be evaluated against its possible benefits. “An infringement of privacy must be justified by a compelling state interest, and the infringing law must be narrowly tailored to serve that interest”, writes Gautam Bhatia explaining this.[54] The BBC editorial guidelines also give an example as follows:

We must balance the public interest in freedom of expression with the legitimate expectation of privacy by individuals. Any infringement of privacy in the gathering of material should be justifiable as proportionate in the circumstances of each case. . . . normally only report the private behaviour of public figures where their conduct is unlawful or where broader public issues are raised either by the behaviour itself or by the consequences of its becoming widely known.[35]

What can be inferred from Bhatia and the BBC guidelines is that the state, as well as media and corporate organizations, must be wary of infringing on this right, understanding that they are treading a thin line. With the Internet, social media and data surveillance, this cost-benefit analysis has become even more muddled.

5 2.3 Understanding Surveillance

Surveillance is defined as “the focused, systematic and routine attention to personal details for purposes of influence, management, protection or direction”.[47] A surveillance state is a country where the government pervasively surveils all the activities of its citizens and visitors. This may be done under different pretexts, including, but not limited to, crime prevention, control of misinformation, and gaining insights on public opinion. Surveillance seems very similar to espionage, but the two are fundamentally different. “Espionage is as old as governments,” explains Bruce Schneier, while surveillance “is much more of a law enforcement sort of activity than a military activity”.[34] Espionage is different from surveillance, primarily in that the former is conducted over non-citizens, living outside the state, while the latter is done by the state on its own citizens (and sometimes, tourists or visitors). However, it would not be wrong to say that surveillance has evolved out of espionage. To understand the birth of surveillance from espionage, one needs to look at the world before democracies and defined borders, when governments were primarily authoritarian (or monarchies), and whom they ruled over was unclear. Political dissidents under such governments were enemies of the state, and hence non-citizens, who were thus spied upon and tried for treason. Besides this, the tools and technology created and developed for espionage have directly been borrowed into surveillance. In fact, surveillance may even be looked at as a remnant or extension of the espionage regime that an authoritarian government conducted on its own citizens (or subjects). When trying to define surveillance and distinguish it from espionage, it is also important to understand what the term “government” signifies. For the arguments contained herein, government refers to democratic governments, unless clearly specified otherwise. To define what a democratic government implies:

Broadly speaking, it is both a principle of how to govern ourselves, and a set of institutions which allow for sovereignty to be derived from the people. Exactly how this works changes from place to place and over time, but easily the most workable and popular version is modern liberal representative democracy [emphasis in the original].[63]

Modern liberal representative democracy also implies that democratic governance in the state has matured over the years. This includes most modern Western democracies, such as the United States, France and Germany, as well as nations like India and Brazil. Nations like China, which claims to be a democracy but exercises autocratic rule, or Pakistan, which has fluctuated between autocratic and democratic rule, would not be included in this definition. It is also important that the state actually functions as a democracy and is not merely set up as one. The former USSR as well as modern Russia, and countries such as Namibia and Angola, are examples of the latter. This distinction is important as the terms “surveillance”, “privacy” and “espionage” do not hold the same value in other government systems. Democracies have to ensure certain basic human rights that autocracies are not obliged to. Political dissidence, a healthy symptom of a functioning democracy,

becomes a crime in an autocracy. Hence, it only makes sense to look at surveillance in modern democracies.

2.4 Trial by Data

Show me six lines written by the most honest man in the world, and I will find enough therein to hang him.

Cardinal Richelieu

What French statesman Cardinal Richelieu stated in the 17th century well reflects our worst fears of perfect data surveillance in the 21st century. With enough data on an individual, it is possible to indict that individual for a crime, and now, with all kinds of data available on the individual, it is easy to indict him/her for any crime. While surveillance is done primarily under the pretext of law enforcement, the term “law enforcement” has duplicitous meanings. Some may interpret it to include the curbing of political and social dissidence. This is the reason that surveillance has often targeted minority groups. Perfect surveillance is nothing short of an attempt to establish political and social conformism. Bruce Schneier, in his book Data and Goliath[34], recounts an incident from 1964, when the then FBI director J Edgar Hoover, having learned of Martin Luther King Jr’s extramarital affairs, wrote a letter to King urging him to commit suicide. Certainly, this could have had far-reaching impacts on the history of the US and the world. Similarly, the FBI and other intelligence organizations in the US have in the past tried to curb strong political and social dissidence. The FBI even infamously ran COINTELPRO, a counterintelligence program designed to neutralize political dissidents. As Schneier further points out, minority groups are targeted only because of their ethnicity, regardless of there being any evidence of crime or wrongdoing. This goal of conformism is also exemplified by the (now rejected) proposal of establishing a “Social Media Communications Hub” in India.[105] Its stated goal was to “mould public opinion” and “inculcate nationalistic feeling”, clearly influenced by the right-wing BJP government. As explained earlier, the infringement of privacy must be weighed against its possible benefits. Individual privacy may be infringed to reveal certain criminal or antisocial behaviour. That is why it is often argued that targeted surveillance is somewhat essential to law enforcement. Targeted surveillance is conducted only on very specific persons of interest. Law enforcement agencies are known to often monitor the activities of known criminals, who otherwise try to hide their activities or manipulate these activities to appear just on the right side of the law. Any infraction thus offers the agencies an opportunity to nab these criminals. Detection of such infractions is only possible through targeted surveillance. The FBI has been able to indict notorious criminals such as Al Capone using targeted surveillance. However, targeted surveillance is also prone to misuse, as is well exemplified by the aforementioned incident where J Edgar Hoover attempted to induce Martin Luther King Jr to commit suicide.

Usually, the judiciary is meant to step in to monitor such abuses of power, and it has in many cases across nations. The courts can often argue that there has to be sufficient evidence of wrongdoing before targeted surveillance is conducted on an individual, or sufficient reason to believe in the possibility. However, this evaluation can only be carried out in the case of targeted, individual surveillance. Over the years, a more extensive form of surveillance has emerged; mass surveillance. Mass surveillance is the surveillance of the entire population of citizens. It is often singled out as the most defining feature of a totalitarian state, and is exercised by modern autocracies such as North Korea and China. However, it is also quite conspicuous in modern democracies, especially the United States, the United Kingdom and India. Though the emergence of mass surveillance can be traced back to the world wars, only with the development of electronic communication and digital technology has it become feasible. The Internet has scaled up mass surveillance to a level unimaginable only 15 years ago. What it has enabled further is the blurring of the very definition of state and citizen. A user on Facebook is as much a citizen of Facebook as of their own state. Hence, the idea of surveillance being a government activity alone has also changed. Now, Internet companies act as quasi-states, surveilling their users, or quasi-citizens, for targeted advertising and monetary gain. As Shoshana Zuboff observes, the Internet has given way to the economic model of “surveillance capitalism”, with human experience used as a raw resource to be refined into behavioural data, which can then be used for advertising.[108] Internet companies have profited greatly through this economic model. Some of these companies have also colluded with state governments and other organizations to make the data they obtain available for further use. This surveillance and collusion by Internet companies has further complicated the discourse around surveillance.

2.5 A Political History of Modern Mass Surveillance

Mass surveillance, in the true sense of the term, finds its foundations in the Western world. While Foucault traces the birth of surveillance to seventeenth century plague-ridden Europe, the origins of modern mass surveillance can be found following the First World War, when wartime interception of communication continued into peacetime in the United States, under an organization called “The Black Chamber”. This was a forerunner to the National Security Agency in the US.[55] The organization eventually shut down in 1929, but the possibility of mass surveillance was ingrained in Western political systems. One of the earliest establishments of mass surveillance came well before the digital age, in 1950, in erstwhile East Germany. The State Security Service, commonly known as the Stasi, ran an extensive surveillance network, and looked out for even minute hints of political dissidence. The citizens were aware that the government was spying on them, and hence maintained dual identities; discussing sensitive topics only with close family and in the confines of their own houses, while maintaining a facade of conformism in public.

Following the Second World War, there was widespread fear of communism in the West. While being a communist was a fad in the 1930s, it suddenly became a crime in the 1950s, as Senator Joseph McCarthy started a so-called “witch-hunt” of possible communists. This was paralleled by the establishment of the National Security Agency or the NSA in 1952, as well as the founding of a program known as ECHELON in 1972 in the US, aided by the United Kingdom, Canada, Australia and New Zealand.[66] Together these nations came to be known as the Five Eyes. The objective of ECHELON was originally to monitor communications from the Soviet Union and the Eastern Bloc. Intelligence agencies in the US, including the FBI and the NSA, often conducted clandestine surveillance operations having nothing to do with law enforcement. Besides Martin Luther King Jr, many popular figures such as Charlie Chaplin, Frank Sinatra, Albert Einstein and John Lennon were targeted for surveillance by the FBI for expressing political dissidence. These clandestine operations came to light following the Watergate Scandal, when Senator Frank Church started investigating other possible surveillance operations, and concluded that “intelligence agencies have undermined the constitutional rights of citizens”.[55] Even as the Cold War ended, these agencies maintained their operations on the pretext of monitoring “other domestic threats”. By the late 1990s, ECHELON could intercept and monitor satellite communication, circuit-switched telephone systems, Internet traffic, and radio transmissions. The existence of ECHELON was a secret till the end of the 20th century, when the European Parliament set out to investigate.[100] Even as the report issued by a committee of the EU confirmed its existence, the USA and UK denied it, as is pointed out by the following BBC report:

Echelon spy network revealed: Imagine a global spying network that can eavesdrop on every single phone call, fax or e-mail, anywhere on the planet. It sounds like science fiction, but it’s true. Two of the chief protagonists - Britain and America - officially deny its existence. But the BBC has confirmation from the Australian Government that such a network really does exist. . . [18]

2001 marked a major shift in mass surveillance programs. As Robert O’Harrow Jr writes, “After the terror attacks on September 11, 2001, our government leaders could not resist the promise that information technology would make us safe again.”[94] Senator Frank Church’s investigation into domestic surveillance had revealed intelligence abuses in the mid-1970s, and in response to the criticism that followed, the Foreign Intelligence Surveillance Act or FISA was passed in 1978, under which secret search warrants could be issued for surveillance. Following the terror attacks of 2001, the US further passed the Patriot Act bolstering mass surveillance capabilities, and allowing warrants to be fast-tracked in FISA courts, and has since repeatedly amended FISA to boost surveillance. With the Internet having a global reach, and major Internet companies being based in the Silicon Valley of the US, this mass surveillance program had global implications. The earliest exposé of post-2001 surveillance happened in 2005, when the New York Times revealed some of the scope of the NSA’s surveillance programs, followed by another report by USA Today in 2006.[55] Robert O’Harrow Jr documents many domestic surveillance instances in his 2006 book No

Place To Hide, also pointing to the use of the collected data for purposes other than counter-terrorism. Bruce Schneier reports that the mass surveillance data has been used more frequently by the Drug Enforcement Administration (DEA) of the US than for counter-terrorism purposes.[34] Even as media organizations such as WikiLeaks and The New York Times, among others, reported on these programs, with American magazine Wired even publishing an extensive report on NSA’s spy network in 2012, the NSA continued to deny any allegations of mass surveillance.[23] It was only in 2013 that the true extent of the mass surveillance programs in the USA and the UK was revealed verifiably. This started with whistleblower Edward Snowden handing over 200,000 top secret documents to selected media outlets. Further investigations pointed to the levels of general surveillance being too high, and confirmed allegations of mass surveillance against the NSA and other US intelligence organizations. The misuse of data was also revealed in much more detail, as reports of NSA agents using surveillance systems to track their lovers emerged. This practice was so common that it even has its own nickname, ‘LOVEINT’, a play on SIGINT, short for signals intelligence, a technical term for intercepting communications. Such abuse has been reported across news outlets.[101][22] As these massive revelations came to light, the European Union (EU) had to try and combat the reach of these surveillance programs in its own member states. EU justice and rights commissioner Viviane Reding was quoted as saying, “The question has arisen whether the large-scale collection and processing of personal information under US surveillance programmes is necessary and proportionate to meet the interests of national security.”[61] In the immediate aftermath of the Snowden revelations of 2013, the EU sought to issue a set of guidelines known as the General Data Protection Regulation or GDPR to regulate Internet companies operating in Europe. In 2016, the EU issued a new framework for data flow between Europe and the US, known as the “EU-US Privacy Shield”. While the US and the UK continue to run their mass surveillance programs in the name of national security, the EU has strongly prioritized the right to privacy, seeking to protect the data of its member nations.

2.6 A Technological History of Modern Mass Surveillance

To understand surveillance on the Internet, it is important to first look at certain precursors. These precursors capture the essence of what surveillance on the Internet has really evolved from, and the complexities it entails. Arguably the most essential feature of espionage has been the interception of communication. In ancient Rome, it simply meant a passer-by overhearing conspiring senators or the interception of written communication between military generals. With the emergence of telecommunication, this evolved to mean intercepting phone calls and telegrams. Encrypting communication hence became important, with one of the earliest encryption methods emerging in ancient Rome, famously known as the Caesar cipher. However, encryption meant that espionage had to grow sophisticated enough to be able to decrypt these messages. The Caesar cipher turned out to be easy to decrypt, and so encryption became more and more

complicated. During the Second World War, the mathematician Alan Turing and his team at Bletchley Park spent two years trying to break the Nazi encryption system known as ‘Enigma’.1 Early surveillance programs similarly emerged in the interception of communication, except in this case, the communications being intercepted were those between the state’s own citizens. The practice of connecting a listening device to telephone lines, known as ‘wiretapping’, has been used by the FBI since the 1940s for targeted surveillance. Secret cameras and microphones were also installed for the purposes of targeted surveillance. CCTV cameras are simply a manifestation of secret cameras being applied to mass surveillance. Similarly, the monitoring of centralized telephone networks and Internet traffic is an extrapolation of intercepting targeted communication to intercepting communication involving all individuals. Essentially, just as surveillance methods evolved out of espionage methods, the methods for mass surveillance have evolved from the methods for targeted surveillance. Mass surveillance was only enabled by the advent of a centralized communication facility. This only became feasible in the 1990s with the rise of the Internet. With the Internet, every individual was connected, and just by intercepting one node, one could access all the connected nodes. However, it is not simply the structure that has enabled the scale of mass surveillance seen today. Computers and the Internet are in essence data-driven entities. “Computers constantly produce data. It’s their input and output, but it’s also a by-product of everything they do,” says Bruce Schneier. As the Internet grew more sophisticated and content-rich, users generated even more data. The exponential growth of computing power has also enabled us to embed computers into other electronic devices. The development of artificially intelligent algorithms to make informed, objective, automated decisions has further enabled the growth of the Internet, and hence mass surveillance. These algorithms work on data, and hence data must first be generated for these to give efficient outputs. The modern fascination with automation and objectivity has escalated our reliance on these algorithms. This has started a self-feeding loop; users want to incorporate these algorithms into more aspects of their lives, these algorithms require data, and so users generate computational data on those aspects of their lives. Furthermore, devices such as telephones, automobiles, and even kitchen appliances are increasingly being connected to the Internet for running more sophisticated algorithms. Thus the growth of these algorithms ensures constant generation of data in all aspects of our lives, making this data amenable to surveillance as well. These devices imbued with computing power are called ‘embedded systems’, and with the incorporation of algorithmic power and the Internet, these systems are now being touted to form the ‘Internet of Things’. Another major development that has boosted the scope for mass surveillance has been the rise of social media in the early 2000s. The ubiquitous nature of information shared by Internet users on social media has made actual mass surveillance conceivable for the first time in history. The data shared on social media can be analysed and modelled to generate all kinds of profiles on users, including psychological, political, and consumerist profiles.
Massive deep learning algorithms are already running across the Internet to generate these profiles, primarily for targeted advertising. Nonetheless, the other uses of this

1 Alan Turing’s interception and decryption of secret Nazi communication is estimated to have shortened the war by two years, saving over 14 million lives.

information, and these algorithms, are no secret, as is illustrated by the Cambridge Analytica scandal. During the 2016 US presidential election campaigns, British organization Cambridge Analytica (CA) illegitimately acquired data on over 220 million Americans, including 50 million Facebook users, and used it to generate psychological and political profiles of voters. While CA’s role in the US elections was disclosed in 2018, forcing the company to close operations, CA’s executives have stated that the company had worked in more than 200 elections around the world. Interestingly, Cambridge Analytica finds its origins in an academic project. “The great tech pioneers, of course, do not share this concern (of digital technology’s adverse effects on democracy) because they are firm believers in a sunny techno-utopia and in their ability to take us there,” states Jamie Bartlett.[63] Indeed the founders of the major Internet corporations envisioned a utopia emerging from the systems they had built, and they continue to believe in their vision. The latter is best exemplified by the 2017 ‘manifesto’ by Mark Zuckerberg[82], where he envisions a global community emerging as Facebook brings us closer together, in opposition to the online echo chambers.

2.7 Data Surveillance in India

India has lagged far behind most European nations in protecting privacy. While the Snowden revelations of 2013 caused a furore in Western societies, the Indian government was silently rolling out surveillance programs similar to those of the NSA. In a 2015 essay, Gautam Bhatia recounts that privacy is neither mentioned in the Indian Constitution nor was it part of the Constituent Assembly Debates, and, furthermore, that a provision to prohibit ‘unreasonable searches and seizures’ (on the lines of the American Fourth Amendment) was rejected in the Assembly.[54] Despite this, the Supreme Court had often fairly accounted for an individual’s right to privacy in its judgements by invoking Articles 14, 19 and 21 of the Indian Constitution, as is illustrated by Bhatia through a 50-year history of Supreme Court cases. However, Bhatia also expresses concerns over recent (as of 2015) decisions by the Supreme Court that had seemingly unfairly denied the right to privacy. A landmark development since Bhatia’s essay has been the upholding of the right to privacy as a fundamental constitutional right in 2017 by the Supreme Court. Hence, it can now be claimed that the Indian constitution guarantees a fundamental right to privacy. Nonetheless, India’s state surveillance programs continue to function without any interference from the Supreme Court. Much as in the United States, India’s mass surveillance programs were a response to terror attacks, in this case the Mumbai attacks of 2008. The first such program implemented was the Central Monitoring System or CMS. It sought to automate the process of lawful interception of communications, by centralizing all telecommunications. Before this, intelligence agencies were required to contact the service providers for any information. However, CMS ensures that all communications are collected, stored and analysed in a centralized manner in real time. These communications include not only SMS and phone calls, but also

data that travels across the Internet via the TCP/IP protocol. This potentially means the interception of even general emails and chats.[44] Another mass surveillance program run by the Indian government is NETRA (Network Traffic Analysis).[117] It was developed by the Defence Research and Development Organisation (DRDO). The program started with ‘Project Black Knight’ in late 2013, and has been fully functional since 2014. NETRA is rumoured to provide each agency the ability to process 300 TB of information per second. It can monitor Internet traffic in real time using both voice and textual forms of data communication, especially social media, communication services and web browsing, and can analyse social media trends and identify sources of various viral messages. While NETRA is predominantly used for espionage, domestic agencies too have limited access to the program. In fact, it can be used in conjunction with other domestic surveillance programs including the aforementioned CMS. Unfortunately, there is limited-to-no information regarding NETRA in the public domain, and hence it is difficult to surmise its current capacity.[117] Around 2013-14, Indraprastha Institute of Information Technology, Delhi, developed a social media surveillance tool called Advanced Application for Social Media Analytics or AASMA. According to an article in Scroll.in[69], it had been deployed by 40 government departments as of April 2017, including the police and other government agencies. The tool is capable of collecting and analysing citizens’ social media feeds in real time. It can be used to “track public views and sentiments on various social media platforms” and to handle “sensitive issues and protests”, implicitly aiming to curb social and political dissidence. The program has grown without any public scrutiny or uproar about its clear infringement on the citizens’ right to privacy. More recently, in 2018, the Indian government proposed a “Social Media Communications Hub”, with a stated goal of influencing online discourse. It would give analytical inputs to the government to “mould public opinion” and “inculcate nationalistic feeling”. Following a PIL, the Supreme Court promptly asked the government to withdraw the proposal, citing the 2017 judgement that upheld the right to privacy. Nonetheless, as a source who worked closely on AASMA stated, “such programs will come into effect as soon as public scrutiny comes down, which tends to happen rather easily in India.”

2.8 Pervasive Surveillance

There is a tendency to look at surveillance as an invasive phenomenon, and to ignore its pervasive nature. Mass surveillance is pervasive by definition. Yet the notion remains difficult to conceive, as mass surveillance was never feasible until the Internet and the explosion of digital technologies. The Internet, and particularly social media, ensure the pervasive capacity of surveillance, owing to the extensive information shared by users, and governments’ and corporations’ direct access to this information. In other words, the ubiquitous nature of the Internet, especially social media, has finally enabled mass

surveillance. However, surveillance has really turned pervasive in that individuals are now both the surveillant and the surveilled. In the first chapter of his book The People vs Tech, Jamie Bartlett invokes ideas from the philosopher Jeremy Bentham, the founder of utilitarianism, pointing out how concepts introduced by him have manifested with digital technology.[63] The first concept he talks about is an architectural design called the ‘panopticon’. Bartlett draws a comparison between this system and that of modern electronic and Internet surveillance. Bartlett is not the first to use the panopticon as a metaphor for surveillance. The metaphor itself was first popularized by Michel Foucault in 1975. Foucault used the ‘panopticon’ as a metaphor for the modern disciplinary society in Discipline and Punish, originally titled in French as “Surveiller et Punir”.[77] The panopticon is an architectural design introduced by Jeremy Bentham. Foucault himself came across this design in the course of his historical studies. The word is derived from the Greek word panoptes, meaning ‘all-seeing’. Bentham’s design was meant for institutional buildings that require monitoring and control, such as prisons, schools and sanatoriums, and Bentham himself primarily focused on the panopticon for a prison. Bentham suggested a structure where all the inmates of an institution can be monitored centrally, by a single guard. The inmates themselves cannot see whether they are being monitored. The idea, therefore, was that the inmates would behave as though they are being watched at all times, as they cannot be certain that they are not being watched. The 1984 film adaptation of George Orwell’s Nineteen Eighty-Four was noted for its use of a “cinematographic panopticon” to depict state surveillance in the narrative.[102] In 1990, Mike Davis drew attention to the similarities of CCTV technology with the panopticon design; there is a central observer, unseen to those being observed. He stated that the design of urban malls with the central CCTV observation tower “plagiarizes brazenly from Jeremy Bentham’s renowned nineteenth-century design for the ‘panopticon prison’”.[46] In 1993, surveillance studies expert David Lyon elaborated on the ‘electronic panopticon’ enabled by data surveillance. He states that “where vestiges of the Panopticon are present within electronic surveillance, they present a challenge to social analysis and to political practice”, concluding that “no single metaphor or model is adequate to the task of summing up what is central to contemporary surveillance, but important clues are available in Nineteen Eighty-Four and in Bentham’s panopticon.”[45] However, the modern panopticon, as Bartlett describes it, is not a centralized system, as “everyone is both watching and being watched”; the panopticon is now both-ways. With social media, users constantly upload as well as pass judgement on uploaded content. There is an angry social media mob lurking somewhere, making sure that a user cannot say or do something radical without facing their wrath. “Being always under surveillance and knowing that the things you say are collected and shared creates a soft but constant self-censorship”, points out Bartlett. Much like Bentham’s original panopticon, there is invisible monitoring, and the inmates on social media must behave. Hence, social media creates a pervasive surveillance environment; a perfectly feasible version of the panopticon.
Even more so, this is a classic example of how infringement of the right to privacy consequently infringes on the right to liberty (of expression).

The second concept Bartlett recounts from Bentham is that of the ‘felicific calculus’, also known as Bentham’s moral calculator, designed in 1789. As the founder of utilitarianism, Bentham wanted to calculate the exact degree of pleasure that a specific action would cause. Utilitarianism evaluates ethics by the consequences of an action, and seeks to maximize pleasure or happiness and minimize pain. The difference between pleasure and pain may be considered utility, and hence, utilitarianism essentially seeks to maximize the utility of an action. As Bartlett points out, Bentham was interested only in that which could be measured and categorized, and thus the idea of a machine to calculate utility fascinated him. Bentham indeed designed a proto-algorithm, much like the modern machine learning algorithms, factoring in such variables or vectors as “intensity”, “duration”, “remoteness” and “uncertainty”, among others. Bartlett points out that Bentham did not have either the computing power or sufficient data to actually implement his algorithm. However, modern machine learning algorithms have both. Bartlett even goes on to define what he calls a “Felicific Calculus 2.0”. This version has access to millions of data points, such as people’s health, age, immunity, etc. In fact, this version does not even have to be utilitarian, and one’s ethical standpoint could just be another data point, as Bartlett illustrates:

Me: I am a deontologist who believes in the categorical imperative. Can you verify that no one is harmed unnecessarily in the making of the latest trainers (and if not, please buy me a pair)?

FC 2.0: I have not ordered you a pair of the latest trainers. I advise you to consider changing brands.
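To make Bentham's proto-algorithm concrete, the following is a minimal, purely illustrative sketch in Python. The aggregation rule (multiplying intensity, duration and certainty, and discounting remote sensations), the treatment of "extent" as a simple multiplier, and the example numbers are assumptions introduced here for illustration; neither Bentham nor Bartlett specifies a single numeric formula of this kind.

```python
# Illustrative sketch of the felicific calculus as a proto-algorithm.
# The weighting and discounting choices below are assumptions for
# demonstration, not a formula given by Bentham or Bartlett.

from dataclasses import dataclass

@dataclass
class Sensation:
    """One anticipated pleasure (positive intensity) or pain (negative intensity)."""
    intensity: float   # strength of the feeling; sign encodes pleasure vs pain
    duration: float    # how long it is expected to last
    certainty: float   # probability of it occurring, in [0, 1]
    remoteness: float  # delay before it occurs (larger = more remote)

def felicific_score(sensations, persons_affected=1):
    """Sum expected, remoteness-discounted utility over all anticipated sensations."""
    total = 0.0
    for s in sensations:
        discount = 1.0 / (1.0 + s.remoteness)            # remote feelings count for less
        total += s.intensity * s.duration * s.certainty * discount
    return total * persons_affected                       # Bentham's "extent"

# Toy comparison of two actions for a single person.
stay_home = [Sensation(intensity=2, duration=3, certainty=0.9, remoteness=0)]
go_out = [Sensation(intensity=5, duration=2, certainty=0.6, remoteness=1),
          Sensation(intensity=-4, duration=1, certainty=0.3, remoteness=2)]

print(felicific_score(stay_home))  # 5.4
print(felicific_score(go_out))     # 2.6 -> the calculus would recommend staying home
```

Even this toy version makes the reductionism visible: every nuance of a decision has to be flattened into a handful of numbers before it can be scored.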

Bartlett says that the Felicific Calculus 2.0 is imbued with the original’s basic flaw; it is highly reductionist, and cannot possibly capture the intricacies of human decision-making. Unfortunately, as is the case with most modern technology, it does not have to be perfect; it simply has to outperform the human mind. The modern obsession with data and numbers further ensures that such algorithms will thrive. Electronic and data surveillance ensures that there is enough data for these algorithms to work as well. Cathy O’Neil, in her book Weapons of Math Destruction, reported several instances where algorithms have already taken over human decision-making in such critical fields as college admissions, parole applications, police officer distribution, teacher evaluations and so on. She calls these algorithms ‘Weapons of Math Destruction’, a play on the term ‘Weapons of Mass Destruction’, pointing to their destructive characteristics.[37] Virginia Eubanks points out that these algorithms urgently require adequate enforcement of fairness, accountability, and transparency.[118] These algorithms, though seemingly objective, always inadvertently incorporate certain subjective biases of their creators. As a Data & Society report on algorithmic accountability points out, “Critically, algorithms do not make mistakes, humans do.”[90] The report primarily identifies the lack of auditing or testing of these algorithms as a cause of the negative discriminatory outcomes they have led to. It proposes a sequence of testing similar to the roll-out of any software product, albeit also accounting for potential biases; these steps include pre- and post-marketing trials, as well as third-party auditing.
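As a toy illustration of the kind of check such an audit might run, the sketch below computes a "demographic parity gap": the difference in favourable-outcome rates between groups in a set of algorithmic decisions. The hypothetical decisions, the group labels and the 0.1 tolerance are assumptions introduced here for illustration; they are not drawn from the Data & Society report or from any specific audit.

```python
# Illustrative audit check: demographic parity gap across groups.
# The decision data and the 0.1 tolerance below are hypothetical.

def favourable_rate(decisions):
    """Fraction of decisions in a group that were favourable (True)."""
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions_by_group):
    """Largest difference in favourable-decision rates between any two groups."""
    rates = {group: favourable_rate(d) for group, d in decisions_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical outcomes of an automated screening algorithm (True = approved).
outcomes = {
    "group_a": [True, True, False, True, True, False, True, True],
    "group_b": [True, False, False, False, True, False, False, True],
}

gap, rates = demographic_parity_gap(outcomes)
print(rates)                      # {'group_a': 0.75, 'group_b': 0.375}
print(f"parity gap = {gap:.3f}")  # 0.375
if gap > 0.1:                     # illustrative tolerance, not a legal threshold
    print("Potential disparate impact: flag for human review.")
```

A single number like this obviously cannot establish fairness on its own; hence the emphasis above on a sequence of trials and independent, third-party auditing.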

Currently, many journalists audit some of these algorithms by reverse-engineering them to detect biases. Nonetheless, Bartlett points out that the real threat is not in these algorithms failing, but in them performing better than humans, and increasing the efficiency of decision-making. As these algorithms improve efficiency, they will inadvertently find their way into such fields as crime prediction and social profiling. The invisible injustices of these algorithms that both O’Neil and Bartlett point to would be just that; invisible. The greatest danger of such heavy reliance on algorithms is that people would lose the ability to think freely for themselves. Both the modern panopticon and the Felicific Calculus 2.0 point to us losing fundamental characteristics of what defines us as individuals in democracies. While the former seizes our freedom of expression, the latter robs us of our free will. These outcomes ultimately lead to us losing control of our own lives to an invisible central force driven by digital technology. These paradigms are typical features of the “surveillance capitalism” economic model conceptualized by Zuboff [108], as discussed earlier. The surveillance capitalist economy not only monitors but also inadvertently moderates behaviour. The latter quality is subtle, but poses a threat to human nature itself. Just like the Felicific Calculus 2.0, the present Internet regime values the various optimization faculties presented by modern technology, and even the harvesting of the most intimate data is permissible to achieve this. The discovery of what Zuboff terms “behavioural surplus” has aided this process. The “behavioural surplus” is the surplus value generated by mining enormous amounts of behavioural data. This surplus becomes available to corporations that convert it into multiple kinds of marketable products, with the ultimate purpose of maximizing profits. This surplus is now not only available for an individual’s online activity, but for any activity where an individual leaves a digital footprint; while using GPS, smart devices, or even public transport. The Internet corporations are not only the creators of this surplus, but also its users. This surplus is utilized by Internet corporations to influence human action, leading users to content that is most likely to catch their attention in the given situation. Instances of this range from suggesting “break-up” songs when a user indicates the end of a relationship online, to leading augmented reality video game players to certain restaurants and other places. This arrangement makes the existence and marketability of the surplus essential for their business model. Thus, as Zuboff proclaims, “Demanding privacy from surveillance capitalists or lobbying for an end to commercial surveillance is like asking...a giraffe to shorten its neck”.[108] This double-edged sword of control is termed “instrumentarianism” by Zuboff, which includes both “the instrumentation and instrumentalization of behaviour for the purpose of modification, prediction, monetization, and control.”[108] Instrumentation is the technological architecture that allows for the perpetuation of a behavioural surplus, while instrumentalization is the social structure that permits this. Zuboff even compares this system of control to that by totalitarian governments; with the total economic, political, and social control that Internet systems now have on human life.
Furthermore, just as an "other" category of human being is seen as an enemy by totalitarian regimes, for the instrumentarian Internet regime, the "other" constitutes non-marketable human experience and the very notions of human freedom and free will.

2.9 Conclusion

There are three forms of surveillance prevalent in the modern world; surveillance by corporations, state surveillance, and pervasive surveillance. While corporations mainly use surveillance for advertisement, many have used user data to build 'persuasive technology' that is intentionally addictive. This imposition of addiction by corporations on users is a direct attack on their free will, and hence a direct attack on democracy. The data generated by these corporations, and the analyses built on it, are prone to use by other organizations for ulterior motives. Academicians and organizations alike have used this information for socio-economic and political manipulation, as is well exemplified in the case of Cambridge Analytica. Responding to global terrorist attacks, governments too have been tempted to develop methods to analyse user data on the Internet, and to run mass surveillance networks for national security. However, these mass surveillance networks have also been used by governments to influence discourse, and to manipulate socio-economic and political spheres in their nations. Misuse of this data by individuals working for the government is also not uncommon, as is illustrated by cases such as the aforementioned LOVEINT. The surveillance by governments and corporations not only has serious implications for the right to privacy, but is also a direct attack on individuals' free will. Furthermore, pervasive surveillance has emerged as a consequence of these activities by governments and corporations, wherein Internet users are both the surveillant and the surveilled. Thus, individuals have to maintain an online facade, and are less free to speak their mind, out of fear of massive repercussions from other users, who are essentially their surveillants. This is a further attack on individuals' freedom of expression. Surveillance and its implications are thus a clear and direct attack on democracy and the principles embedded in the system. Nonetheless, the analysis of data and its use to not only watch over but also manipulate individuals persists, even in the most democratic nations.

Chapter 3

Information Control and Influence

3.1 Introduction

There is no more action or decision in our day than there is perilous delight in swimming in shallow waters.

Søren Kierkegaard[111]

Mass surveillance has evolved to become a scalable and feasible methodology of observing citizens, but as discussed earlier, the larger purpose of this activity is to control the actions and thoughts of people. Central to the idea of pervasive surveillance has been the panopticon, as discussed in chapter 2. It is a primary control mechanism today, and as Bartlett describes it, a perfectly feasible both-ways panopticon structure has been enabled by social media. However, the panopticon only gives a limited picture, that of being able to watch and be watched; simply watching individuals is the means, not the end. Controlling actions and thoughts requires influence beyond the scope of surveillance, and this is where the media comes in. Thinkers like Søren Kierkegaard and Noam Chomsky have criticized the press and the mass communication practices enabled through it. While Kierkegaard observed that the attempted objective, singular-minded reporting by the press during his time was an attack on individuality, Chomsky went on to also examine the role of 'elites' in using the press and media as a machine. Sociologist Thomas Mathiesen further went on to envision the concept of a 'synopticon' as a counterpart to the 'panopticon' discussed earlier. The synopticon, as described by Mathiesen, is steered by the mass media to create "human beings who control themselves through self-control and who thus fit neatly into a so-called democratic capitalist society".[113] However, while his ideas were limited to the many viewing the few, with the masses viewing the lives of prominent media figures, social media has made this relationship work both ways: the many view and judge the few, and the few influence the knowledge and lifestyles of the many, all through the media.

The amenability of mass media to propaganda can be seen throughout history. Nazi Germany used films to spread its propaganda, while Joseph Stalin used multiple mass media for the purpose of what came to be known as 'disinformation': spreading false information deliberately, usually for an ideological cause. The Internet and social media are, in some sense, extensions of the mass media, while also adding novel mechanisms for sharing information. What Kierkegaard, Chomsky and Mathiesen observe of mass media has only been exacerbated through the Internet and social media, and through their analysis of the former, parallels and contrasts can be found for the latter.

3.2 Anonymity, Commitment and Information

Kierkegaard's criticism of the daily press begins with his scorn for the conception of a collective 'public', and what he calls the trend of 'mathematical equality' that it ascribed to people. He argues that a concrete, 'approximate' levelling preserves qualitative distinctions between individuals, but that this has been abandoned for an abstract totality of equality. He further argues that such a mathematical view of society renders individuals as abstract concepts like fractions or numbers, and thus makes the public as a whole determinate and operative. The concept of equality itself thereby departs from being a qualitative concept to become a quantitative process of comparative mathematical operations.[111] Since the public is an abstract and not a concrete concept, it exists only through construction, and this construction is enabled by the mass media. Kierkegaard explains that not only does the mass media conceive the public through a misconstrued notion of equality, but in fact without the possibility of mass communication there would be no public to speak to or speak of. However, both the press and the public are 'phantoms', abstract concepts that only exist through each other. Kierkegaard further recognizes the lack of accountability that such abstraction leads to. The press speaks for the public, but no individual is responsible for public opinion or for what the press speaks. This was especially true for Kierkegaard in nineteenth-century Denmark, as pseudonymous or anonymous publishing was prevalent in journalism. Kierkegaard thus claims that the mass media embodies impersonal communication. Furthermore, Kierkegaard also criticizes the press for its indiscriminate coverage and massive dissemination of information, delivered in an unsentimental tone. He identifies a certain lack of commitment in reporting events, both by the press in creation and the public in consumption. He argues that when every incident is viewed and presented with a mathematical equality by the press, it loses any relevance or significance. By making all news equal, the press is also making all news dispassionate, and in turn making the public passive consumers of news. He concludes, "here... are the two most dreadful calamities which really are the principal powers of impersonality - the press and anonymity".[111] The mass media and the public thus emerge as homogenizing forces, rendering a sense of impersonality and inactivity in individuals. This critique of the press manifests itself perfectly in the age of the Internet. The Internet suffers from the same problems of the public (or the online community at large), anonymity, information overflow, and passivity. Dreyfus notes, "Kierkegaard would surely see in the net with its interest groups, which

anyone in the world can join and where one can discuss any topic endlessly without consequences, the height of irresponsibility".[60] This has become increasingly true with the emergence of social media, with its multiple interest groups. Moreover, online users often have anonymous identities in these groups, such as on Reddit or 4chan. Even when this is not the case, identities are often hard to determine with certainty, such as on public WhatsApp groups, where, though a mobile number is available, determining the identity and, more importantly, the individual personality behind it is virtually impossible. While Dreyfus also argues that on such platforms commitments and losses are at best virtual, these may be valued more dearly on certain spaces such as Facebook or Twitter, where virtual losses may lead to real loss of reputation. Yet, anonymity persists on the Internet in that no platform assumes responsibility for the accuracy of the information available online, and this has also come to mean that users neither know nor care about the source. With the rise of misinformation in the last three years, platforms like Twitter have taken more responsibility, and there has been some impetus for users to check the source, but this is only a small change. On platforms like WhatsApp, where there is little incentive to check the source and it is nearly impossible for the platform to account for the information, these patterns persist.

3.3 The Splinternet and Digital Tribes

One may hypothesize that with the rise of the Internet, and the variety of information available online, Kierkegaard's grievance about the homogenizing effect of the press may be accounted for. However, this does not happen, for two reasons. The first is simply that in addition to the variety of information, there is also an overload of information, and a passivity to its consumption; these are among the primary factors that make the press a homogenizing force. The second reason is the division of the Internet from 'the public' into multiple publics - the digital tribes. In 1997, Van Alstyne and Brynjolfsson almost prophetically identified that electronic communities could emerge into either a 'global village' or 'cyber-balkans'.[80] This is because while the Internet has the capability to bridge gaps, allowing an individual user to inculcate various perspectives, it also has the potential to fragment interaction, where users spend more time with like-minded users and interest groups, and shut out any unfavourable opinion. The term 'global village' was first used by cultural theorist Marshall McLuhan in the 1960s, who predicted that the coming age of electronic communications would lead to a seamless web of information, which he called the global village.[78] The 'global village' view of the Internet has had a heavy influence on the founders of popular online platforms, as Jamie Bartlett notes.[63] This view holds that by allowing people from diverse backgrounds to easily connect and communicate, the Internet would decrease conflict between these groups. The term 'cyber-balkans' models itself after the Balkans region, which was a diverse region relatively peacefully governed by the Ottoman Empire, but which since 1912 has been broken into multiple smaller states that have become increasingly hostile towards one another. Van Alstyne and Brynjolfsson predict a similar

outcome for the 'global village' that the Internet may enable. They assert that since the Internet also makes it easier to find like-minded individuals, users may be more inclined to connect with these groups, rather than diversifying their views. They also predicted that "it can facilitate and strengthen fringe communities that have a common ideology but are dispersed geographically" and that "their subsequent interactions can further polarize their views or even ignite calls-to-action". These users would then prefer to interact with the Internet only within their virtual cliques. This can be seen today with the rise of the 'alt-right', and other polarized radical groups online.[80] Even typically non-political positions now have their own virtual cliques, as can be seen with radical veganism. Bartlett makes similar observations to Van Alstyne and Brynjolfsson (however, in retrospect) in the second chapter of his book People vs Tech, which is aptly titled 'The Global Village'.[63] He further points out that much of this behaviour is inherent to humans, who have always clustered with like-minded individuals. "However, the tech giants have turned these psychological weaknesses into a structural feature of news consumption and exploited them for money", Bartlett proclaims. Indeed, the incentive for Internet companies lies more in people subscribing and spending more time on their platforms, and the virtual clique organization of the Internet seems to benefit them. Thus the tech giants are less interested in keeping people informed and exposed to a wide range of accurate facts and ideas than they are in keeping people hooked to their services. In fact, the more specific users' interests are, the easier it is to target them with the right advertisements. Thus the financial benefits for Internet companies may mean that the virtual clique economy of the Internet is here to stay.

3.4 Consent as a Service

While Kierkegaard stressed the impersonal effects of mass media in the nineteenth century, Edward Herman and Noam Chomsky further observed the control of information by the 'dominant elites' of society, in their book Manufacturing Consent.[52] Through case studies from the history of the US, Herman and Chomsky proposed a propaganda model to describe the systemic biases inherent to the functioning of corporate mass media in the US. The model examines these media houses as businesses more interested in generating advertising revenue than in quality news reporting. It is argued that news itself is only a means to that end. Herman and Chomsky claim that the study is applicable to any country with a political and economic structure similar to that of the US. This has been corroborated by various studies that followed, often with empirical assessment as well.[65] The propaganda model suggests that there are many filters, broadly belonging to five classes, through which information is filtered to be presented as news. The five classes that Herman and Chomsky identified are ownership, advertising, sourcing, flak and fear ideology.

1. Ownership: Most mass media outlets are owned by profit-seeking corporations and this imperative creates a bias. These corporations often have their own agendas, and thus control reporting of news that may harm these agendas.

2. Advertising: Most mass media outlets rely heavily on advertising revenue for their internal funding. The news must attract the greatest number of readers, especially the affluent readers who would be interested in the advertised products. Thus, news that may generate interest in fewer readers would be marginalized.

3. Sourcing: News outlets depend on certain sources of information to produce news. These sources of information are predominantly the government, bureaucracy, large corporations and ‘experts’. This leads to a symbiotic relationship between news outlets and their sources, wherein news that may harm the sources is discarded, and often the sources influence the news reporting.

4. Flak: Flak refers to negative responses that news outlets may receive for their news reporting. This can lead to loss of advertising revenue and affect the outlet's public image. Herman and Chomsky claim that powerful, private influence groups can organize themselves to form 'flak machines' that undermine news reporting on certain issues. One such flak machine is the US-based Global Climate Coalition (GCC), which comprises powerful fossil fuel and automobile companies, and constantly undermines media reporting on climate change and global warming.

5. Fear Ideology: The fifth filter is the media’s constant adoption of a fear or ‘anti-’ ideology in its reporting. While Herman and Chomsky identified anti-communism as the fear ideology filter when they first published their work in 1988, in a 2002 revised edition, they claim that the war on terror and Islamophobia have replaced anti-communism as the fear ideology. Chomsky has said of this filter, “Because if people are frightened, they will accept authority.”

These filters also interact with and reinforce each other. Chomsky has examined how these filters are apparent in US media reporting on many news events. For instance, in reporting on climate change, the media gives equal weightage to climate change deniers, despite very few climate scientists denying it. This owes greatly to the ownership and advertising filters, as major corporations would profit from not acknowledging climate change. This is often explained away as seeking 'objectivity' in news reporting, but Chomsky argues that such undue objectivity in news is almost always a product of one or more of these five types of filters. It must be noted that there are certainly instances of good investigative journalism that seem to defy the propaganda model, one of the most notable of which is the Snowden exposé discussed in chapter 2. However, such journalism seems most prominently to be conducted by independent, and often local, newspapers. The bulk of news media reporting across corporate media houses in the US has been shown to fit the propaganda model, in its biased war coverage, reporting on scandals and leaks, and climate change. Herman and Chomsky thus believe the media to be a monolithic entity, something that resonates well with Kierkegaard's critique of the press. Another interesting observation that Herman and Chomsky

make is that historically, news media that differed from the general monolithic narrative and presented news from a different perspective often perished soon due to a lack of funding, as they appealed only to a small proportion of the masses, and not to 'the public'. In describing the propaganda model, Herman and Chomsky pointed out that media houses were essentially businesses trying to sell a product to advertising businesses, the product being their readers and audiences. This could not be truer for the Internet and social media companies, whose business model is also about utilizing user data as a product for advertising businesses. Elements of the propaganda model are prominent on the Internet, albeit somewhat decentralized at the consumer level. The Google and Facebook duopoly on the Internet means a strong ownership filter controls the dissemination of online information, and the fact that in recent years tech company owners have bought media companies (for instance, Jeff Bezos of Amazon buying the Washington Post) further concentrates this ownership. The ideology filter has become more decentralized, in that consumers are now faced only with the ideology that most appeals to them and spared the rest, living in their comfortable virtual cliques. The flak filter is also now supercharged, with online trolls and radical groups readily forming a flak machine quicker than any business organization could hope for.[104] The Internet is thus a great way for companies to horizontally control diverse types of media. Much as it can be argued that the news media has produced good investigative journalistic reports, it can also be argued that there are instances where the Internet has led to the organization of social reform movements against the dominant elites of society. While this is partly true, it has also been shown that the role of the Internet in such movements may sometimes be exaggerated, as was evident from the Arab Spring movement, which some credit to organization on social media, but which only found real-world traction when the Internet was shut down by the government.[85] It is important to note that news media and social media also interact with each other, and how these two propaganda models interact forms an interesting study. Al-Rawi (2018) conducts one such study, observing news media and social media's response to the phenomenon of fake news, and showing a conflict between the two, with each blaming the other for its spread.[17] Al-Rawi argues that the flak mechanisms on Twitter act to undermine public trust in mainstream news. Similar interactions between news media and social media will be discussed in chapter 4.

3.4.1 Consent: Make in India

Sevanti Ninan has described the delegitimization of mainstream media over the past five years, and her observations indicate a strong manifestation of the propaganda model in the media landscape in India.[103] This has been in large part due to the ruling BJP government of India. This delegitimization starts with the government manipulating the sourcing filter. Immediately following his election to office, Prime Minister Narendra Modi was reported to have instructed his cabinet ministers and senior bureaucrats to not talk to journalists. This made the Prime Minister the only source of information, and that too not even directly. As a Scroll.in article notes, journalists now relied on Modi’s Twitter account for sourcing major government information.[43]

Next, the government took control of the ownership filter by co-opting media houses. Ninan notes that "this has been particularly true of television, with a host of channels - India TV, Republic TV, Times Now, among others - turning openly partisan".[103] Next, the government even created its own modes of communication, as Modi started his monthly radio programme, 'Mann Ki Baat', sharing his views on current events and the political process in India. These channels became the main communication mechanisms for government information to reach the masses, and they also emerged as the primary (almost only) source of information for the mainstream media. In addition to this, Modi held no press conferences, and gave virtually no interviews to journalists, except in very few cases where the questions were pre-screened and the answers given only in writing. In the run-up to the 2019 national elections, Modi also launched 'NaMo TV', a TV channel which was aired free of cost by many DTH operators and also streamed on YouTube, and which has been criticized by the opposition for acting as a 'propaganda machine'. The channel continuously telecast repeats of Modi's rallies and speeches. The channel disappeared almost immediately after the elections, and has since been interpreted to have been a campaigning mechanism.[27] The BJP government also took control of information flow on social media through masterful manipulation by its IT cell.[50] The IT cell hired 'social media influencers' whose role was to take control of and influence political communication on WhatsApp, Facebook and Twitter during state and national elections. It has been speculated that these influencers operate during non-election periods as well, and some have been linked to the spread of misinformation and hate speech.[99] The government then attacked the advertising filter. One of the ways it successfully attempted this was by leveraging government participation in annual events conducted by media houses. Ninan documents an incident from March 2017, when the Prime Minister and other members of government pulled out of the Economic Times global summit at the last minute, speculated to be an expression of the government's displeasure at the outlet's reporting against the BJP government.[103] A similar incident led to the resignation of the editor of another media outlet, in the run-up to its annual summit. The BJP government has also benefited from the organization of an effective, extremist flak machine. Ninan documents multiple instances of death threats, attacks and violence against journalists, with the most infamous incident being the murder of journalist Gauri Lankesh in September 2017. In addition, journalists have also been victims of online 'trolling', including through fake news videos. This kind of trolling has particularly targeted women journalists.[56] The BJP government also seems to have experimented with fear ideology, as is evident from the 2019 Pulwama attack and the events in its aftermath, particularly the 2019 Balakot strike. The media reporting on these incidents fuelled anti-Pakistan and Islamophobic sentiments and glorified the government's "strong response". The timing of these events, so close to the national elections as well as to the eventual abrogation of Articles 370 and 35A of the Indian constitution, only serves to solidify the position of this filter in the propaganda model. Thus, the BJP government has been exercising sizable influence on all five filters of the propaganda model, and controlling the Indian media landscape. The success of the BJP government in the latest

elections has further solidified its influence on Indian media and news reporting. Ninan observes that independent news media houses, as well as local and region-specific news sites, have emerged as a response to this control. Most of these news media outlets operate online, and many emerged as a direct response to the perpetuation of fake news. However, these outlets lack the influence that mainstream news has, and sometimes also have to face the online flak machine. Nonetheless, this alternative journalism at least offers alternative voices and opinions a suitable platform.

3.5 Conclusion

The press and mass media have largely been another control mechanism used by the state and corporates to influence people's thoughts and actions. This often presents the state with the opportunity to exercise its power without public scrutiny. Corporates also benefit from this mechanism, with advertising businesses being able to reach a large proportion of the masses, and news businesses generating massive advertising revenue. The homogenizing effect of news media has persisted into the modern world, aided by corporate and state influence over news media. Many expected that the Internet would serve as a response to this homogenizing influence of news media, by bringing diverse groups together. However, the result has been increased polarization, and the amplification of the propaganda model that was prevalent in the structure of news media. Internet companies have themselves emerged as large corporates with similar goals of generating massive advertising revenue, aided by the virtual clique economy of the Internet. Local and independent news publishers, who often produce original news reporting and do not participate in the propaganda model, may be the most reliable news source in the modern world.[103] However, as Rasmus Kleis Nielsen, Director of the Reuters Institute, notes, many of these publishers have found it difficult "to build a sustainable business in the digital space."[107] Satirical and comedy news have also been identified as an alternative to the homogenizing and polarizing news reporting on mainstream news and social media. However, while some, such as Jon Stewart and John Oliver, have been lauded for their heterogeneous, passionate coverage,[24] most others have been found to stick to the same formula as mainstream news and social media.

Chapter 4

Shaping Political Discourses

4.1 Motivation

As discussed in the previous chapter, it is important to study the interaction between news media and social media to understand how social and political discourse has evolved with the rise of social media and people's increasing reliance on it for news and information. Most commentators have observed a conflict between the two platforms. This chapter attempts to compare the nature of political discourses on social media and in mainstream news media in India. The study is motivated by the question of whether discourse on social media is influenced by mainstream news reporting, or whether social media constitutes a new form of news media, at least in the content that it publishes.

4.2 Past Work and Challenges

In the past, such discourse comparison has been attempted using topic modelling.[17] However, this has a few limitations, especially in the context of this study. Firstly, only a basic comparison of the generated topics can be conducted by a quantitative researcher. Qualitative expertise is required to compare the degree to which these topics really differ. This also means that the study would have to be conducted afresh for each new news topic (not to be confused with the topics generated via topic modelling, which would practically be subtopics of the news topic). However, this is likely to be the case with any such work. Nonetheless, this study attempts to delay this dependence on qualitative judgement, and to propose a general quantitative model that is scalable across news topics. Secondly, topic modelling as a method itself has strong limitations in the context of Twitter posts. This is because of the sparseness of a single Twitter post, limited to 280 characters and often featuring spelling errors and abbreviations, further reducing the amount of meaningful data that can be processed from the posts. Some studies have attempted to solve this problem by aggregating multiple tweets based on certain parameters. This technique is also known as 'pooling'.[39] This aggregation has been done based on users[72], by aggregating tweets by the same user into a single document, hashtags[93], by aggregating tweets including certain hashtags into a single document, and conversations[39], by including comments

and replies to the tweet into the document. However, each of these aggregation methods has limitations. A user may have only one post, or very few posts, riddled with spelling errors and abbreviations that the model cannot understand, making user aggregation redundant. A hashtag-based aggregation would be ineffective if a large portion of the tweets do not have any hashtags. Finally, comments and replies to a tweet are often redundant sentences, and might instead add more noise to the document, making conversation-based aggregation counter-productive. These problems with topic modelling for Twitter posts are further scaled up in the case of Indian Twitter posts. Most Indians are at least bilingual, and hence their posts often include words from Indian languages written in the Roman script. This transliteration is often non-uniform. An example is the Hindi word for 'no' - नहीं (IPA: /nəɦĩː/) - which may be written as 'nai', 'naee', 'nahi', 'nahee', 'nhi' etc. This makes it even more difficult for topic modelling, and even natural language processing in general, to be successfully applied to Indian Twitter posts. Furthermore, a pilot study done to model topics on Indian Twitter posts using the Latent Dirichlet Allocation method[48] gave no meaningful results. Finally, the distinction in writing style between Indian English-language mainstream news and Indian Twitter posts lies not only in the formality of tone, but also in the latter's inclusion of variably transliterated Indian-language words, which the former does not include.
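To make the pooling strategies discussed above concrete, the following is a minimal Python sketch of hashtag-based aggregation; the tweet representation and hashtag pattern are illustrative assumptions rather than the exact procedure used elsewhere in this chapter.

import re
from collections import defaultdict

def pool_by_hashtag(tweets):
    """Aggregate tweets that share a hashtag into one pseudo-document.

    `tweets` is assumed to be a list of raw tweet strings; tweets without
    any hashtag are kept as single-tweet documents.
    """
    pools = defaultdict(list)
    singletons = []
    for text in tweets:
        tags = re.findall(r"#(\w+)", text.lower())
        if tags:
            for tag in tags:
                pools[tag].append(text)
        else:
            singletons.append(text)
    # Each pooled document is the concatenation of its member tweets.
    return [" ".join(group) for group in pools.values()] + singletons

# Toy example: two tweets share #economy, one has no hashtag
docs = pool_by_hashtag([
    "GDP growth slips again #economy",
    "No jobs for graduates #economy #jobs",
    "completely unrelated tweet",
])
print(len(docs))  # 3 documents: the #economy pool, the #jobs pool, one singleton

User-based and conversation-based pooling follow the same pattern, with the grouping key swapped for the user ID or the parent tweet ID respectively.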

4.3 Data Description

Twitter is taken as representative of social media, as the content on Twitter is public and accessible via the Twitter API.[7] Four news websites were identified to represent mainstream news media. These four newspapers were selected based on their general outreach as well as online accessibility.

1. The Times of India: The most widely read English-language national daily in India[10], and the most accessed Indian news website.[13]

2. Hindustan Times: Another widely read English-language national daily in India, and the second most accessed Indian news website.[13]

3. Firstpost: An English-language news and media website with high online access.[9] This was also chosen for being the most accessed Indian news website that does not have a directly corresponding print or TV outlet.

4. Business Standard: The most accessed independent business news website in India.[13] The Economic Times, the

more accessed business news website, is a subsidiary of The Times of India, and the news content of the two often overlaps.

Next, two key current topics in Indian politics were identified for case studies; (1) the revocation of special status of Jammu and Kashmir, and (2) economic slowdown in India. News articles and Twitter posts (tweets) in these two topics over a period of one year (16 Nov 2018 to 15 Nov 2019) were collected.

4.4 Data Extraction and Processing

News articles were extracted from respective news websites using a Python3 script based on the Newspaper3k Python library[6] and the Beautiful Soup Python library[1]. A total of 5282 articles on topic (1) and 2174 articles on topic (2) were extracted. The breakdown of these by newspaper and topic is as given in table 4.1.

            Times of India   Hindustan Times   Firstpost   Business Standard
Topic (1)   2419             1685              756         422
Topic (2)   1224             373               123         454

Table 4.1 Distribution of extracted news articles
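As an illustration of the extraction step described above, a minimal sketch of per-article scraping with the Newspaper3k library might look as follows; the per-site URL discovery (handled with Beautiful Soup in this study) is omitted, and the wrapper function shown here is purely hypothetical.

from newspaper import Article  # Newspaper3k

def extract_article(url):
    """Download and parse a single article, returning the fields kept in this study.

    URL discovery for each news site (done with Beautiful Soup in practice)
    is omitted; this only illustrates the per-article extraction step.
    """
    article = Article(url)
    article.download()
    article.parse()
    return {
        "text": article.text,
        "authors": article.authors,
        "published": article.publish_date,
    }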

Besides the text of the article, the metadata extracted included the author name and the publication date and time. Next, to extract data from Twitter, certain keywords were identified that were adequately representative of each topic as a whole. These were:
Topic (1): "article 370", "article 35a", "special status kashmir"
Topic (2): "economic slowdown", "economy slowdown", "modi made economic crisis", "modi hai to mandi hai", "modinomics"
Around 316126 tweets for topic (1) and 66392 tweets for topic (2) were found based on these keywords and extracted using the Twitter API.[7] Besides the tweet text itself, the metadata extracted included the number of likes and retweets, and the date and time of the tweet. The texts of both the articles and the tweets were tokenized and cleaned by removing stopwords (articles, prepositions, conjunctions etc.; 'not' is considered a stopword, but was not removed as it has semantic value) and stray punctuation marks and special characters, using the Python NLTK package. The words were not stemmed or lemmatized at this stage. This is a standard process for preparing text for further analysis.[84]
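The tokenization and stopword removal step can be sketched as follows; this is an assumed, simplified version of the cleaning pipeline (the study's exact cleaning rules may differ), retaining 'not' for its semantic value.

import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of NLTK resources (assumed available thereafter)
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

# Keep 'not' because it carries semantic (negation) value
STOPWORDS = set(stopwords.words("english")) - {"not"}

def clean_text(text):
    """Lowercase, tokenize, and drop stopwords, punctuation and stray symbols."""
    tokens = word_tokenize(text.lower())
    return [
        tok for tok in tokens
        if tok not in STOPWORDS and tok not in string.punctuation and tok.isalnum()
    ]

print(clean_text("The economy is not slowing down, says the minister!"))
# ['economy', 'not', 'slowing', 'says', 'minister']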

4.5 Method for Comparison

Due to the challenges faced in topic modelling, the method chosen for this study was to analyse text semantic similarity between news and tweets in a given time range. Although some of the challenges, like the sparseness of tweet texts, would still pose a problem with text similarity methods, these are relatively muted compared to topic modelling. Based on past work done to understand news lifespan, it was determined that a particular news article may be discussed over a maximum period of two days on social media.[109] Thus, news articles from a particular date are compared with Twitter discussions over a two-day window starting on their respective publication dates. For instance, the news articles published on 21 March 2019 were collated and compared with a document containing all tweets dated 21 March 2019 and 22 March 2019. It was ensured that there were at least two news articles for a date, and that at least a 500-word document could be obtained for the corresponding tweets, to avoid redundant comparisons. With these restrictions, comparisons were finally possible for 254 dates in topic (1) and for 215 dates in topic (2). Methodologies that are rather new but relatively well-established in natural language processing were used to obtain a relevant text similarity measure.
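A minimal sketch of this date-wise pairing is given below, assuming the articles and tweets have already been cleaned and grouped by date; the data structures and function name are illustrative, while the thresholds mirror the description above.

from datetime import timedelta

def build_comparison_pairs(news_by_date, tweets_by_date,
                           min_articles=2, min_tweet_words=500):
    """Pair each date's news with tweets from that date and the following day.

    `news_by_date` and `tweets_by_date` map a datetime.date to lists of
    cleaned article texts and tweet texts respectively; dates failing the
    minimum-size checks are skipped, as in the comparison described above.
    """
    pairs = {}
    for day, articles in news_by_date.items():
        if len(articles) < min_articles:
            continue
        window = tweets_by_date.get(day, []) + \
                 tweets_by_date.get(day + timedelta(days=1), [])
        tweet_doc = " ".join(window)
        if len(tweet_doc.split()) < min_tweet_words:
            continue
        pairs[day] = (" ".join(articles), tweet_doc)
    return pairs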

4.5.1 TF-IDF

TF-IDF is short for term frequency-inverse document frequency, and is a frequently used numerical statistic to determine a word's relevance to a given document. The TF-IDF is the product of two values, the term frequency (TF) and the inverse document frequency (IDF). The term frequency is quite simply the frequency of a word in a document. The inverse document frequency is a slightly more complicated measure that intends to gauge how much information the word actually provides to the meaning of the entire document. It does so by determining how rare the word is across all the documents. Hence, a common word like 'the', which is likely to have a high TF measure, would end up having a low IDF measure because of its common occurrence across all documents, and would ultimately (ideally) result in a low TF-IDF measure. For this study, term frequency is calculated by adjusting for document length. Hence, TF = f_{t,d} / n_d, where f_{t,d} is the number of times word t occurs in document d and n_d is the total number of words in document d. The inverse document frequency, on the other hand, is calculated as a log-scaled inverse fraction of the documents containing the word. Hence, IDF = log(N / df(t)), where N is the total number of documents, and df(t) is the number of documents containing the term t. It is possible that no document contains a given term, and hence the IDF measure is often smoothed by adding 1 to the numerator and denominator (which practically corresponds to an assumption that there is at least one document containing the term). However, a zero-division error is unlikely for the corpus in this study, so no smoothing was done. Hence,

TF-IDF = (f_{t,d} / n_d) × log(N / df(t))

Upon calculating the TF-IDF values, words with a TF-IDF value below 0.1 were removed from each document. Although stopwords were already removed from the corpus, it was also important to remove frequently occurring terms that might be common across the corpus, as they would skew the text similarity results. For instance, 'economy' could be a very commonly occurring term in news articles and tweets around topic (2). A text similarity measure might then indicate that the news article and tweet are very similar, not because they convey similar sentiments around the topic, but simply because they mention a common topic.
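The TF-IDF computation and the 0.1 threshold can be sketched as follows; this is a simplified illustration of the formula above (without smoothing), not the exact implementation used in the study.

import math
from collections import Counter

def tfidf_filter(documents, threshold=0.1):
    """Compute TF-IDF as defined above (no smoothing) and drop low-scoring words.

    `documents` is a list of token lists; returns the filtered token lists.
    """
    n_docs = len(documents)
    # df(t): number of documents containing term t
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    filtered = []
    for doc in documents:
        counts = Counter(doc)
        n_words = len(doc)
        kept = [
            t for t in doc
            if (counts[t] / n_words) * math.log(n_docs / df[t]) >= threshold
        ]
        filtered.append(kept)
    return filtered

Note that a term occurring in every document has log(N / df(t)) = 0 and is therefore always removed, which is exactly the behaviour wanted for corpus-wide terms like 'economy' in topic (2).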

4.5.2 Word2Vec, Smooth Inverse Frequency and Cosine Similarity

Word2Vec models are shallow, two-layer neural networks that produce word embeddings while maintaining the context of words.[114] Word embeddings are mappings of words to vectors of real numbers. This vector representation is useful in making quantitative inferences from the data. Word2Vec is a particularly useful method of word embedding as similar words in the vocabulary are assigned close vector representations. For instance, 'country' and 'nation' would have similar vector representations. This further helps in computing semantic similarity efficiently. A commonly used example to illustrate the capabilities of Word2Vec is the extraction of analogies: vec(king) + vec(female) − vec(male) ≈ vec(queen), i.e., on adding the vector of 'female' to the vector of 'king', and subtracting the vector of 'male', the vector obtained is close to the vector representation of 'queen'. In other words, Word2Vec models understand basic analogies like king is to queen what male is to female. Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle formed by the two vectors. This is how it can be estimated that the value of vec(king) + vec(female) − vec(male) is close to the value of vec(queen): the angle formed by the two is extremely small, and thus the cosine measure is close to 1. For this study, cosine similarity needs to be measured between the news document and the tweets document. Thus, the news document and the tweets document each need to be represented by a particular vector. A simple, intuitive way to do this would be to take the average of all the word vectors in the document. However, this often gives more relevance to words that may not contribute much semantically. Although some of these words have already been accounted for by removing stopwords and using a TF-IDF threshold, there is scope for further optimization. To achieve this optimization, a technique called 'Smooth Inverse Frequency' (SIF) was employed.[97] Instead of taking simple averages, SIF takes weighted averages of the word embeddings. The weight used is a / (a + p(w)), where a is usually taken to be 0.001, and p(w) is the estimated frequency of the word in the corpus. SIF further computes the first principal component of the sentence embeddings. A principal component is the best-fitting one-dimensional representation of a higher-dimensional space of vectors. The projections of the respective sentence embeddings onto the first principal component are then subtracted from them to give the final sentence embeddings. This accounts for syntactic variation that has no semantic significance. Thus, the vectors obtained for each document with this method are

best representative of their semantic value. This algorithm was implemented using the Python fse (fast sentence embeddings) library to boost performance.[87] Having obtained adequate vector representations for each of the collated documents, a cosine similarity was calculated between the news document and the corresponding tweets document. Furthermore, a word-level analysis was also attempted (independent of topic modelling), using word cloud comparison. A word cloud is used to visualize a large number of words from a corpus in a graph, where the area taken by each word is proportional to the number of occurrences in the corpus.
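For illustration, a minimal sketch of SIF-weighted document embeddings and cosine similarity is given below, using gensim word vectors and NumPy; the study itself used the fse library, and the function names and frequency estimates here are assumptions made for the sketch.

import numpy as np

def sif_embeddings(docs, wv, word_freq, a=1e-3):
    """SIF document embeddings: weighted word-vector average minus the
    projection onto the first principal component of the embeddings.

    `docs` is a list of token lists, `wv` a gensim KeyedVectors object
    (e.g. a trained Word2Vec model's .wv), and `word_freq` maps each word
    to its estimated corpus frequency p(w).
    """
    vecs = []
    for doc in docs:
        words = [w for w in doc if w in wv]
        if not words:
            vecs.append(np.zeros(wv.vector_size))
            continue
        weights = np.array([a / (a + word_freq.get(w, 0.0)) for w in words])
        vecs.append(np.average([wv[w] for w in words], axis=0, weights=weights))
    X = np.vstack(vecs)
    u = np.linalg.svd(X, full_matrices=False)[2][0]  # first principal component
    return X - np.outer(X @ u, u)                    # remove its projection

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

Under these assumptions, passing the collated news document and tweets document for one date through sif_embeddings and then cosine_similarity yields the per-date similarity score used in the results below.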

4.6 Results and Discussion

4.6.1 Topic 1: The Revocation of Special Status of Jammu and Kashmir

Figure 4.1 Histogram of comparison for Topic 1

Only 36 out of 254 (14%) date comparisons show moderate similarity in news and tweets discourse, while 124 (49%) comparisons indicate dissimilarity in discourse (refer figure 4.1). It is also important to look at significant dates in the context of respective news topics. For instance, the Jammu and Kashmir Reorganisation Act was passed in Lok Sabha on 6 August 2019, and in Rajya Sabha on 9 August 2019. These were major events that shape the discourse around topic (1). Of all the date comparisons, 6 August 2019 indicated the strongest dissimilarity. The TF-IDF results indicated that the most relevant term to the tweet document for this date was ‘Pakistan’, which was much less relevant in the corresponding news document. Upon further examination, it was discovered that anti-Pakistan sentiments were being expressed over social media, while the very mention of ‘Pakistan’ was almost

absent from mainstream news (for that date). This indicates that there may be entire subtopics that are not covered in mainstream news but strongly influence social media.

Figure 4.2 Word cloud comparison for news (left) and tweets (right) in topic 1

A word cloud comparison can further illustrate this divide (refer figure 4.2). More polar and emotionally charged words like 'deserve', 'award', 'butcher', 'fascism' appear in the Twitter corpus. A portion of the news word cloud is indicative of a discourse around national security and a strong prime minister, with words like 'leader' and 'security' being quite prominent. Largely, however, news media seems to cover more details of the logistics, with words like 'article', 'assembly', 'bill', 'constitution' being some of the more prominent words, while Twitter is more focused on the impact and on reacting to it. Finally, while the news contains discernible mentions of Kashmir's political leaders (Mehbooba Mufti and Omar Abdullah), the only politicians' names discernible, and rather prominent, in the Twitter word cloud are Narendra Modi and Amit Shah. The Twitter word cloud thus seems indicative of extreme and polar reactions fixated on the major impacts, rather than discussing the intricacies involved.

4.6.2 Topic 2: Economic Slowdown in India

A mere 11 out of 215 (5%) date comparisons show moderate similarity in news and tweets discourse, while 141 (65%) comparisons indicated dissimilarity in discourse (refer figure 4.3). This distribution of similarity measures aligns with the results for topic (1). Unlike topic (1), however, it may be difficult to ascertain distinct dates when discourse around this topic may have been amplified. However, a word cloud comparison gives some relevant results, represented in figure 4.4. The word cloud for news indicates that a significant part of the discourse around the economic slowdown revolved around the elections. This is illustrated by words like 'election', 'party', 'seat', 'candidate', 'minister', 'vote'. In addition, two of the major states - Haryana and Maharashtra - that held

Figure 4.3 Histogram of comparison for Topic 2

Figure 4.4 Word cloud comparison for news (left) and tweets (right) in topic 2

elections in the time period covered are also mentioned frequently. The major political parties associated with these and the general elections - BJP, Congress, and Shiv Sena - are also frequently mentioned. By contrast, the discourse on Twitter around this topic did not seem to involve conversations around the election. Rather, the focus was on 'jobs', 'people', 'network', 'gdp'. Mentions of Prime Minister Modi also seemed to dominate Twitter discourse. References to specific companies like 'Uber' and 'Ola' also stood out. In general, it can be inferred that while the news discourse around the economic slowdown tied greatly to elections, the conversation on Twitter covered the immediate industry-level and individual effects of the economic slowdown. The results and inferences seem to strongly corroborate the claim that discourse on social media largely deviates from what is published in the news.

4.7 Future Work

The methodology described here is itself scalable across different news topics and newspapers. By employing it for more topics and newspapers, its applicability can be solidified. The limitations of a simple topic modelling based comparison were highlighted earlier, but the semantic similarity methodology and word cloud comparisons used here may be supplemented by a further word-level analysis, either using quantitative methods or qualitative expertise from media or discourse studies. A topic modelling methodology that works in conjunction with these methods may also enrich the results. This methodology may even help in identifying specific subtopics that are relevant to understanding the distinction in news and tweet discourse (such as the anti-Pakistan sentiment identified in Results: Topic 1).

Chapter 5

Online Disinformation

5.1 Introduction

Our job at Facebook is to help people make the greatest positive impact while mitigating areas where technology and social media can contribute to divisiveness and isolation.

Mark Zuckerberg[82]

Whether social media follows a similar practice of information control as mainstream media is arguable. Yet, there are other disturbing phenomena in the context of information that have escalated with the Internet and social media - misinformation and its more malicious counterpart, disinformation. Misinformation is defined in the Merriam-Webster Dictionary as "incorrect or misleading information"[4], while disinformation is defined as "false information deliberately and often covertly spread (as by the planting of rumors) in order to influence public opinion or obscure the truth".[2] In other words, disinformation is the deliberate spread of misinformation. As such, misinformation and disinformation are complementary, since intentionality is often difficult to deduce, and the intentional spread of false information by some individuals often occurs in conjunction with its unintentional spread by other unwitting individuals. A more common term used in the context of mis- and disinformation is 'fake news'. Collins English Dictionary defines it as "false, often sensational, information disseminated under the guise of news reporting".[3] The term 'disinformation' derives from the Russian word dezinformatsiya, coined by Joseph Stalin, who facilitated the Soviet propaganda campaign. The word was incorporated into English dictionaries in the late 1980s, when the US began to actively counter Soviet disinformation, while itself engaging in disinformation against Libyan leader Muammar Gaddafi, and in internal politics. The phenomenon of disinformation is thus not only intuitively, but also practically, linked with propaganda, and is therefore central to the synoptic model discussed in chapter 3.

Misinformation may occur in various forms; conspiracy theories, hoaxes, political propaganda or entirely fabricated news. These forms often overlap; for instance, many conspiracy theories simultaneously indulge in some form of political propaganda. News which includes facts but is presented in a biased manner may not be directly considered fake news, as long as it is fact-based. Advocacy journalism, a genre of news reporting, involves presenting news from a non-objective perspective, without spreading propaganda or fabricating information. This form of reporting is considered by some to be more informative than objective news reporting; they believe that multiple such outlets with varying transparent viewpoints would better serve the public interest. However, if biased news reporting indulges in a tendency to spread propaganda or conspiracy theories, it should be considered misinformative.

5.2 Misinformation Groups and their Internet Behaviour

There has been extensive work on the role of social media and the Internet in disinformation in the US, particularly centered around the 2016 US presidential elections. The 2016 US presidential elections in particular led to an exponential rise in the academic study of fake news and misinformation across disciplines, as observed in a review article by Ha et al.[71] In a 2017 Data & Society report, Marwick and Lewis examine the spread of disinformation by the radical 'alt-right' in the US. The alt-right is defined as an online community that emerged in the US during the 2010s, loosely connected by shared far-right ideologies such as white supremacism, antisemitism, xenophobia, racism and belief in certain conspiracy theories.[26] The report begins by pointing out how Internet subcultures have a strong influence on news frames and the propagation of ideas. Media in general is vulnerable to being taken advantage of due to its proclivity towards sensationalism, and its increasing dependence on social media and analytics. The report points out that radical groups have developed a good understanding of the strategic use of social media, memes and so on, not only to manipulate content, but also to increase their visibility. They also use irony and knowledge of Internet culture to manipulate a larger audience that is vulnerable to their methods. Even before misinformation emerged as a phenomenon, a certain group of online users, who came to be known as trolls, were known for manipulating information. Their goal has been to post inflammatory messages and elicit emotional responses from users. Most online trolls claim to be apolitical and maintain "a quasi-moral argument that, by trolling, they are exposing the hypocrisy, ignorance, and stupidity of the mainstream media" and the general public.[26] They have often pointed out their antipathy towards the mainstream media for its sensationalist tendencies. A well-documented instance is that of a 4chan user messaging the Oprah show posing as a pedophile and sharing an incredible story of having "9000 penises" (which was itself a pop culture reference from Dragon Ball Z). The team, however, believed the story, and ran it online with the show host alerting viewers of "a known pedophile network" with over "9000 penises".[119] Other instances include Facebook trolls making light of death-related posts, considering themselves in opposition to those who offer token condolences without having known the person. Trolls are also known for disregarding 'political correctness', considering it an impediment to free speech. The

online alt-right mimics the behaviour and tactics of Internet trolls, albeit with an ideological motive. Furthermore, the two often work together online due to these shared tactics, perhaps unintentionally. Besides these online groups, there are certain users who hold great influence over social media, known as 'influencers'. These are users who have generated a certain online following, and are thus connected with many other online users. Many of these influencers are themselves far-right ideologues, and become prominent centers for the spread and/or amplification of misinformation. One of the outcomes of their influence is that they hold great credibility for other Internet users. For instance, Twitter provides a blue verified badge for certain users so that "people know that an account of public interest is authentic".[14] However, this causes Twitter users to treat posts shared by 'verified users' as authentic information.[91] These influencers often also include prominent public figures, celebrities and political leaders. Marwick and Lewis pointed out the participation of US President Donald Trump in similarly amplifying misinformation, by constantly sharing conspiracy theories online that had otherwise "been confined to fringe right-wing circles online".[26] Marwick and Lewis also identify the role of hyper-partisan news outlets in the generation and spread of misinformation. They point to the emergence of right-wing networks of news websites and blogs in the US, identifying Breitbart "at the center of this new ecosystem, along with sites like The Daily Caller, The Gateway Pundit, The Washington Examiner, Infowars, Conservative Treehouse, and Truthfeed". While the bulk of the reporting through these sites may not be considered misinformation on the surface, Benkler et al describe hyper-partisan sites as "combining decontextualized truths, repeated falsehoods, and leaps of logic to create a fundamentally misleading view of the world".[120] Similar right-wing networks also emerge through online forums and discussion boards such as Reddit and 4chan. While 4chan had been a center for organizing progressive libertarian social movements from 2003 to 2008, the increasing popularity of the site caused these users to move to other, less public forums. Following this, 4chan was revitalized as a hub for racist, xenophobic, misogynistic and general hate posts. To contain this, 4chan deliberately set up a /pol/ board for political discussions, which was subject to extensive moderation. In August 2014, however, a harassment campaign emerged through 4chan, which later came to be known as Gamergate, targeting several women in the video game industry, including developers and media critics. This harassment included doxing (searching for and publicly broadcasting private information), rape threats and death threats. 4chan reacted by banning all posts related to Gamergate. The campaign then moved to 8chan, a "free speech alternative" to 4chan, which has since been linked to white supremacism, neo-Nazism, the alt-right, racism and anti-Semitism, hate crimes, child pornography and multiple mass shootings.[68] Similar alternatives emerged for other online forums, such as Voat, a clone of Reddit, and the invite-only Gab.ai, a clone of Twitter.

5.3 Information Manipulation: Motivations and Methods

People spread misinformation for three reasons (or a combination of these); promoting an ideology, making money, or getting attention.[58][89] Marwick and Lewis identify certain beliefs where various

online radical groups, including the alt-right, white nationalists, and conspiracy theorists, find common ground. These include contempt for multiculturalism and immigration, for feminism and non-binary genders, a belief that there are intrinsic (and/or genetic) differences among races and genders, a view of political correctness as an assault on free speech, and a tendency to spread conspiracy theories. For these groups, spreading their ideology on the Internet is a necessary war against the domination of the 'liberal left-wing', which they assume to be a threat to their way of life. These groups also radicalize other young users by creating a 'participatory culture' on online forums and message boards, such as 8chan, and establish strong networks online.[79],[59] Besides these ideological groups, there are other users who share misinformation purely to generate revenue via advertising networks. One such group of teenagers, based in Macedonia, created a range of websites spreading pro-Trump, anti-Hillary fake news during the 2016 US presidential elections.[28],[38] The group claimed to be apolitical and attributed their actions to the finding that such content generated high advertising revenue. A Buzzfeed article quotes them:

“I started the site for an easy way to make money,” said a 17-year-old who runs a site with four other people. “In Macedonia the economy is very weak and teenagers are not allowed to work, so we need to find creative ways to make some money. I’m a musician but I can’t afford music gear. Here in Macedonia the revenue from a small site is enough to afford many things.”[38]

However, even these groups often plagiarized from alt-right, ideologically driven blogs and websites. Marwick and Lewis claim that similar networks, with no political leanings of their own, are prominent on the Internet. Furthermore, users share information to gain online reputation. This reputation may be measured by the number of 'likes' and 'shares' generated on Facebook or the number of 'favourites' and 'retweets' on Twitter. Users are thus motivated to share content that expresses relatability within their communities. In this spirit, they may tend to share misinformation that they find appealing and likely to generate 'likes' online, without checking its validity and without any ideological leanings. It is important to note, however, that these three types of motivations (ideology, money, and reputation) are often intertwined. The success of online misinformation can be attributed to the perpetrators' knowledge and utilization of Internet culture in creating and spreading any information. One such instance is the common usage of memes on the Internet. The first definition of a meme in the Merriam-Webster Dictionary is given as "an idea, behavior, style, or usage that spreads from person to person within a culture". Indeed, a meme can be anything, and can quite simply be understood as a unit of information. However, the second definition of a meme in the Merriam-Webster Dictionary is more contextualized to the Internet, given as "an amusing or interesting item (such as a captioned picture or video) or genre of items that is spread widely online especially through social media". Memes have emerged as a cultural object on the Internet, and even in print media. The alt-right has been quite successful in leveraging memes to create ideological misinformation. Their posts are often accompanied by captioned pictures or videos. Marwick and Lewis also document that the alt-right have appropriated memes such as Pepe the Frog as a symbol

for their movement. Another instance is the antisemitic ‘(((echo)))’ meme, which involves putting Jewish names in triple parentheses to mark them out from the rest. This appropriation and creation of memes by the alt-right is successful owing to the exclusive participatory culture of the Internet, where knowledge of memes is seen as essential to knowledge of the Internet. The alt-right has thus created a subculture on the Internet which is prone to misinformation. Another method for spreading misinformation has been strategic amplification (or deamplification). The alt-right has leveraged several tools to enable this, including fake accounts, bots, and highly networked communities. Often these networked communities work together to get a hashtag to trend, or to hijack an extant hashtag, by posting spam messages using those hashtags. These activities are further amplified by their usage of fake accounts and social bots. Fake accounts are accounts, usually representing non-existent persons, that are created and operated entirely by another user, and are used for various purposes, such as inflating the following of an online issue, influencing discourse, and broadcasting content. Often these fake accounts are created to spread a false image of a group targeted by the alt-right. Social bots are essentially fake accounts that are created by another user, but operate independently based on a set of algorithms. Social bots are thus capable of operating more efficiently than fake accounts, and amplify the effects to a larger extent. It must be noted that while radical groups have a tendency to spread misinformation, the actual impact of their behaviour on the spread of misinformation is unclear. Nonetheless, their methods and techniques certainly resonate with the trends in misinformation being seen everywhere.

However, it would be erroneous to claim that radical far-right groups alone are at the forefront of disinformation. One of the more prominent instances is the effort by the Russian Internet Research Agency (IRA) to influence the 2016 US presidential elections.[62] A number of fake accounts were created by the IRA, on multiple online platforms, to pose as US citizens with far-left or far-right political leanings. Many of these accounts were operated by bots. A special investigation into the Russian meddling was led by Robert Mueller in the US. His report claims that the IRA “used social media accounts and interest groups to sow discord in the U.S. political system through what it termed “information warfare””.[95] An earlier Intelligence Community Assessment report had claimed that the incident aligns with “Moscow’s longstanding desire to undermine the US-led liberal democratic order”, and that the goals of the IRA were to hurt the presidential campaign of Hillary Clinton and aid the election of Donald Trump.

It is also important to consider which users actually believe and thus share misinformation. Gentzkow and Allcott, in a study centered on the 2016 US presidential elections, identify certain kinds of users who are likely to believe ideologically aligned fake news. They found that Republicans were slightly more likely to believe ideologically aligned fake news than Democrats. Heavy news media consumers were also more likely to believe such fake news, but no correlation was found with social media use or education levels.
However, those with isolated social media networks were more likely to believe ideologically aligned fake news, perhaps because of reinforcing beliefs from their peers and a lack of contradictory views. Furthermore, they noted that decisive voters were more likely to believe fake news, which is consistent with the relationship between strength of ideology and belief in fake news.[58] These results also

align with the results from Banaji and Bhat et al, who claim that “prejudice”, and not digital illiteracy, is the reason for sharing misinformation.[98] This study is discussed in more detail later. As Marwick and Lewis conclude, “we can expect the continuation of some current trends: an increase in misinformation; continued radicalization; and decreased trust in mainstream media”.[26]

5.4 Divergence Across Platforms

In considering the spread of misinformation, it is crucial to understand the various Internet platforms that facilitate it. While message boards and forums like 4chan and Reddit (and now their clones 8chan and Voat) are platforms for the creation of misinformation and the radicalization of online users, most of the threat of disinformation lies on mainstream social media sites like Twitter and Facebook, where all kinds of users may share misinformation. Feingold et al offer a preliminary, mixed-methods study of the user and platform behaviours that facilitate the spread of misinformation across four Internet platforms: the social media services Twitter and Facebook, the Google search engine, and the Reddit discussion boards. Like much work around disinformation, their study is centered on the 2016 US presidential elections.

5.4.1 Google Search Engine (and YouTube)

Google search is the most widely used Internet service for getting information. The search results generated by Google are governed by an algorithm, and spammers have long tried to manipulate this algorithm. The ranking is not governed solely by query-document matching and page popularity; results are also contextualized to an individual user’s personal search history and location, thereby inheriting the user’s biases. Furthermore, as discussed in the context of the Felicific Calculator 2.0 earlier, these algorithms also reflect the biases of their creators, and can often surface controversial search results.[36] Feingold et al identified that sites known to spread false information repeatedly appeared in Google search’s top results for politician names during the 2016 US presidential elections. They observe that a symbolic ‘arms race’ continues between Google and spammers, with Google continuously engaged in algorithmic monitoring and improvements. Feingold et al also suggest that Google inform users about the process and limitations of search result generation, though this might be difficult considering the black-box nature of these algorithms. YouTube, a video sharing platform owned by Google, similarly recommends videos based on an algorithm, and has faced similar difficulties. The YouTube content moderation model is similar to Facebook’s crowd-sourced and human-based monitoring model, though it also employs an algorithm to monitor content. The YouTube content moderation system has repeatedly faced criticism, often falsely penalizing content or letting objectionable content pass. One known instance is that of YouTube’s recommendation algorithm suggesting softcore paedophilic videos which had not been flagged by its

content moderation model.[76] YouTube has consistently been unable to tackle this issue, and has also failed to flag misinformative video content.

5.4.2 Reddit

The radicalizing effects of Reddit have been explored earlier in this chapter. Reddit has since revised its policy on removing subreddits, or limiting access to them, through ‘quarantining’ subreddits, which means that a subreddit is off-limits to non-subscribers and requires an opt-in screen for visiting users.[12],[73] This quarantining gives the subreddit a bad reputation, making users wary of its content. However, there are still certain subreddits where conspiracy theories and other forms of false information are prominent, though Feingold et al noted that volunteer moderators and the Reddit community both delete obviously false content. They also observed that though some volunteer moderators may not be active in their moderation, nudges and reminders in certain subreddits helped in reducing the spread of misinformation. Completely banning controversial subreddits, however, would damage Reddit’s reputation as a free speech platform, and it has thus chosen to do so only in cases of serious violations. Nonetheless, Reddit’s algorithmic ranking aims to reduce the visibility of problematic subreddits.[41] Furthermore, Reddit is also relatively transparent in accounting for changes to its otherwise complex website structure.[11]

5.4.3 Twitter

Twitter is a social networking and broadcasting service where users interact through short messages known as ‘tweets’. Registered users can post, like, and retweet (share) tweets, while unregistered users can only read them. All tweets on Twitter are public, and can be accessed through the Twitter API for various purposes, including marketing and research. Feingold et al observe that users on Twitter often engage in adversarial interactions and debates. Twitter users also tend to treat posts shared by ‘verified users’ as authentic information, leading to a different kind of trust bias than on Facebook. Another trust bias caused by Twitter lies in the system of ‘likes’ and ‘retweets’: a high number of likes or retweets causes users to believe the information more. Users also tend to believe a tweet more if it carries an external link. Finally, the all-public nature of Twitter makes it susceptible to bot activity and thus the automated spread of misinformation. Twitter has since taken steps to allow researchers more access to usage data, and has begun actively flagging hate speech and fake news. A recent fact-checking intervention involved US President Donald Trump’s false claims on mail-in ballots, wherein Twitter provided an external link with the correct facts.[67] Another involved Twitter flagging a tweet by Trump as hate speech.[115] These incidents not only point to an increased flagging capacity from Twitter, but also to an indirect challenge to the trust biases discussed earlier, as Trump is a verified Twitter user. Furthermore, Twitter has also been taking action to mitigate the effects of spamming and bot activity by developing machine learning tools that can detect and remove such bots and spam.[121]
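As an illustration of the research access mentioned above, the following minimal Python sketch shows how public tweets might be collected programmatically. It assumes access to Twitter’s v2 recent-search endpoint and a valid bearer token; the query, requested fields and credential handling are illustrative assumptions rather than part of the studies discussed here.

import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # hypothetical credential, supplied by the researcher

def fetch_recent_tweets(query, max_results=10):
    """Return recent public tweets matching `query`, with basic engagement metrics."""
    url = "https://api.twitter.com/2/tweets/search/recent"
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics",
    }
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    return response.json().get("data", [])

for tweet in fetch_recent_tweets("misinformation lang:en"):
    metrics = tweet["public_metrics"]
    print(tweet["created_at"], metrics["retweet_count"], metrics["like_count"])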

5.4.4 Facebook

Facebook is a social media and social networking service. Its users can create a profile to share personally identifying information with other users. Users interact with other users by agreeing to become online ‘friends’, but can also interact with any other user in a more limited manner. The latter interactions are prominent when users choose to join common-interest groups. Users can also choose to ‘follow’ certain users, as well as Facebook pages, which can represent popular figures, organizations, and even ideas and interests. Users follow their friends by default, though they have the freedom to change this setting. Posts made by the ‘followed’ users and pages, as well as those made in joined groups, appear in a user’s news feed, and these posts make up the bulk of the information disseminated to a user via Facebook. The appearance of these posts is entirely governed by an algorithm that is optimized to present the most engaging posts to the user. Facebook has also defined a policy to identify “objectionable content” that it may remove if shared via Facebook, as well as inauthentic behaviour and misleading representation that may lead to sanctions on pages and users.[8] The procedure for this removal, however, is almost entirely human-driven. Facebook employs personnel whose job is simply to scroll through thousands of posts every day and decide whether or not they violate any regulations. Facebook has been criticized for not employing enough content moderators; those presently employed are overloaded with work and able to spend only a second or two per post. Feingold et al observed that users tend, in general, to believe the news they see on Facebook. Furthermore, those users who are aware that Facebook is taking steps to resolve the fake news problem tend to believe the news they see on Facebook even more, creating an ironic ‘trust bias’. They also found a positive correlation between user education levels and vigilance in assessing false information. They also recognized that users overrated their own understanding of Facebook’s tools, and have called for more aggressive fact-checking systems. Facebook’s existing content moderation system has been described as a “near total failure” in existing studies.[116] Nonetheless, there is a push both for Facebook to employ more moderation personnel and for better automation of content monitoring.
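To make the idea of engagement-optimized ranking concrete, the following toy Python sketch orders a handful of candidate posts by a crude engagement score. It is purely illustrative and is not Facebook’s actual news feed algorithm, whose details are not public; the fields and weights are assumptions introduced only for the example.

from dataclasses import dataclass

@dataclass
class Post:
    author_affinity: float   # how often the user interacts with this author (0 to 1)
    likes: int
    comments: int
    shares: int
    hours_old: float

def engagement_score(post: Post) -> float:
    # Favour recent, heavily interacted-with posts from close connections.
    interactions = post.likes + 2 * post.comments + 3 * post.shares
    recency = 1.0 / (1.0 + post.hours_old)
    return post.author_affinity * interactions * recency

candidates = [
    Post(author_affinity=0.9, likes=12, comments=3, shares=1, hours_old=2.0),
    Post(author_affinity=0.2, likes=500, comments=40, shares=90, hours_old=30.0),
    Post(author_affinity=0.6, likes=4, comments=0, shares=0, hours_old=0.5),
]
for post in sorted(candidates, key=engagement_score, reverse=True):
    print(round(engagement_score(post), 2), post)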

5.4.5 WhatsApp: The Anomaly

Neither Marwick and Lewis nor Feingold et al make any reference to WhatsApp in discussing disinformation; in fact, most studies based in the US or Europe similarly omit it. WhatsApp appears to be the platform at the center of the disinformation debate in developing countries, especially India and Brazil, but seemingly not in the US or European nations. Indeed, WhatsApp is an anomaly not only in that it occupies a unique position in the disinformation debate, but also as a platform itself. Unlike Facebook and Twitter, it is not a social media platform by itself; it is essentially a messaging service, and was founded exclusively as one. However, changes to the platform over the years have led to it incorporating many features of social media sites. In an exclusive

messaging service, a user could only message those users whose number they had, unlike Facebook or Twitter, where a user can find other online users and interact with them, and even make online-exclusive friends. However, this is a little more complicated with the existence of ‘public groups’ on WhatsApp. On these public groups, users can interact over common interests, and eventually connect in a similar manner to Facebook or Twitter. However, public groups are the only channels for such interactions, unlike on Facebook or Twitter. Furthermore, the quick forwarding features of WhatsApp and its isolated, smartphone-exclusive existence exacerbate the problem.

Perhaps the most prominent distinction that separates WhatsApp from Facebook and Twitter is its decentralized storage of content. While all data is exchanged through and stored on the respective platform servers for Facebook and Twitter, WhatsApp only stores important metadata (such as phone number, profile picture, and messaged contacts), and exchanged messages are stored solely on the involved users’ devices. Furthermore, these messages are protected through the Signal protocol and end-to-end encryption, meaning that only the sending and receiving devices can read them; no one else, not even WhatsApp, can eavesdrop on these messages. An exception arises when WhatsApp data is backed up to Google’s cloud storage, where it is not stored encrypted, and thus can be accessed by someone breaking into the Google account.

This puts WhatsApp in an entirely different position in tackling misinformation. Content moderation is almost entirely impossible, as the content can only be accessed by the users involved. The problematic public groups can thus become breeding grounds for radicalization and false information, without any of the checks and balances that are possible on Reddit or 4chan. Any such groups can only be dealt with by reporting to law enforcement, and even law enforcement may find it difficult to indict the users involved. While WhatsApp has itself funded some research around misinformation[15], its unusual structure as a platform makes it difficult to find solutions. WhatsApp has employed feature tweaks targeted at the cognitive processing of information. For instance, a forwarded message now comes with a ‘forwarded’ tag, which is meant to serve as a warning to the user that the message is not the other user’s original thought. However, even endorsement from the other user by simply forwarding content may lend the message credibility, as also observed by Banaji and Bhat et al.[98] WhatsApp has a long way to go in countering disinformation, and much research is still needed, from a perspective quite different from that of research on other forms of online media. Unfortunately, the problem at hand is immediate, and a lot of damage may be done by the time WhatsApp comes up with a viable solution.

5.4.6 Anonymity and Transparency

Feingold et al identify four factors by which to evaluate a platform’s amenability to misinformation. These are anonymity, control, complexity and transparency, the former two being user-facing and the latter two being platform-facing.

1. Anonymity: The more an average user reveals their identity on the platform, the lower the anonymity. Feingold et al argue that a lower value is preferable for this.

2. Control: The extent to which a user can choose what content they see on the platform defines the control factor. Feingold et al argue that a higher score is preferable for this, but this also means that users can limit their interactions to virtual cliques. Furthermore, for some platforms the control score is difficult to determine, as the functioning of their algorithms is not transparent enough.

3. Complexity: The difficulty in understanding the structure of a platform defines its complexity. Feingold et al claim that a lower complexity is preferable in countering misinformation, but this may not hold for WhatsApp, where malicious users have found workarounds exploiting the platform’s structure.[81]

4. Transparency: The visibility of data and of the functioning of the platform, especially the flow of information on it, indicates its transparency. Feingold et al argue that higher transparency is preferable. Many platforms, however, choose not to divulge directly identifying data in order to protect user privacy.

The anonymity and transparency factors seem to be the most crucial, and might point to a suitable direction for research. Facebook, for instance, offers low anonymity, but also a low level of transparency in its functioning. This implies that external research efforts to tackle misinformation may be difficult, and thus Facebook itself must bear most of the brunt of fighting misinformation on its platform. Twitter, on the other hand, offers medium anonymity, except for verified users who are often already public figures. However, the all-public nature of Twitter aids external research efforts, and this is reflected in the fact that the majority of disinformation research has focused on Twitter.[71] For WhatsApp, anonymity is high and transparency is extremely low, both undesirable values. Though user accounts are linked to phone numbers, the sharing of which usually implies a relatively intimate relationship between users, users themselves reveal very little on the platform itself. With almost no data stored on any external server, the transparency of WhatsApp is also extremely low. WhatsApp has thus leveraged the control and complexity factors more. An instance of the former is WhatsApp allowing users to control who can add them to groups, which can help limit their exposure to problematic public groups.[124] Instances of the latter include the introduction of the ‘forwarded’ tag and the limits WhatsApp has imposed on the wide broadcasting of messages, particularly viral ones. WhatsApp has claimed that this has led to a fall in the massive forwarding of viral, potentially misinformative messages.[83]
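The anonymity/transparency profiles described above can be made concrete with a small sketch. The qualitative ratings below follow the text (e.g. Facebook: low anonymity, low transparency; WhatsApp: high anonymity, extremely low transparency); the numeric scale and the crude risk heuristic are my own assumptions, not part of Feingold et al’s method.

FACTOR_SCALE = {"very low": 0, "low": 1, "medium": 2, "high": 3}

# Ratings as described in the text above.
platforms = {
    "Facebook": {"anonymity": "low", "transparency": "low"},
    "Twitter": {"anonymity": "medium", "transparency": "high"},
    "WhatsApp": {"anonymity": "high", "transparency": "very low"},
}

def research_accessibility(profile):
    """Higher transparency makes external research on the platform easier."""
    return FACTOR_SCALE[profile["transparency"]]

def amenability_to_misinformation(profile):
    """Assumed heuristic: high anonymity combined with low transparency is the worst case."""
    return FACTOR_SCALE[profile["anonymity"]] - FACTOR_SCALE[profile["transparency"]]

for name, profile in platforms.items():
    print(name,
          "research accessibility:", research_accessibility(profile),
          "amenability score:", amenability_to_misinformation(profile))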

5.5 Misinformation and Online Behaviour in India

Even though most misinformation research has been centered on the US and Europe, Rasmus Kleis Nielsen, director at the Reuters Institute, claims, “the problems of disinformation in a society like India might be more sophisticated and more challenging than they are in the West”.[70] Indeed, as pointed out earlier, much of Facebook’s effort in tackling misinformation is centered on India. The problem is more complex in India because of multiple factors. Not least of these is the ethnic and religious diversity of India, which has led to more amplified echo chambers than in the West.

The sudden and rapid growth of Internet access in India has been another factor. India has a considerably lower literacy rate, at 74%, than the US or Europe, where average literacy rates are around 99%. The lower literacy rate, coupled with the limited Internet access in India until 2013, meant that much of the population was digitally illiterate. However, much of this changed with the advent of smartphones in 2013 and the launch of the telecom company Reliance Jio in 2016. As Banaji and Bhat et al note, Jio offered 4G Internet packages at a much lower price than competitors, with the cognisance of, and alleged complicity by, the Telecom Regulatory Authority of India (TRAI).[98] This predatory pricing by Jio led to a downward revision of tariffs across the telecom market, and coupled with the smartphone boom in India, has greatly increased access to the Internet.1 This increase in access has arguably been much greater than the increase in India’s digital literacy, leaving a large population in India using the Internet while also struggling to understand it.

In such a scenario, WhatsApp has been readily adopted by a bulk of the population, with its ability to exchange audio-visual content and its generally high usability and accessibility. These masses, unlike those in the US and Europe, have not been familiar with the practices of the Internet. While the alt-right radicalized on the Internet in the late 2000s, no such Internet radicalization has been documented for India, and it is unlikely to have happened given the low access to the Internet at the time. These users are thus not readily wary of online information, and often do not even have the requisite understanding for fact-checking. Internet availability has thus outpaced Internet literacy in India. In fact, it may even be argued that the kind of radicalization that had occurred on Reddit and 4chan is manifesting itself through WhatsApp public groups in India. These are more potent grounds for such radicalization, as there is virtually no scope for content moderation on WhatsApp of the kind possible on Reddit or 4chan. Rasmus Kleis Nielsen revealed that a survey indicated that 36% of respondents expressed trust in news media, “statistically indistinguishable from the 34% who say they trust news found via social media”.[92] This indicates a distrust in news media similar to that noted by Marwick and Lewis.

In fact, disinformation in India seems to be much more closely linked to organized propaganda campaigns than elsewhere. Facebook has removed pages, groups and accounts engaged in disinformative behaviour linked with both the Congress and BJP parties in India.[122],[123] Reuters noted the use of WhatsApp clones and software tools in India that were “used by some BJP and Congress workers to manually forward messages on a mass basis; software tools which allow users to automate delivery of

1It must be additionally noted that Jio’s steps have had dangerous implications for the telecom market with most companies locked into debt or forced out of the market, and the emergence of a duopoly.

WhatsApp messages; and some firms offering political workers the chance to go onto a website and send bulk WhatsApp messages from anonymous numbers.”[81] This circumvents any of the feature tweaks that WhatsApp may introduce to counter the spread of viral messages.

Besides election manipulation, ideologically-charged misinformation is also rampant in India, particularly through WhatsApp. Banaji and Bhat et al, through a series of focus-group interviews, concluded that male, technologically literate, Hindu upper or middle caste users were more likely to create, curate and share ideologically-charged misinformation.[98] They suggest that such users may even be likelier to create and administer the groups that are responsible for the perpetuation of such misinformation. They also claimed that women personally related to these men were also prone to sharing such misinformation. The ideologies shared were primarily Hindu nationalist, with occasional misogynistic overtones. This coincides with the rise and increasing popularity of the ruling Hindu nationalist BJP party in India. This ideologically-charged misinformation has directly caused communal riots[42] and mob lynchings in India (which have come to be known as ‘WhatsApp lynchings’)[51], and has also influenced election results.[33] In the case of some violence-causing misinformation, such as rumours of child snatchers, Banaji and Bhat et al claimed that the motivation for forwarding the misinformation was not ideological, but rather a concern for social well-being and a distrust of strangers. This distrust of strangers, though, may itself be ideologically driven, and even borderline xenophobic, something Banaji and Bhat et al fail to observe. Even with such misinformation, they noted that the users who forwarded it were predominantly male.2 Alexis Madrigal, in an unrelated article, noted that such rumours were likely to be ideologically charged, as they primarily targeted minorities.[19] The xenophobia is quite apparent in the mob lynchings in Assam, fuelled by WhatsApp but building on the local myth of the ‘xopadhora’, an outsider who abducts children. The xopadhora is characterized as a male with long hair, a “transgender-like appearance”, and almost always someone from outside the local community.[96] This strongly points to xenophobia-charged misinformation. The success of misinformation in India thus relies not only on the perpetrators’ knowledge of Internet culture and practices, but also on their understanding of social stigmas and myths.

Some of the efforts by the Indian administration include the police holding group admins responsible for moderating content, and even threatening to penalize those who fail to do so; an exacerbated version of how Reddit administrators rely on subreddit moderators.[25],[16] A statement by the New Delhi-based organization SFLC.in, however, claims that such a practice is unconstitutional.[110] Interestingly, in its efforts to curb misinformation, the Indian government has consistently asked social media companies to provide access to user data.[20] India’s surveillance mechanisms, as discussed in chapter 2, already require telecom companies to share phone records on demand with the authorities. This arrangement also means that the telecom companies are not held responsible for the content that users share through their services. This understanding has now been complicated by WhatsApp. The government has blamed mob violence directly on WhatsApp, prompting the platform to take steps towards curbing

2It must be noted that Banaji and Bhat et al’s study may be prone to subject bias, as their observations are based primarily on in-person interviews and user narratives.

misinformation on its platform. While WhatsApp claims that these steps are working, the government has expressed dissatisfaction, asking to be able to trace messages that lead to death or violence, and has even indicated the possibility of shutting down social media platforms that do not comply. WhatsApp has indicated that such a measure is impossible given its end-to-end encrypted structure. Privacy activists are also concerned about the government’s demands, fearing that access to such information may be used by the government to track down dissidents. This latest development brings us full circle in the context of this thesis. It indicates a vicious cycle of surveillance, media control, and misinformation, with centralized control over the masses increasing with each iteration.

5.6 Conclusion

Studies around disinformation have predominantly been conducted from a psychological purview, to understand the social behaviour that has led to its rise. However, it is not easy to predict misinformation-related behaviour with our current knowledge and expertise. Internet behaviour has been well documented and can be looked at as a foundation for understanding misinformation-related online behaviour. However, it is not sufficient to simply make deductions based on Internet behaviour alone. Misinformation behaviour has traits of its own, which need to be understood in tandem with Internet behaviour. Furthermore, social behaviour alone is not the root cause of the rise of disinformation. Structures on social media have amplified certain behavioural traits and muted others, and hence warrant investigation too. There is divergence in the ways that social media platforms amplify (mis)information. Finally, online behaviour and social media trends also vary across nations, and should be considered in the context of each nation’s social and political spectrum.

There is a perceived urgency in dealing with disinformation, especially due to the violence that it has led to in countries like India. However, Danah Boyd warns us against rushing into short-term technical solutions, as “such an approach creates a distraction while enabling the underlying issues to flourish”.[40] Boyd points out that the phenomenon of disinformation raises concerns not only about our technological structure, but also about the social structure that has evolved to enable it. As Alexis Madrigal also points out, many mob lynchings caused in India by WhatsApp-spread misinformation also indicated deep-rooted social problems, with religious, class or gender divisions coming into play.[19] Both these ends must therefore be tackled, and only long-term methods that both reform social ills and provide technological contingencies will sufficiently address the problem of disinformation. It may also be important to critically examine the steps taken by Internet companies and governments, as they might be misleading.

Chapter 6

Disinformation Trends Across Social Media Platforms: An Actor-Network Analysis

6.1 Motivation

As briefly discussed in the previous chapter, the mechanisms and spread of misinformation vary across online platforms, though there are certain commonalities. Most of the studies examining these trends involve in-person interviews and user narratives extracted from them. However, a formal analysis of the spread of online misinformation is also required. To that effect, an analysis of three social media platforms - Twitter, Facebook, and WhatsApp - through the Actor-Network Theory (ANT) framework is attempted. Through classical methods defined within ANT, this chapter describes a model for ‘unblack-boxing’ the socio-technical paradigms created by these platforms that have enabled online misinformation. These three platforms have been identified as the most prominent centers of online misinformation in various studies. However, the role of WhatsApp has been quite divergent from the other two, and the ANT analysis may help us further understand this anomalous behaviour.

6.2 The Actor-Network Theory

The Actor-Network Theory (ANT) is a framework to describe ‘socio-technical’ paradigms through a network of ‘actors’.[57] These actors are heterogeneous, and may include human as well as non-human entities.[106] The non-human entities can include animals, such as microbes; places, such as laboratories; machines or tools, such as inscription devices; or even text.[31] Both human and non-human actors in an actor-network have agency, and certain interests that cause them to act. An actor that can form or break associations with other actors is termed a ‘volitional actor’ or an ‘actant’.[57] These actors move and change, translating their interests in both place and form, and accordingly bring about changes across the network.[106] ANT is essentially a materialist theory, in that it reduces the social to the material. This means that social forces such as ‘social norms’, ‘social laws’, or ‘culture’ are only seen in the form of interactions between

actors, human or non-human, and not independently. It essentially assumes that the sum of material, non-social phenomena can account for these social forces. Latour further elaborates on the latter idea:

But it’s not the case that shifting from social to natural objects means shifting from a bewildering multiplicity to a welcoming unity. We have to shift, yes, but from an impoverished repertoire of intermediaries to a highly complex and highly controversial set of mediators.[32]

It must be noted that despite the heterogeneity of its actors, ANT maintains an analytical impartiality in examining the actors, their agencies and interests, thus favouring a ‘supersymmetry’ in its structure. It may be claimed that there is an inherent asymmetry and hierarchy to society, and that ANT ignores these concepts of power and domination in favour of imagining a symmetric structure. Latour, however, clarifies that since one of the objectives of ANT is to explain these asymmetries, it would be repetitive, and perhaps even circular reasoning, to include power asymmetries in the structure.[32] As an extension of this symmetry, ANT explains the technical and natural in the same terms as the social and cultural, since it reduces the social to the material. ANT thus does not lean towards either social constructivism (technical and natural) or realism (social and cultural), the two major sociological accounts of scientific production.[57]

6.3 Understanding Actors in ANT

As already explained above, actors are heterogeneous, and include both humans and non-humans. While imagining humans as actors with agency and interests is elementary, it may be difficult to surmise how non-human actors may have agency and interests in an actor-network. Sayes provides an adequate simplification, introducing four roles that non-human actors may play in an ANT analysis.[53] These are (1) conditions for the possibility of human society, (2) mediators, (3) members of moral and political associations, and (4) gatherings.

1. Conditions for the possibility of human society: Non-human actors are often necessary stabilizers of the actor-network or the human collective. This role was first identified with the realization that human society is not possible on the basis of purely social ties alone, but only when certain actors can be sufficiently mobilized to maintain these interactions. Prominent examples are machines and artifacts.

2. Mediators: Sayes notes that early ANT studies often considered non-human actors as intermediaries, rather than mediators. Thus non-human actors would only be actors in a formal sense, without agency. However, as a mediator, a non-human actor adds something to the actor-interactions, and does not simply act as a placeholder.

3. Members of Moral and Political Associations: These kinds of non-human actors carry inherent moral and political agency. Latour exemplifies this with the usage of seatbelts.[30] He argues that the legal consequences of not wearing a seatbelt add a political agency to the already moral agency of wearing a seatbelt. He further presents hypothetical scenarios where a car’s ignition can start only when the driver wears a seatbelt or a seatbelt automatically straps people in, arguing that morality and politics become irreversible with such agency.

4. Gatherings: This circles back to the necessary role of non-human actors in making human society possible. As Sayes explains, “nonhumans allow an actor that is no longer present to exert a palpable influence”[53]; nonhumans have the potential to gather other actors from different times and spaces. For instance, machines embody actor-interactions across time and space, involving multiple other actors (including non-human ones).

While the important roles of non-human actors in an actor-network have been identified, it is also important to understand that actors cannot be defined independently of the network. Rather, in an actor-network, actors are ‘inter-defined’, i.e., they must be defined in the context of the problem and by the links between them, in addition to their identities and interests.

6.4 How the Network Evolves in ANT

The actors and network in ANT should not be utilized independently.[32] Relationships within the actor-network form when two entities express some form of mutualism, i.e., they modify, define or stabilize each other. The core concept of how networks form in ANT is ‘translation’, and the formation of a relationship may be called an act of ‘translation’ between elements or forces.[57] It is for this reason that ANT is also called the ‘sociology of translation’.[75] Translation is essentially the process through which actors are defined, their interests and agencies characterized, and the circumstances of interaction established, following which the actor-network is stabilized and mobilized. In 1984, Michel Callon defined a four-step process for translation (four ‘moments’ of translation) while studying the sociology of scallop-fishing.[75] The first step is ‘problematization’, or defining the problem, which involves ‘inter-defining’ the actors: they are defined in the context of the problem and by the links between them, in addition to their identities and interests. The second step in the process is ‘interessement’. Through this step, the actors become locked into roles and places in the context of the problematization, according to their identities, goals, or interests; various cross-connections and possibilities of alliance are identified. The third step is ‘enrolment’, wherein negotiations take place for the formation of the alliances. If actors demonstrate agreement, the network converges, and becomes highly aligned and coordinated. Alignment within the network represents common history and shared space between the actors, while coordination is the

adoption of conventions that the translation necessitates. The final step in the process is ‘mobilization’. This step ensures that the agreements formed by the network perpetuate. Here, entities converge by being made similar (such that a speed bump, a speed limit sign, and a traffic policeman may be substituted for one another), or by simplifying (wherein a network is ‘black-boxed’, or seen as a single block). Only networks with a high level of convergence may be black-boxed. These black-boxed networks may be treated as single-point actants, and seen only in terms of their inputs and outputs. However, even when mobilized, the organization of the actor-network remains fluid, and an actor-network may redefine itself if new entities are introduced. This can happen when similar actors are interchanged (a traffic policeman introduced in place of a speed bump), causing a change in actor-interactions and hence in the eventual translation. It can also happen when the degrees of alignment and coordination change, or were misunderstood before black-boxing. This would require the ‘unblack-boxing’ of the actor-network and a re-examination of the translations.

There are three entities that play key roles in translation: obligatory passage points (OPPs), immutable mobiles and boundary objects. OPPs are channels through which actors in the network may form an alliance. Immutable mobiles are objects that are transportable while their content remains the same; examples are texts and manuscripts, or images and paintings. Boundary objects are objects that are transportable and robust enough to maintain an identity, but that may be interpreted differently across sites. An example is data, which may be analysed and interpreted in different ways by different researchers. It must finally be noted that an actor-network is not necessarily a visual, formal representation of a network, but simply an analysis of translations and actor-interactions.

6.5 Actor-Networks on Social Media Platforms

On the basis of the mechanism defined above, this study examines the formation of the actor-networks responsible for the perpetuation of misinformation on three social media platforms - Twitter, Facebook and WhatsApp. At present, the actor-networks formed by all three of these may be seen as ‘black-boxes’, with the input being misinformation and the output being its spread. Hence, defining the actor-network involves an ‘unblack-boxing’ of a network that has already been considerably mobilized. However, it can be observed from these ‘black-boxes’ that the input (misinformation) does not always generate the desired output (spread of misinformation). This indicates that there is insufficient convergence within the actor-network. This volatility of the network may be exploited to further reduce this convergence, in an effort to counter the phenomenon of misinformation. The method thus involves unblack-boxing the misinformation-perpetuating actor-networks formed by each of the three platforms above, examining both the actors that contribute to alignment and coordination and those that contribute to the volatility of the network. Furthermore, the study also draws on a functional understanding of social media platforms in order to represent the actor-networks adequately. Kietzmann et al identified seven functional blocks that define

different social media platforms.[64] They represented these functional blocks as ‘the honeycomb of social media’. These functional blocks are represented in figure 6.1.

Figure 6.1 The honeycomb of social media [64]

6.5.1 Problematization

The first step is to identify the key entities that play a role in the spread of misinformation. The first is the creator of the piece of misinformation (the ‘message’ from here on). The creator may be influenced by external sources in forming the information, and may in fact be a group of individuals. However, the ‘creator’ must be identified solely as the actor who has created the message and wants to spread it to a larger populace. An individual writing thoughts in private, with no interest in spreading them, would thus not be a ‘creator’. An interest in spreading the message is a key characteristic of the ‘creator’. However, whether this interest arises from misplaced beliefs or pure malice is not of concern, and as such the ‘creator’ is studied independently of any such assumptions. The message itself is both an immutable mobile and a boundary object, as is discussed further in the interessement phase. Next, multiple categories of users are identified as actors. In the real world, a person may fall under one or more of these categories. However, to gauge multiple motivations and intentionalities while maintaining symmetry, it is crucial to take each of these as a single actor. Thus, many user actors may correspond to a single real-world user. These user actors are defined as follows:

1. Troll user:[26] A troll user would often be an ardent perpetuator of misinformation. These user actors are driven by the desire to elicit emotional responses from other online users, and like to exploit the tendency of misinformation to appear spectacular and novel. They can be considered to be acting out of pure malice, and are thus often in strong alignment with the message.

2. Identity-protective user (I-P user):[49] These users exhibit ‘identity-protective cognition’ when they encounter a message. This refers to the unconscious tendency to credit or discredit certain information if it protects one’s perceived identity, irrespective of whether the information is false. Thus, these users would share any information they perceive to be aligned with their identity and beliefs.

3. Social butterfly user (SB user): These are users who are driven by online reputation, and thus may share content that interacts with the ‘like’ feature consistently, or with the ‘comment’ feature positively. Their motivations are driven by neither malice nor benevolence.

4. Good Samaritan user (GS user): These are users who are driven by a desire to help the community. Thus a good Samaritan user may share misinformation without making truth judgements, if they believe that the information may help the larger community. For instance, they might share a false message about the inefficacy of a drug, believing that doing so helps society. They are thus driven primarily by benevolence.

5. Vigilant user: These are users who would check the validity of any piece of information before sharing.

These users have been defined not only by their intentionalities, but also by the reasoning they may exercise when faced with misinformation and the decision of whether or not to share it. Note that the existence of the creator and of the multiple types of users, and their interests, are independent of the social media platform. However, their agencies and interactions differ across these platforms. Besides these human actors, there are certain non-human actors and single-point actors (formed by other actor-networks) that are also of relevance. The different groups and communities formed on the different social media platforms are single-point actors. For Twitter, only one such group exists, which is entirely global, as a tweet is public to all. For Facebook, there can be public groups, where the content is public to all; closed groups, where the content is not public but the existence of the group is; and secret groups, whose content and identity are both hidden. A group’s intentionality is to loosely monitor the content shared with it so as to maintain a good standing within the community. The responsibility to monitor is greatest for a public group and almost absent for a secret group. For WhatsApp, there are public groups, which anyone can join, and private groups, to which users can only be added by existing members. Historically, there has been no responsibility to monitor the content shared

with these groups, but with the rise in misinformation-caused violent incidents, some governments have asked public group admins to take responsibility.[112] For private groups, however, the responsibility lies solely with the users, and not the group as a whole. Hence, looking at it as a single-point actant, it can be assumed that the private group itself does not have any responsibility. In the context of a message, each of these groups can be either an interest group or a non-interest group. If a message aligns with the beliefs and interests of the group, the group is an interest group. Thus, the group actors can finally be classified as follows:

1. Twitter community

2. Facebook public interest group

3. Facebook public non-interest group

4. Facebook private interest group

5. Facebook private non-interest group

6. Facebook secret interest group

7. Facebook secret non-interest group

8. WhatsApp public interest group

9. WhatsApp public non-interest group

10. WhatsApp private interest group

11. WhatsApp private non-interest group

Furthermore, various features also act as non-human necessary conditions, mediators, or sometimes even moral or political actors. These features are the ‘like’ feature, present on both Twitter and Facebook; the ‘retweet’, ‘share’ or ‘forward’ feature, present on Twitter, Facebook and WhatsApp respectively; and the ‘comment’ or ‘reply’ feature, which takes various forms, but is ultimately interchangeable, on each of the three platforms. As discussed earlier, interactions over social media are defined by seven functional blocks. The group functional block has already been broken down into relevant actors. The other functional blocks can emerge out of user-group interactions or user-feature interactions. For instance, an I-P user could easily initiate conversations within an interest group that aligns with their beliefs. A user’s reputation could be represented by their interactions with the ‘like’ feature, and may vary across groups. Next, consider each platform’s development and content monitoring team as a single-point actant in itself; these can be termed Twitter CM, Facebook CM and WhatsApp CM respectively. As an actor, WhatsApp CM has no agency, given that it has no access to the end-to-end encrypted content on its platform, and in fact has no corresponding real-world entity. Twitter CM has the

agency not only to purge a message from the network, but even to purge a user. Facebook CM exhibits limited intentionality and agency in purging either misinformation or a user from the network. Finally, consider the respective applications as business models. These are termed simply by the respective platform names: Twitter, Facebook and WhatsApp. The interest of each of these platforms is to keep users engaged for long periods and to ensure constant information flow, in order to generate revenue. While doing so, their interest also lies in maintaining their reputation. Thus, in a way, the interests of these business models are tangential to those of the CM actors: only when the flow of misinformation reaches a degree at which the platform’s reputation is at stake do their interests align; otherwise they remain contrary.

6.5.2 Interessement

The interessement step identifies how the various actors and their interests lock into roles and places. The ‘creator’ may be considered the primary actor here, with the intentionality to spread a message. The agency endowed to the creator is limited to the formation of the message and sharing it with certain users, or publishing it through an online site. The message itself is an immutable mobile once drafted, matching Latour’s criteria perfectly, in that it is mobile, immutable, modifiable in scale, and easily reproducible. However, it also acts as a boundary object within the network, because of the variance in the persuasive ability of the message. Based on the content of the information, the actor-interactions would realign. Nevertheless, it is important to identify how different kinds of information would interact differently with the different types of users, in appealing to their respective sensibilities. When a message appeals to the sensibilities of either an identity-protective user or a good Samaritan user, the ‘forward’ option on WhatsApp, the ‘share’ option on Facebook, or the ‘retweet’ option on Twitter acts as a member of a moral and political association. This means that for these users, a forward option represents a responsibility towards their society or political inclinations, and thus they must spread the message, no matter the likelihood of its being false. As for the troll user or the social butterfly user, when a message appeals to their sensibilities, the respective options to share act as mediators, but not as moral or political entities. These options are not mere intermediaries, as the shared message carries a tag on each of the social media platforms, either by being a link to the external source or, if it is shared within the application, a ‘forwarded’ tag on WhatsApp, a ‘shared’ tag on Facebook, or a ‘retweeted’ tag on Twitter. Finally, it is safe to assume that a vigilant user never shares the message, and can as such be looked at as a non-negotiable actor in our network. Once the message reaches the groups, it becomes a substitute for the ‘creator’; the message is now a non-human gathering actor, with the intentionality to be spread. The group interactions with the message would take two forms: a non-interest group may purge it to maintain its reputation, while an interest group would keep it. Furthermore, through a group, the message would interact with other user actors, feature actors, and content monitoring actors. These interactions would shape the final structure of the actor-network.

6.5.3 Enrolment

Thus alliances may be formed by the different users and the creator, with the message acting as a passage point. The creator and trolls are in natural alignment, and depending on the interpretation of the message, the GS user and the I-P user may join this alliance. Next, if the message interacts sufficiently with the ‘like’ feature, the SB user too joins the alliance. A ‘comment’ feature actor, however, may be able to break some alliances by pointing to the falsehood of the message. This may not lead to the purging of the message, but it would certainly stop the GS users, by appealing to their moral sensibilities, and the SB users, who would not want to ruin their good standing. The I-P users would still continue to share the message to interest groups, and trolls, who are in strong alignment with the message, would continue sharing it. Furthermore, a ‘comment’ as it stands now is simply a mediator, and does not by itself belong to a moral or political association. Thus the agency of the ‘comment’ actor is limited. From the alliance, the message may reach different groups. As discussed earlier, the message is now a non-human gathering actor. The message now interacts with group actors, and through group actors, it also interacts with the different types of user actors, as well as with the feature actors. If the message reaches an interest group, it would find more I-P users, and thus be shared more. Similarly, if the message, upon reaching a group, interacts with the ‘like’ feature more, it would be shared further by the SB users. Even if it does not interact with the ‘like’ feature, the ‘retweeted’, ‘shared’ or ‘forwarded’ tags would also add to the apparent reputation of the message for SB users. Finally, irrespective of the group or feature interactions, the message would be shared by a troll, and if it has societal implications, it would be shared by the GS users as well. In the case of non-interest groups, the group may itself purge the message from the network to preserve its reputation. At any stage, Twitter CM may be able to purge the message, and even certain users, to break the cycle. Facebook CM, too, may be able to do so for its public groups, but the message could still perpetuate in private groups. Finally, WhatsApp CM would remain an inconsequential actor. However, this act of purging, as discussed earlier, would only happen once the respective business model actors decide that their reputation may be at risk. Thus, in the actual network, this enrolment can occur only after a certain number of cycles.
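To make the enrolment logic above concrete, the following toy Python sketch encodes the sharing decision of each user-type actor for a given message. It is my own illustration of the rules just described, not a formal part of the ANT analysis; the attribute names and the binary decisions are simplifying assumptions.

from dataclasses import dataclass

@dataclass
class Message:
    identity_aligned: bool        # appeals to an I-P user's perceived identity
    societal_implication: bool    # appeals to a GS user's concern for the community
    popular: bool                 # interacts strongly with the 'like' feature
    flagged_false_in_comments: bool

def shares(user_type: str, msg: Message) -> bool:
    """Whether a user-type actor joins the alliance by sharing the message."""
    if user_type == "vigilant":
        return False                      # never shares without verifying
    if user_type == "troll":
        return True                       # strong alignment with the message regardless
    if user_type == "identity_protective":
        return msg.identity_aligned       # shares whatever protects their identity
    if user_type == "good_samaritan":
        # moral appeal, but a falsehood exposed in comments stops them
        return msg.societal_implication and not msg.flagged_false_in_comments
    if user_type == "social_butterfly":
        # reputation-driven; an exposed falsehood threatens their standing
        return msg.popular and not msg.flagged_false_in_comments
    return False

rumour = Message(identity_aligned=True, societal_implication=True,
                 popular=False, flagged_false_in_comments=False)
for user in ["troll", "identity_protective", "good_samaritan", "social_butterfly", "vigilant"]:
    print(user, shares(user, rumour))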

6.5.4 Mobilization and Analysis

As long as the message is not purged at any stage, it continues to perpetuate. It can be observed that the following conditions ensure a high degree of alignment and coordination in the network.

1. Broad appeal of the message: If the message includes multiple identity-protective ideas, along with a societal implication, it would appeal to a broader set of users. The message may also be likeable for multiple individuals.

2. Sharing on interest groups: Through interest groups, the chances of the message being purged are reduced. Furthermore, from interest

groups, a message may emerge as popular (through interactions with the ‘like’ and ‘comment’ features), even if it had not in the first iteration, thus attracting more users and more chances of being shared. It may also be noted that a message flowing through interest groups would impact the platform’s reputation relatively less, and thus delay the alignment of the platform business model’s interests with the platform CM’s interests.

3. Sharing on WhatsApp: Since it is impossible for the message to be purged by content monitors, as there are none for WhatsApp, it would perpetuate for much longer. More interestingly, the lack of a ‘like’ feature on WhatsApp means that a broader range of messages can spread, as the message does not have to form alliances with SB users.

Furthermore, Kahan claims that identity-protective cognition has the greatest power to mislead.[49] Since I-P users tend to exist in higher proportions in an interest group, the reliance on GS users or SB users to share misinformation also reduces, and thus the network is more aligned when the message is exchanged by multiple I-P users (owing to common beliefs). Clearly, the Twitter community has the least ability to sustain this, while through WhatsApp interest groups, networks align much more readily. Considering these conditions, the actor-network formed by sharing on WhatsApp interest groups is the most highly aligned and coordinated, and thus the most converged, while that formed by sharing on Twitter is the least converged.

6.5.5 Visual Representation

Figure 6.2 represents the visual topography of the actor-network. However, as explained earlier, an actor-network is not necessarily a formal topographical representation, and thus this visual representation should only be understood in conjunction with the above translation mechanisms.

6.6 Discussion

The actor-network analysis may be considered in conjunction with various ethnographic studies. Take, for instance, the conclusions from Banaji and Bhat et al as discussed in chapter 5. In India, the I-P users are primarily Hindu nationalists, and many interest groups are ideologically charged with Hindu nationalism. Thus, misinformation related to Hindu nationalism is very likely to spread. Such ideologically-charged misinformation, coupled with fears of social ills such as child kidnapping, is then likely to be shared by I-P users and GS users alike, and this category of misinformation is likely to find the most traction, with the most alliances formed. Many of the instances of misinformation-fuelled violence thus fit neatly into the disinformation actor-network of WhatsApp.

Figure 6.2 Visual representation for the disinformation ANT

6.7 Future Work

While the actor-network analysis in this study is driven by online user behaviour and social media app structures, it might be useful to incorporate certain ethnographic elements into the study. One such element could be estimating the proportions of the different categories of users on each platform and in the respective group types. Furthermore, an extensive ethnographic and experimental study could be conducted to look at different subtypes within interest groups (ranging from political leanings to hobby groups), which could in turn be used to determine interactions with different genres of misinformation messages. Finally, Vicsek et al propose a productive convergence between ANT and Social Network Analysis (SNA), a more quantitatively driven methodology.[74] The intersection of the two may be able to predict much more about user behaviour in the diffusion of misinformation.
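As a hypothetical illustration of such an ANT–SNA convergence, the sketch below assumes that message-forwarding events could be collected as (sender, receiver) pairs; the edge list and actor names are invented purely for illustration. Standard network measures, here betweenness centrality computed with the networkx library, could then quantify which actors occupy the passage-point positions that the actor-network account describes qualitatively.

# Hypothetical starting point for combining ANT with SNA; the forwarding data is invented.
import networkx as nx

# assumed forwarding events: (sender, receiver) observed on one platform
forwards = [
    ("creator", "troll_1"), ("troll_1", "ip_user_1"), ("ip_user_1", "interest_group_admin"),
    ("interest_group_admin", "ip_user_2"), ("interest_group_admin", "gs_user_1"),
    ("gs_user_1", "sb_user_1"), ("ip_user_2", "sb_user_2"),
]

G = nx.DiGraph()
G.add_edges_from(forwards)

# Betweenness centrality highlights actors through which the message must pass,
# a quantitative analogue of the obligatory passage point in the translation stage.
centrality = nx.betweenness_centrality(G)
for actor, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{actor:>22}: {score:.3f}")

Actors with high betweenness in such a forwarding graph would be natural candidates for the passage points identified qualitatively in the translation stage, and their estimated proportions per platform could feed back into the actor-network description.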

Chapter 7

Conclusion

The various challenges that social media now poses to democratic principles are certainly not of its own making. These problems can be found at various stages in our history, whether under authoritarian regimes, as authoritarian responses to real-world problems, or through capitalist corporate control. Nonetheless, we are now witnessing a massive surge in multiple phenomena through which those in power can control the masses and subtly limit their democratic rights. The phenomena of mass surveillance, media control, and disinformation are interlinked in a system of control. This system incentivizes predictability and total control over human behaviour and actions, and the amenability of the masses towards such a system, including valuing the convenience gained over the freedom lost, further reinforces its structure. The previous discussions in this thesis indicate that technology-mediated life inadvertently has homogenizing yet dividing, comforting yet despairing effects.

Although these phenomena may be seen as intrinsic to human nature, their exacerbation is strongly coupled with the design of online Internet platforms and the policies that govern them. A great deal of responsibility thus also lies with the Internet corporations. Though the incentive of generating advertising revenue has caused many Internet companies to ignore the risks of indiscriminate algorithms and information flows, this approach may not prove sustainable. The increasing distrust in mainstream media is also trickling down to social media. In that case, the saving grace for Internet companies may be to either wait for a new world order to emerge that shuns democratic principles, or to step up and protect their users' democratic rights.

The Internet itself was largely hailed as a liberating force upon its creation: a force that would create better informed individuals and bring us together. The rise of social media sites such as Facebook, Twitter and WhatsApp was further seen as a levelling ground, where one could let go of one's biases and embrace a multitude of perspectives, progressing towards a better future. This utopian view still governs many of those behind these platforms, which can be a saving grace as long as it does not end in complacency. Many of these corporations are now actively involved in research on misinformation, in resisting state surveillance efforts, and in ensuring more privacy for users. However, these are complex socio-technical problems, and simple technical solutions may not be enough. This thesis thus attempts to understand the problem from both the social and the technological side.

This two-way comprehension of the problem is crucial, as the solutions too must come from both ends. People need to be informed about the pitfalls of the Internet and take responsibility for its appropriate use, just as companies need to ensure that their platforms do not amplify surveillance or misinformation. Research in this field has similarly been limited by employing either purely quantitative or purely qualitative methodologies. This thesis introduces both quantitative and qualitative approaches to studying these phenomena: the NLP analysis presented herein could benefit from qualitative political discourse analysis, while the ANT analysis could benefit from a quantitative social network analysis. Another such approach can come from cognitive psychology, which can inform the design of present Internet technologies so that the social ills brought on by the Internet do not persist. An interdisciplinary approach, studying both the social and the technical aspects, is thus the only viable methodology that can lead to adequate solutions to these problems. Only when both the social and technological dimensions of these phenomena are sufficiently addressed can we find a fruitful conclusion to these ills. The journey is certainly a long one, since the very building blocks of the Internet's structure and its business model, as well as the societal structures that have arisen alongside them, would have to be dismantled and the foundations laid afresh.

Very well; and may we not rightly say that we have sufficiently discussed . . . the manner of the transition from democracy to tyranny?

Socrates[88]

Related Publications

• Neelesh Agrawal and Radhika Krishnan. Anonymity, Commitment, And Information: Extending Kierkegaard's Critique Of The Press To The Internet. 2020. Paper proposal accepted for the Fifteenth International Conference on Interdisciplinary Social Sciences at the National and Kapodistrian University of Athens, Athens, Greece.

• Neelesh Agrawal and Radhika Krishnan. Shaping Political Discourses: Mainstream News vs Social Media. 2020. Paper proposal accepted for the Sixteenth International Conference on Technology, Knowledge and Society at the University of Illinois, Urbana-Champaign, USA.

• Neelesh Agrawal and Radhika Krishnan. Why Disinformation Prefers WhatsApp: An Actor-Network Analysis. 2020. Paper proposal accepted for the Tenth International Congress on Technology, Science and Society at the Universidade de Santiago de Compostela, Galicia, Spain.

Bibliography

[1] Beautiful Soup Documentation. https://www.crummy.com/software/BeautifulSoup/bs4/doc/ (accessed on 1 Jun 2020).
[2] Definition of Disinformation. Merriam-Webster Dictionary Online. https://www.merriam-webster.com/dictionary/disinformation (accessed on 1 Jun 2020).
[3] Definition of Fake News. Collins English Dictionary Online. https://www.collinsdictionary.com/dictionary/english/fake-news (accessed on 1 Jun 2020).
[4] Definition of Misinformation. Merriam-Webster Dictionary Online. https://www.merriam-webster.com/dictionary/misinformation (accessed on 1 Jun 2020).
[5] Definition of Privacy. Oxford Dictionary on Lexico.com. https://www.lexico.com/definition/privacy (accessed on 1 Jun 2020).
[6] Documentation for Newspaper3k: Article scraping & curation. https://newspaper.readthedocs.io/en/latest/ (accessed on 1 Jun 2020).
[7] Documentation for Twitter API. https://developer.twitter.com/en/docs (accessed on 1 Jun 2020).
[8] Facebook Community Standards. Facebook. https://www.facebook.com/communitystandards/ (accessed on 1 Jun 2020).
[9] firstpost.com Competitive Analysis, Marketing Mix and Traffic. Alexa. https://www.alexa.com/siteinfo/firstpost.com (accessed on 1 Jun 2020).
[10] Highest Circulated Daily Newspapers 2018 (language wise). Audit Bureau of Circulations. http://www.auditbureau.org/files/JD2018%20Highest%20Circulated%20(language%20wise).pdf (accessed on 1 Jun 2020).
[11] Reddit Changelog. Reddit. https://www.reddit.com/r/changelog/ (accessed on 1 Jun 2020).
[12] Removing harassing subreddits. r/announcements, Reddit. https://www.reddit.com/r/announcements/comments/39bpam/removing_harassing_subreddits/ (accessed on 1 Jun 2020).
[13] Top Sites by Category: Top/News. Alexa. https://www.alexa.com/topsites/category/Top/News (accessed on 1 Jun 2020).
[14] Verified account FAQs. Twitter Help. https://help.twitter.com/en/managing-your-account/twitter-verified-accounts (accessed on 1 Jun 2020).
[15] WhatsApp Research Awards for Social Science and Misinformation. WhatsApp. https://www.whatsapp.com/research/awards/ (accessed on 1 Jun 2020).
[16] A. Agrawal. Group admins will be held responsible for misinformation, fake news in Mumbai. Medianama, 25 May 2020. https://www.medianama.com/2020/05/223-group-admins-misinformation-mumbai/ (accessed on 1 Jun 2020).
[17] A. Al-Rawi. Gatekeeping Fake News Discourses on Mainstream Media Versus Social Media. Social Science Computer Review, 37(6):687–704, 2019. doi: 10.1177/0894439318795849.
[18] A. Bomford. Echelon spy network revealed. BBC, 3 Nov 1999.
[19] A. C. Madrigal. India's Lynching Epidemic and the Problem With Blaming Tech. The Atlantic, 25 Sep 2018. https://www.theatlantic.com/technology/archive/2018/09/whatsapp/571276/ (accessed on 1 Jun 2020).
[20] A. Chakravarti. Indian govt should be able to read private WhatsApp messages, Parliament Panel tells tech companies. India Today, 27 Jan 2020. https://www.indiatoday.in/technology/news/story/indian-govt-should-be-able-to-read-private-whatsapp-messages-parliament-panel-tells-tech-companies-1640679-2020-01-27 (accessed on 1 Jun 2020).
[21] A. F. Westin. Privacy And Freedom. Washington & Lee Law Review 166, 25, 1968. https://scholarlycommons.law.wlu.edu/wlulr/vol25/iss1/20 (accessed on 1 Jun 2020).
[22] A. Gabbat and agencies. NSA analysts 'wilfully violated' surveillance systems, agency admits. The Guardian, 24 Aug 2013.
[23] A. Greenberg. NSA Chief Denies Wired's Domestic Spying Story (Fourteen Times) In Congressional Hearing. Forbes, 14 Aug 2013.
[24] A. Ilsey. How the John Oliver Effect is changing the way we consume news. The California Aggie, 14 Apr 2020. https://theaggie.org/2020/04/14/how-the-john-oliver-effect-is-changing-the-way-we-consume-news/ (accessed on 1 Jun 2020).
[25] A. Javaid. Ladakh diktat to WhatsApp group administrators - register yourself with police, bring photos. ThePrint, 8 Jan 2020. https://theprint.in/india/ladakh-diktat-to-whatsapp-group-administrators-register-yourself-with-police-bring-photos/346513/ (accessed on 1 Jun 2020).
[26] A. Marwick and R. Lewis. Media Manipulation and Disinformation Online. Technical report, Data & Society Research Institute, 2017. https://datasociety.net/output/media-manipulation-and-disinfo-online/ (accessed on 1 Jun 2020).
[27] A. Paliwal. NaMo on, NaMo TV gone. India Today, 20 May 2019. https://www.indiatoday.in/elections/lok-sabha-2019/story/namo-tv-disappears-after-elections-end-1529664-2019-05-20 (accessed on 1 Jun 2020).
[28] A. Smith and V. Banic. Fake News: How a Partying Macedonian Teen Earns Thousands Publishing Lies. NBC News, 9 Dec 2016. http://www.nbcnews.com/news/world/fake-news-how-partying-macedonian-teen-earns-thousands-publishing-liesn692451 (accessed on 1 Jun 2020).
[29] B. J. Fogg. Chapter 4 - Computers as persuasive media: Simulation. In Persuasive Technology, pages 61–87. Morgan Kaufmann, 2003.
[30] B. Latour. The moral dilemmas of a safety belt. 1989. Translated by L Davis. Available at: https://courses.ischool.berkeley.edu/i290-tpl/s11/w/images/6/69/The_Moral_Dilemmas_of_a_Safety-belt.pdf (accessed on 1 Jun 2020).
[31] B. Latour. On actor-network theory: A few clarifications. Soziale Welt, 47(4):369–381, 1996.
[32] B. Latour. Reassembling the Social: An Introduction to Actor-Network Theory. Oxford University Press, Oxford, 2005.
[33] B. Perrigo. How Volunteers for India's Ruling Party Are Using WhatsApp to Fuel Fake News Ahead of Elections. Time, 25 Jan 2019. https://time.com/5512032/whatsapp-india-election-2019/ (accessed on 1 Jun 2020).
[34] B. Schneier. Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. W. W. Norton & Company, 2015.
[35] BBC. Legitimate Expectations of Privacy. In Section 7.1: Privacy - Introduction. BBC Editorial Guidelines. https://www.bbc.com/editorialguidelines/guidelines/privacy (accessed on 1 Jun 2020).
[36] C. Cadwalladr. Google, Democracy and the Truth about Internet Search. The Guardian, 4 Dec 2016. https://www.theguardian.com/technology/2016/dec/04/google-democracy-truth-internet-search-facebook (accessed on 1 Jun 2020).
[37] C. O'Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, Penguin Random House LLC, New York, 2016.
[38] C. Silverman and L. Alexander. How Teens In the Balkans Are Duping Trump Supporters With Fake News. Buzzfeed News, 3 Nov 2016. https://www.buzzfeed.com/craigsilverman/how-macedonia-became-a-global-hub-for-pro-trump-misinfo (accessed on 1 Jun 2020).
[39] D. Alvarez-Melis and M. Saveski. Topic modeling in twitter: Aggregating tweets by conversations. In Tenth International Conference on Web and Social Media, 2016.
[40] D. Boyd. Google and Facebook Can't Just Make Fake News Disappear. Wired, 2017. https://www.wired.com/2017/03/google-and-facebook-cant-just-make-fake-news-disappear/ (accessed on 1 Jun 2020).
[41] D. Coldeway. Reddit overhauls upvote algorithm to thwart cheaters and show the site's true scale. TechCrunch, 7 Dec 2016. https://techcrunch.com/2016/12/06/reddit-overhauls-upvote-algorithm-to-thwart-cheaters-and-show-the-sites-true-scale/ (accessed on 1 Jun 2020).
[42] D. Ghose and A. Singh. Muzaffarnagar rioters used WhatsApp to fan flames, find police. Financial Express, 12 Sep 2013. https://www.financialexpress.com/archive/muzaffarnagar-rioters-used-whatsapp-to-fan-violence-find-police/1168215/ (accessed on 1 Jun 2020).
[43] D. K. Jha. Modi bars ministers, bureaucrats from talking to journalists. Scroll.in, 7 Jun 2014. https://scroll.in/article/666554/modi-bars-ministers-bureaucrats-from-talking-to-journalists (accessed on 1 Jun 2020).
[44] D. Kurup. In the dark about 'India's Prism'. The Hindu, 7 Jun 2016. https://www.thehindu.com/sci-tech/technology/in-the-dark-about--prism/article4817903.ece (accessed on 1 Jun 2020).
[45] D. Lyon. The Electronic Eye: The Rise of Surveillance Society. University of Minnesota Press, 1994.
[46] D. Lyon. Surveillance as Social Sorting: Privacy, Risk, and Digital Discrimination. Psychology Press, 2003.
[47] D. Lyon. Surveillance Studies: An Overview. Polity Press, Cambridge, 2007.
[48] D. M. Blei and A. Y. Ng and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[49] D. M. Kahan. Misinformation and Identity-Protective Cognition. Research Paper 587, Yale Law & Economics, 2 Oct 2017. doi: 10.2139/ssrn.3046603.
[50] D. Narayanan and V. Ananth. How the mobile phone is shaping to be BJP's most important weapon in elections. The Economic Times, 23 Aug 2018. https://economictimes.indiatimes.com/news/politics-and-nation/how-the-mobile-phone-is-shaping-to-be-bjps-most-important-weapon-in-elections/articleshow/65508743.cms (accessed on 1 Jun 2020).
[51] E. Dwoskin and A. Gowen. On WhatsApp, fake news is fast - and can be fatal. The Washington Post, 24 Jul 2018. https://www.washingtonpost.com/business/economy/on-whatsapp-fake-news-is-fast--and-can-be-fatal/2018/07/23/a2dd7112-8ebf-11e8-bcd5-9d911c784c38_story.html (accessed on 1 Jun 2020).
[52] E. S. Herman and N. Chomsky. Manufacturing Consent: The Political Economy of the Mass Media. The Bodley Head, 1988, 2008.
[53] E. Sayes. Actor-Network Theory and methodology: Just what does it mean to say that nonhumans have agency? Social Studies of Science, 44(1):134–149, 2014. doi: 10.1177/0306312713511867.
[54] G. Bhatia. State Surveillance and the Right to Privacy in India: a Constitutional Biography. National Law School of India Review, 26(2):127–158, 2015.
[55] G. Debenedetti. Factbox: History of mass surveillance in the United States. Reuters, 2013. https://www.reuters.com/article/us-usa-security-records-factbox/factbox-history-of-mass-surveillance-in-the-united-states-idUSBRE95617O20130607 (accessed on 1 Jun 2020).
[56] G. Khybri. How Threats on Twitter Manifest In Real Life: Indian Troll Tales. The Quint, 28 Aug 2018. https://www.thequint.com/neon/gender/trolling-women-journalists-rana-ayyub (accessed on 1 Jun 2020).
[57] G. Ritzer, editor. Encyclopedia of Social Theory. SAGE Publications, Inc, Thousand Oaks, CA, 2005. doi: 10.4135/9781412952552.
[58] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31:211–235, 2017.
[59] H. Jenkins. Convergence Culture. New York University Press, New York, 2006.
[60] H. L. Dreyfus. Anonymity versus commitment: the dangers of education on the Internet. Ethics and Information Technology, pages 15–21, 1999.
[61] I. Traynor. NSA surveillance: Europe threatens to freeze US data-sharing arrangements. The Guardian, 26 Nov 2013.
[62] Intelligence Community Assessment. Assessing Russian Activities and Intentions in Recent US Elections. US Government, 6 Jan 2017. https://www.dni.gov/files/documents/ICA_2017_01.pdf (accessed on 1 Jun 2020).
[63] J. Bartlett. The People vs Tech. Dutton, 2018.
[64] J. H. Kietzmann and K. Hermkens and I. P. McCarthy and B. S. Silvestre. Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons, 54(3):241–251, 2011. doi: 10.1016/j.bushor.2011.01.005.
[65] J. Klaehn and J. Pedro-Carañana and D. Broudy, editors. The Propaganda Model Today: Filtering Perception and Awareness. 2018. doi: 10.16997/book27.
[66] J. O'Neill. Echelon. Word Association Publishers, 2005.
[67] K. Conger and D. Alba. Twitter Refutes Inaccuracies in Trump's Tweets About Mail-In Voting. The New York Times, 28 May 2020. https://www.nytimes.com/2020/05/26/technology/twitter-trump-mail-in-ballots.html (accessed on 1 Jun 2020).
[68] K. Roose. 'Shut the Site Down,' Says the Creator of 8chan, a Megaphone for Gunmen. The New York Times, 4 Aug 2019. https://www.nytimes.com/2019/08/04/technology/8chan-shooting-manifesto.html (accessed on 1 Jun 2020).
[69] K. S. Shrivastava. 40 government departments are using a social media surveillance tool - and little is known of it. Scroll.in, 4 Sep 2018. https://scroll.in/article/893015/40-government-departments-are-using-a-social-media-surveillance-tool-and-little-is-known-of-it (accessed on 1 Jun 2020).
[70] K. Upmanyu. India's Disinformation War More Complex Than in West: Oxford Prof. The Quint, 6 Oct 2018. https://www.thequint.com/news/india/media-coverage-disinformation-in-india-interview-rasmus-nielsen (accessed on 1 Jun 2020).
[71] L. Ha and L. A. Perez and R. Ray. Mapping Recent Development in Scholarship on Fake News and Misinformation, 2008 to 2017: Disciplinary Contribution, Topics, and Impact. American Behavioral Scientist, Aug 2019. doi: 10.1177/0002764219869402.
[72] L. Hong and B. D. Davison. Empirical study of topic modeling in twitter. In Workshop on Social Media Analytics, 2010.
[73] L. Matney. Reddit quarantines its biggest headache. TechCrunch, 27 Jun 2019. https://techcrunch.com/2019/06/26/reddit-quarantines-its-biggest-headache/ (accessed on 1 Jun 2020).
[74] L. Vicsek and G. Király and H. Konya. Networks in the social sciences: Comparing actor-network theory and social network analysis. Corvinus Journal of Sociology and Social Policy, 2016. doi: 10.14267/CJSSP.2016.02.04.
[75] M. Callon. Some Elements of a Sociology of Translation: Domestication of the Scallops and the Fishermen of St Brieuc Bay. The Sociological Review, 32(1 suppl):196–233, 1984. doi: 10.1111/j.1467-954X.1984.tb00113.x.
[76] M. Fisher and A. Taub. On YouTube's Digital Playground, an Open Gate for Pedophiles. The New York Times, 3 Jun 2019. https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html (accessed on 1 Jun 2020).
[77] M. Foucault. Panopticism. In Discipline & Punish: The Birth of the Prison, pages 195–228. Vintage Books, 1975. Translated by A Sheridan, 1995.
[78] M. McLuhan. The Gutenberg Galaxy: The Making of Typographic Man. University of Toronto Press, 1962.
[79] M. S. Bernstein and others. 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community. In Fifth ICWSM, pages 50–57, 2011.
[80] M. Van Alstyne and E. Brynjolfsson. Electronic Communities: Global Village or Cyberbalkans? Technical report, MIT Sloan School, 1997. https://web.mit.edu/marshall/www/papers/CyberBalkans.pdf (accessed on 1 Jun 2020).
[81] M. Vengattil and A. Kalra and S. Phartiyal. In India election, a $14 software tool helps overcome WhatsApp controls. Reuters, 15 May 2019. https://in.reuters.com/article/india-election-socialmedia-whatsapp/in-india-election-a-14-software-tool-helps-overcome-whatsapp-controls-idINKCN1SL0PZ (accessed on 1 Jun 2020).
[82] M. Zuckerberg. Building Global Community. Facebook, 16 Feb 2017. https://www.facebook.com/notes/mark-zuckerberg/building-global-community/10154544292806634/ (accessed on 1 Jun 2020).
[83] N. Alawadhi. 70% plunge in viral forwarded messages globally: WhatsApp to govt. Business Standard, 26 Apr 2020. https://www.business-standard.com/article/technology/70-plunge-in-viral-forwarded-messages-globally-whatsapp-to-govt-120042601083_1.html (accessed on 1 Jun 2020).
[84] N. Hardeniya and J. Perkins and D. Chopra and N. Joshi and I. Mathur. Text Wrangling and Cleansing. In Natural Language Processing: Python and NLTK, pages 21–31. Packt Publishing Ltd, 2016.
[85] N. Hassanpour. Media Disruption and Revolutionary Unrest: Evidence From Mubarak's Quasi-Experiment. Political Communication, 31(1):1–24, 2014. doi: 10.1080/10584609.2012.737439.
[86] N. Postman. Amusing Ourselves to Death: Public Discourse in the Age of Show Business. Viking Press, Penguin Random House, 1985.
[87] O. Borchers. Fast Sentence Embeddings on GitHub. GitHub, 2019. https://github.com/oborchers/Fast_Sentence_Embeddings (accessed on 1 Jun 2020).
[88] Plato. Book VIII. In The Republic. The Internet Classics Archive by Daniel C Stevenson, Web Atomics, 360 BCE. Translated by Benjamin Jowett.
[89] R. Caplan and D. Boyd. Who's Playing Who? Media Manipulation in an Era of Trump. In Z. Papacharissi and P. J. Boczkowski, editors, Trump and the Media. MIT Press, 2018.
[90] R. Caplan and J. Donovan and L. Hanson and J. Matthews. Algorithmic Accountability: A Primer. Technical report, Data & Society Research Institute, 18 Apr 2018.
[91] R. Feingold and others. Fake News and Misinformation: The Roles of the Nation's Digital Newsstands, Facebook, Google, Twitter, and Reddit. Technical report, Fake News/Misinformation: The Challenge and the Most Effective Solutions, Stanford Law & Policy Lab, Oct 2017.
[92] R. K. Nielsen. Disinformation is everywhere in India. The Hindu, 25 Mar 2019. https://www.thehindu.com/opinion/op-ed/disinformation-is-everywhere-in-india/article26626745.ece (accessed on 1 Jun 2020).
[93] R. Mehrotra and S. Sanner and W. Buntine and L. Xie. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In SIGIR, 2013.
[94] R. O'Harrow Jr. No Place To Hide. Free Press, 2006.
[95] R. S. Mueller III (Special Counsel). Report On The Investigation Into Russian Interference In The 2016 Presidential Election. Technical report, US Department of Justice, Mar 2019.
[96] S. Ahmed. In Assam, a Myth, WhatsApp and Lack of Faith in Justice System Have Fuelled Lynchings. The Wire, 12 Jun 2018. https://thewire.in/rights/assam-lynching-child-lifting-rumours (accessed on 1 Jun 2020).
[97] S. Arora and Y. Liang and T. Ma. A simple but tough-to-beat baseline for sentence embeddings. In ICLR, 2017.
[98] S. Banaji and R. Bhat and others. WhatsApp Vigilantes: An exploration of citizen reception and circulation of WhatsApp misinformation linked to mob violence in India. Technical report, Department of Media and Communications, LSE, 2019.
[99] S. Basu. Manufacturing Islamophobia on WhatsApp in India. The Diplomat, 10 May 2019. https://thediplomat.com/2019/05/manufacturing-islamophobia-on-whatsapp-in-india/ (accessed on 1 Jun 2020).
[100] S. Fiddler. Echoes of Echelon in Charges of NSA Spying in Europe. The Wall Street Journal, July 2013.
[101] S. Gurman. AP: Across US, police officers abuse confidential databases. AP News, 28 Sep 2016. https://apnews.com/699236946e3140659fff8a2362e16f43 (accessed on 1 Jun 2020).
[102] S. Lefait. Surveillance on Screen: Monitoring Contemporary Films and Television Programs. Scarecrow Press Inc, 2013.
[103] S. Ninan. How India's Media Landscape Changed Over Five Years. The India Forum, 30 Aug 2019. https://www.theindiaforum.in/article/how-indias-media-landscape-changed-over-five-years (accessed on 1 Jun 2020).
[104] S. Rashid. Shehla Rashid On Why She Broke Up With Twitter. HuffPost India, 12 Nov 2018. https://www.huffingtonpost.in/2018/11/12/shehla-rashid-on-why-she-broke-up-with-twitter_a_23586998/ (accessed on 1 Jun 2020).
[105] S. Rautray and E. T. Bureau. Centre withdraws Social Media Hub Policy, AG informs SC. The Economic Times, 4 Aug 2018. https://economictimes.indiatimes.com/news/politics-and-nation/centre-scraps-plan-to-set-up-social-media-hub-sc-told/articleshow/65256582.cms (accessed on 1 Jun 2020).
[106] S. Sismondo. An Introduction to Science and Technology Studies (Second Edition). Blackwell Publishing, 2010.
[107] S. Srinivasan. Should government intervene in platform-publisher relationships? The Hindu, 1 May 2020. https://www.thehindu.com/opinion/op-ed/should-government-intervene-in-platform-publisher-relationships/article31476012.ece (accessed on 1 Jun 2020).
[108] S. Zuboff. The Age of Surveillance Capitalism. Hachette Book Group, 2019.
[109] Schema Design and Google Trends. The Lifespan of News. https://www.newslifespan.com/ (accessed on 1 Jun 2020).
[110] SFLC India. Our Statement on Liability of WhatsApp group Admins. SFLC.in, 15 Apr 2020. https://sflc.in/our-statement-liability-whatsapp-group-admins (accessed on 1 Jun 2020).
[111] S. Kierkegaard. Conclusions from a Consideration of the Two Ages. In H. V. Hong and E. H. Hong, editors, Kierkegaard's Writings, XIV, Volume 14: Two Ages: The Age of Revolution and the Present Age, A Literary Review, pages 60–112. Princeton University Press, 1978.
[112] T. Jalan. Kupwara order asks 'public' WhatsApp groups to be registered, admins to delete fake info. Medianama, 2019. https://www.medianama.com/2019/03/223-kashmir-kupwara-whatsapp-groups/ (accessed on 1 Jun 2020).
[113] T. Mathiesen. The Viewer Society: Michel Foucault's 'Panopticon' Revisited. Theoretical Criminology, 1(2):215–234, 1997. doi: 10.1177/1362480697001002003.
[114] T. Mikolov and I. Sutskever and K. Chen and G. Corrado and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
[115] T. Romm and C. Timberg. Trump's growing feud with Twitter fuels free-speech concerns. The Washington Post, 30 May 2020. https://www.washingtonpost.com/technology/2020/05/29/words-president-matter-trumps-growing-twitter-feud-fuels-free-speech-concerns/ (accessed on 1 Jun 2020).
[116] T. Soundararajan and A. Kumar and P. Nair and J. Greely. Facebook India: Towards The Tipping Point of Violence: Caste and Religious Hate Speech. Technical report, Equality Labs, USA, 2019.
[117] U. Tiwari. The Design & Technology behind India's Surveillance Programmes. CIS-India, 2017. https://cis-india.org/internet-governance/blog/the-design-technology-behind-india2019s-surveillance-programmes (accessed on 1 Jun 2020).
[118] V. Eubanks. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin's Press, 2018.
[119] W. Phillips. This Is Why We Can't Have Nice Things: Mapping the Relationship Between Online Trolling and Mainstream Culture. MIT Press, 2015.
[120] Y. Benkler and H. Roberts and E. Zuckerman. Study: Breitbart-Led Right-Wing Media Ecosystem Altered Broader Media Agenda. Columbia Journalism Review, 3 Mar 2017. http://www.cjr.org/analysis/breitbart-media-trump-harvard-study.php (accessed on 1 Jun 2020).
[121] Y. Roth and D. Harvey. How Twitter is fighting spam and malicious automation. Twitter Blog, 26 Jun 2018. https://blog.twitter.com/en_us/topics/company/2018/how-twitter-is-fighting-spam-and-malicious-automation.html (accessed on 1 Jun 2020).
[122] Z. Merchant. Facebook removes 687 pages and accounts linked to Congress party for 'inauthentic behaviour'. Medianama, 1 Apr 2019. https://www.medianama.com/2019/04/223-facebook-removes-687-pages-and-accounts-linked-to-congress-party-for-inauthentic-behaviour/ (accessed on 1 Jun 2020).
[123] Z. Merchant. Facebook takes down 15 pages, accounts and groups with links to BJP. Medianama, 2 Apr 2019. https://www.medianama.com/2019/04/223-facebook-takes-down-15-pages-accounts-and-groups-with-links-to-bjp/ (accessed on 1 Jun 2020).
[124] Z. Merchant. WhatsApp will soon let you decide who can Invite you to a Group. Medianama, 3 Apr 2019. https://www.medianama.com/2019/04/223-whatsapp-will-soon-let-you-decide-who-can-invite-you-to-a-group/ (accessed on 1 Jun 2020).