Comparing Surveillance by Governmental Actors and Commercial Actors: an LDA-Assisted Analysis of News Media Discourse

Keywords: discourse, framing, topic modelling, LDA, surveillance, privacy, news media, digital humanities, methodology

Abstract

This research compares the American news media discourse on technology-practices of surveillance and matters of privacy in the context of government actors with the discourse on such practices and matters with regard to commercial actors. To this end, a hybrid, mixed-method methodology is developed. A Python-based topic modelling analysis is combined with a qualitative content analysis inspired by the discipline of discourse analysis. The corpus for the topic modelling consists of tens of thousands of texts taken from the News on the Web corpus. The corpus for the qualitative analysis consists of relevant articles identified via the topic modelling analysis, amongst other text mining procedures. These are articles in which surveillance, as well as other technology-practices related to privacy, are either negatively or positively framed. The analyses showed that government practices of surveillance are often critically approached from a legal perspective and positively evaluated from a national security perspective. In the commercial-actor discourse, the positive and negative evaluations of surveillance can be regarded as a struggle between economic benefits and consumer convenience on the one hand, and information security on the other. In both discourses, privacy is the most discussed issue in negative framing, as other studies have also shown. Furthermore, technological properties are found to play a large role in both discourses, especially in surveillance-critical arguments.

An evaluation of the effectiveness of this study's methodology, which can be regarded as experimental, finds that topic modelling is a promising way to improve the objectivity of the standard, keyword-based manner of selecting a corpus for a content or discourse analysis, and to increase the representativeness of the eventual findings.

A Master Thesis for the MSc Communication and Information Sciences program of Tilburg University
Author: Chantal van Elden
First reader and supervisor: dr. Emmanuel Keulers
Second reader: dr. Ruud Koolen
Submitted: July 27 2017

Table of contents

1. Introduction
2. Theoretical framework
   2.1. The characteristics of data-driven surveillance
   2.2. Issues concerning surveillance
        Ethics: privacy harm and discrimination
        Philosophical and practical criticisms
   2.3. Public discourse and pending questions
3. Methodology
   3.1. Discourse Analysis, Content Analysis and Natural Language Processing
        Various approaches to discourse research and their theoretical and practical ramifications
        The methodological focus of this thesis
   3.2. Method for the quantitative analysis
        The research corpus: origins and pre-processing
        Extraction of relevant articles: NER, keyword identification and collocation
        Topic modelling with LDA
   3.3. Method for the qualitative analysis
        The implications and context of the corpus
        The coding framework
4. Results
   4.1. Government surveillance and commercial surveillance compared in topics
   4.2. Comparing the use of frames in the context of different kinds of actors
        The results of the topic modelling search for relevant articles
        Categorising the articles
        A closer look on the issue of privacy
5. Conclusion
6. Discussion
References
Appendix 1

1. Introduction

Societies around the world are becoming increasingly ‘digitalized’: human interaction, machine interaction and human-machine interaction nowadays largely take place via computer-facilitated communication channels, and much information is converted to, or created in, digital forms that can be accessed via the internet. This digitization has had, and continues to have, great effects on the way people relate to the world and on the way knowledge is generated,[1] which makes the technologies, practices and discourses that constitute this digitization a popular subject of critical research into the social, cultural or ethical implications and consequences of certain dimensions or specific cases of digitization.

One of the prime digitization-related subjects that has been studied in this regard is surveillance, the “systematic investigation or monitoring of the actions or communications of one or more persons” (Clarke 2012). Following the digitization of society as a whole, surveillance has also become increasingly data-driven, as the actions and communications that are observed via systems of surveillance are nowadays more often than not virtual: they consist of data or metadata that are the result of online (social) action. These data are then collected and analysed via data-mining methods to monitor or predict the characteristics or behaviour of certain individuals, groups, or human behaviour in general.[2] Such data-driven surveillance practices are, for example, carried out by security agencies that search for meaningful patterns or relationships in large collections of data, such as aircraft passenger-name records, in order to identify high-risk individuals and subject them to pre-emptive security measures such as extra airport security checks or a prohibition on flying (Leese 2014).
Commercial actors engage in similar analyses, though for very different reasons: banks and insurance companies, for example, may use predictive data mining to automatically assess a potential customer’s creditworthiness, and retailers may use data mining to recommend certain items to the visitors of their online platform based on their overall web behaviour, all with the aim of minimising costs or increasing revenue (Cheng et al. 2015, 6).

Within various academic disciplines, and especially the field of culture studies, critical analysis of such ‘technology-practices’[3] of surveillance has become increasingly popular. This research can roughly be placed in two categories: descriptive studies (either theory-oriented or case-oriented) and discourse research. Where critical descriptive studies are concerned, authors from various fields of study have voiced concerns about the consequences of both governmental and commercial surveillance practices for people’s right to privacy and about the risks of inadequate data security (e.g. Tavani 1999; Van Wel and Royakkers 2004; Millar 2009; Hull et al. 2010; Bauman et al. 2014; Lyon 2014; Van Dijck 2014). Moreover, in the case of pre-emptive security measures based on predictive data mining, scholars warn of discrimination and of jeopardizing the legal principle of the presumption of innocence (e.g. Amoore and De Goede 2008; Kerr and Earle 2013; Leese 2014). Where discourse research is concerned, scholars have identified various patterns in the way people talk about and make sense of the technologies and practices related to surveillance in the media. Researchers have, for example, identified certain ‘frames’ that are commonly used to talk about specific technology-practices or controversial cases of surveillance, like the ‘surveillance can counter crime and terror’-frame, the ‘surveillance has led to an Orwellian dystopia’-frame, and the ‘privacy and digital services are inherently at odds’-frame (Bernard-Wills 2011; Lischka 2015; Mols and Janssen 2016).

[1] For example, according to the much-cited sociologist Castells (2011), the “revolution of communication technologies” has led to the emergence of a global ‘network society’, a new social structure in which the old limitations of networked interactions have been eliminated for users of new network technologies. Similarly, Mayer-Schoenberger and Cukier (2013) observe that the ‘datafication’ of society has led to a new epistemological paradigm, in which the increased quantification of social action in the form of analysable data has led to radically new ways of creating (perceived) knowledge about social behaviour (in: Van Dijck 2014). It must be noted, though, that the ‘new’ communication and information technologies that drive digitization are neither completely new (Bolter and Grusin 1996; Agar 2006; De Vries 2012; Beer 2016) nor determinative of social conditions (Sismondo 2010).

[2] Data mining is the process of “discovering novel, interesting, and potentially useful patterns from large data sets and applying algorithms to the extraction of hidden information”, typically in order to “build an efficient predictive or descriptive model of a large amount of data that not only best fits or explains it, but is also able to generalize to new data” (Cheng et al. 2015, 1).

[3] By ‘technology-practices’ I mean the assemblage of technical, cultural and organisational aspects that make up a technology as it occurs in society; it is the object, or collection of objects plus the technical processes that sustain them, as well as its use and embedding in society (Pacey 1983, 5-6).
Interestingly, existing discourse research seems especially focussed on surveillance in a state context, and consequently on privacy in the context of police action, security services and government legislation, even though surveillance is also carried out by commercial actors, for example in the form of targeted advertising. The surveillance practices of governmental actors and commercial actors are also inherently intertwined: national security agencies, for example, often work together with commercial actors such as telecom providers and airlines to obtain the data they need (Amoore 2009; Lyon 2014). The relative lack of attention to surveillance practices by commercial actors
