Big Dataveillance: Emerging Challenges
David Lyon, Queen's University
Draft, May 3 2016

INTRODUCTION

Cynics may say that the only new thing about Big Data is the name. After all, corporations and governments have been gathering big data and searching for patterns for years. But the widespread use of the term 'Big Data' has itself catalyzed both corporate celebration and controversial debate that refuses to die down. Why? Because Big Data practices throw down a gauntlet to many conventional ways of doing things – for example, abandoning what were once seen as the rules for statistical analysis and, when applied, challenging basic aspects of the rule of law. In Big Data Surveillance, it is machines that 'see' or 'watch,' not passively (as in the panopticon) but predictively. BDS 'learns' through correlations and reproduces what was input.

Here, we discuss Big Data Surveillance, which has become increasingly important to large organizations and to ordinary people within today's digital modernity. The statistical and software practices now gathered together under the often-hyped heading of big data contribute to a novel surveillance situation for which a good term is 'Big Dataveillance.' All major surveillance trends (Bennett et al 2014) are affected by big data. Debating Big Dataveillance is vital because even if we can't define Big Data, it's clear that rapid developments under that name challenge our conventional capacity to respond. Basic questions are raised beyond individual privacy and rights, and indeed beyond specific harms. Big Data Surveillance sharply raises issues about the agents of surveillance and about not merely limiting harms but seeking the common, public good (van der Sloot, Broeders and Schrijvers 2016).

Our lives are made transparent to organizations in ways that are often scarcely visible to us; much Big Data Surveillance is hidden from us. The flows of data are fast and circulate within a broader range of organizations than ever before. Relationships between space and time alter and, with them, power relations (Bauman 2000: 9). This aspect of digital modernity is "liquid surveillance," which has some specific characteristics and is further facilitated by big data. Big data is both highly dynamic and weakly structured, which lends 'liquidity' to surveillance (Bauman and Lyon 2013, Lyon 2016). This is a slippery kind of surveillance in both senses: we have less idea of what is happening (Lupton 2015: 34) and how it affects our lives, and less sense of what, if anything, can be done about it (cf Solove 2013).

In the Western world, many discovered big data in September 2013, when a "Snowden document" was released showing that the NSA tracks the social connections of Americans and others, using a colossal cache of data from phone call and email logs. Its sophisticated graphs of some Americans' social connections can identify their associates, their locations at certain times, their traveling companions and other personal information (Poitras and Risen 2013). The NSA also "enriches" these data with material from other public and commercial sources: bank codes, insurance information, Facebook profiles, passenger manifests, voter registration rolls, location-based services, property records and tax data. All can also be stored for later use. The official reason is to find out who might be directly or indirectly in a contact chain connected with someone of foreign intelligence interest.
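To make the idea of contact-chaining concrete, the sketch below walks a toy communication graph outward from a seed identifier and attaches invented "enrichment" records to whatever it finds. It is a minimal illustration only: every identifier, record, function name and data source is hypothetical, and nothing here describes the NSA's actual systems.

```python
from collections import deque

# Toy communication graph: each key is an identifier (an email address or
# phone number) and each value is the set of identifiers it has contacted.
# All data below is invented purely for illustration.
CALL_AND_EMAIL_LOG = {
    "alice@example.com": {"bob@example.com", "+1-555-0100"},
    "bob@example.com": {"alice@example.com", "carol@example.com"},
    "+1-555-0100": {"alice@example.com", "dave@example.com"},
    "carol@example.com": {"bob@example.com"},
    "dave@example.com": {"+1-555-0100"},
}

# Hypothetical "enrichment" records keyed by identifier, standing in for the
# kinds of public and commercial sources the text lists.
ENRICHMENT = {
    "dave@example.com": {"location": "airport", "registered_voter": True},
}


def contact_chain(seed, graph, max_hops=2):
    """Breadth-first walk from a seed identifier, returning every identifier
    reachable within max_hops communication links, with its hop distance."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] >= max_hops:
            continue
        for neighbour in graph.get(current, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return hops


if __name__ == "__main__":
    chain = contact_chain("alice@example.com", CALL_AND_EMAIL_LOG, max_hops=2)
    for identifier, distance in sorted(chain.items(), key=lambda kv: kv[1]):
        print(f"{distance} hop(s): {identifier} {ENRICHMENT.get(identifier, {})}")
```

Even this toy version shows why hop limits matter: each additional hop outward from the seed sweeps many more people, most with no intelligence relevance, into the chain.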
NSA analysts seek both "person-centric" analyses and "relationship types" that yield "community of interest" profiles. The Snowden leaks show the extent of the use of everyday social media – as well as data sourced from economic transactions, sensors, government and ubiquitous cameras (Michael and Lupton 2015: 5, Lyon 2014) – as the source of actionable data: data that can be captured, assembled, classified and analyzed. These data enable the analytics from which profiles and predictions about us are made, always with a view to enlarging our opportunities or restricting our options. They illustrate the common characteristics of big data – volume, velocity and variety – and also some 'Vs' that are less discussed, such as veracity and vulnerability (Saulnier 2016, cf Pasquale 2015).

Some starting assumptions:

- Algorithms and analysis are socially shaped and in turn shape social outcomes.
- The technologies and practices are bound up with government and corporation, now operating together more closely than ever; surveillance capitalism is key (Zuboff 2015). So it's not a 'society vs technology' question – Big Data is techno-cultural, or socio-technical.
- It's complex; Big Data probably does hold positive promise in some areas.
- Big Data is not raw, neutral or objective. It is constituted within already-formed and emergent commercial, policing, administrative and security assemblages and affects people's lives accordingly.
- The primary surveillance questions are ethical; to limit ourselves to legal, technical or administrative issues is inadequate.
- On the other hand, data subjects are not passive but actually develop various tactics to engage online.

1. BIG DATAVEILLANCE

The term 'big dataveillance' puts the accent on the kind of surveillance in question. It connects 'big data' practices with surveillance done using data trails, something that has been happening since the 1980s using computer data and using "small data" before that. It picks up clues from bits of information left behind every time we log on, make a call, are monitored by sensors or cameras, or make a transaction with a bank, another business or a government department.

Big dataveillance, like big data generally, is far from any kind of 'settled' state; it's volatile and contested (Michael and Lupton 2015, Kitchin and Lauriault 2015, Clarke 2014). Such chronic volatility, along with its adoption as a key mode of accumulation, may be part of what's 'new' about big dataveillance, beyond the name. Plus, innovations in technology, the widespread use of electronic devices, and massive dependence on technology for social purposes each make real-time information readily available. This occurs in so-called liberal-democratic societies (Bigo and Tsoukala 2008).

"Datafication" (Van Dijck 2014) normalizes the process that turns social action into quantified data and allows data to become the currency of exchange between those who want it for business or security purposes and those apparently willing to part with it. As Van Dijck shows, this involves beliefs in big data and trust in its agents. "Life-mining" makes available many aspects of social life not previously accessible to security agencies, corporations or the academy. "Friending" and "liking" become algorithmic relations, and the results of tracking them are taken as symptoms or sensors – ways of knowing what the public is thinking or feeling.
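As a rough illustration of what datafication amounts to in practice, the hypothetical sketch below reduces an invented stream of "like" and "favorite" events to per-topic counts, the kind of quantified proxy that is then read as a sensor of public feeling. The event format, topics and function name are assumptions made for the example; no real platform API is implied.

```python
from collections import Counter

# Invented stream of platform events: (user, action, topic). In practice such
# records would come from clickstreams or platform data feeds; everything here
# is fabricated for illustration only.
EVENTS = [
    ("u1", "like", "pipeline protest"),
    ("u2", "like", "pipeline protest"),
    ("u3", "favorite", "election"),
    ("u1", "like", "election"),
    ("u4", "comment", "election"),
]


def datafy(events):
    """Reduce social actions to per-topic counts of 'affective' signals,
    the quantified proxy that gets treated as a gauge of public mood."""
    return Counter(
        topic for _, action, topic in events if action in ("like", "favorite")
    )


if __name__ == "__main__":
    for topic, count in datafy(EVENTS).most_common():
        print(f"{topic}: {count} affective signals")
```

The point of the sketch is the reduction itself: whatever someone meant by a "like," the count is all that travels onward into analytics and exchange.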
But the technological channels are not neutral, and the aggregate data do not have a direct relation with the individuals whose lives contributed to them. Analysis and projection, or deduction and prediction, do not have self-evident links (Amoore 2011). At the same time, personalization and customization are central to big data (Andrejevic 2012: 86). "Dataveillance [has] profound consequences for the social contract between corporate platforms and government agencies on the one hand and citizens-consumers on the other" (Van Dijck 2015).

Big dataveillance is facilitated by the increasing digitization of everyday life, which in turn relates to the political economy of surveillance, or what Shoshana Zuboff calls "surveillance capitalism" (Zuboff 2015), marked by its dependence on big data. Her study of Google chief economist Hal Varian's work shows why Google workers, paid or not, experience "studied indifference." This analysis can be extended to Big Dataveillance.

2. CHANGING PRACTICES

New modes of data capture and analytics

The sources for data capture have proliferated exponentially – web data, sensors, the internet of things, government data, social media, transactional data – along with new analytic paths such as streaming and text analysis. Smart and autonomous cars present new issues here! So personally identifiable information is not what it was. What counts now is data fragments, an algorithmic assemblage. Excess data generation has become the norm (and data minimization is viewed as deviant).

Reliance on algorithmic modes of data capture and analysis often reinforces discrimination, as several studies indicate (Miller 2015). Google's advertising system shows ads for high-income jobs more frequently to men than to women (Carnegie-Mellon); ads for arrest records show up on searches for distinctively black names (Harvard); and advertisers can target people in low-income areas with high-interest loans (FTC) (cf Dwork and Mulligan 2013). The questions pertain to far more than just national security!

Especially in mobile contexts, social media is a new source of intelligence information, not only for corporations seeking new markets but also in national security. This is seen, for instance, in the LEVITATION disclosures about Canada's CSE and its search for "radicalized youth" by checking downloads (Gallagher and Greenwald 2015). Corporate compliance was not obtained; the ATOMIC BANJO program taps directly into internet cables and sifts individual IP addresses (cf Forcese and Roach 2015). This happens most obviously in online environments of connectivity. As Van Dijck observes, data culled from social media sites, including so-called "affective" traffic from "like" and "favorite" buttons,