A Broken Utopia: Big Data Bias and the Need for a New Ethics of Faultless Responsibility
In 1970, Salvador Allende, a Marxist, was elected President of Chile on the promise of implementing la vía chilena al socialismo – the Chilean Path to Socialism. Allende wanted to precisely coordinate Chile’s economy to maximise fairness: “our objective,” he said, “is total, scientific, Marxist socialism.”1 Soon after Allende’s election the government hired English ‘cybernetician’ Stafford Beer, the world’s leading pioneer in the use of computers to organise production. Beer was to apply his methods on a national scale and computerise the whole of Chile’s economy. The physical aspect of Project Cybersyn, as it came to be known, was the Operations Room: a hexagonal room fitted with white fibreglass chairs and orange cushions, in which real-time information from Chile’s factories would appear on wall-mounted screens, and from which Chilean economists could oversee, coordinate and model the nation’s economy.2 Project Cybersyn was “a dispatch from the future”; it was a prototype of today’s big data society.3,4

Project Cybersyn was meant to perfect socialism; instead, capitalism has since perfected Project Cybersyn. While Chile had to make do with the technology of the ’70s, today faster processors and ubiquitous sensors5 have made it simple for businesses to collect and process huge datasets, with massive gains in efficiency, accuracy, and profits. ‘Big data’ is the catch-all term used to refer to these “increased capabilities to amass and store data and the analytical models applied to them for yielding knowledge”.6 But even as the technology has changed, big data has kept its utopian air. Big data promises efficiency and fairness. This essay does not dispute that there are many, many positive applications of big data – but I hope to show that in other cases, the best intentions can lead to morally repugnant outcomes, even when everyone involved has acted ethically.

1. Régis Debray and Salvador Allende, Conversations with Allende: Socialism in Chile (N.L.B., 1971).
2. Eden Medina, Cybernetic Revolutionaries: Technology and Politics in Allende’s Chile (MIT Press, 2011).
3. Evgeny Morozov, ‘The Planning Machine: Project Cybersyn and the origins of the Big Data nation’, The New Yorker (New York), 13 October 2014, 119.
4. Project Cybersyn was never fully realised. A military coup broke over the head of Chile’s government, and in September 1973 Allende died in the Presidential Palace, defending himself with an AK-47 given to him by Fidel Castro. During the coup, a military officer entered the Operations Room, took out a knife, and stabbed the screens. Stafford Beer survived, but was no longer the rich Rolls-Royce-driving industrialist the Allende government had first contacted: Beer spent the final years of his life in a cottage in Toronto, writing poetry and giving private yoga lessons in exchange for incense and flowers (“Money gets in the way of everything,” he said). All of this is in Medina’s wonderful book, which, as far as I can see, is the only book published on the subject.
5. In July of this year Gizmodo reported that the automated Roomba vacuum has been surreptitiously mapping homes for the past few years: Rhett Jones, ‘Roomba’s Next Big Step Is Selling Maps Of Your Home To The Highest Bidder’, Gizmodo (online), 25 July 2017 <https://www.gizmodo.com.au/2017/07/roombas-next-big-step-is-selling-maps-of-your-home-to-the-highest-bidder/>. If you need further proof that sensors are everywhere, look no further than Snowden’s leaks on the NSA.
This essay will focus on situations where big data has accidentally amplified bias and prejudice, as a way of illustrating the need for a new ethics to answer the question of who should be held responsible when big data inadvertently leads to a morally bad outcome.7 This essay argues for the adoption of Luciano Floridi’s 2016 concept of faultless responsibility as the compass by which to chart society’s use of big data. The complex and distributed nature of big data makes the attribution of responsibility a difficult task; but without the attribution of responsibility, the harms of big data will go unchecked, and the breathy utopianism of big data will continue to be undercut.

1. Toy versions of the world: the basics of big data

Big data is the world translated into numbers. It is big data that measures workplace productivity,8 or ranks universities,9 or tells us that Nabokov’s favourite word was “mauve”.10 Big data uses numbers to build baseball teams and personalised Spotify playlists, or to tell Facebook which ads we’re most likely to click. It’s big data that collects our online activity to figure out if we need a new car, or a loan,11 or even to tell if we’re pregnant.12 Big data is not any one thing, but rather the name given when mathematics is heavily involved in guiding human activity. There are no limits to the potential uses of big data, and, seemingly, no limits to how willing we are to let big data into our lives. Many uses are innocuous (the alert on our phone telling us to take an umbrella), but even when the use is more serious (say, calculating our insurance premiums based on our driving record, or scoring our creditworthiness) we generally still accept the role of big data, because we trust the calculation will be fair.

6. Effy Vayena and John Tasioulas, ‘The dynamics of big data and human rights: the case of scientific research’ (2016) 374(2083) Philosophical Transactions of the Royal Society A, 2.
7. Because the topic of big data is so vast, I won’t be able to cover the other attention-worthy issues that arise from its use. For an introduction to the interaction between big data and privacy see Edith Ramirez, ‘The Privacy Challenges of Big Data: A View From The Lifeguard’s Chair’ (Speech delivered at the Technology Policy Institute Aspen Forum, Aspen, Colorado, 19 August 2013) <https://www.ftc.gov/sites/default/files/documents/public_statements/privacy-challenges-big-data-view-lifeguard%E2%80%99s-chair/130819bigdataaspen.pdf>; and for a brilliant overview of the challenges big data poses to the media and the democratic process, read Katherine Viner, ‘How technology disrupted the truth’, The Guardian (online), 12 July 2016 <https://www.theguardian.com/media/2016/jul/12/how-technology-disrupted-the-truth>.
8. Joshua Rothman, ‘Big Data Comes to the Office’, The New Yorker (online), 3 June 2014 <http://www.newyorker.com/books/joshua-rothman/big-data-comes-to-the-office>.
9. Robert Morse, ‘The Birth of the College Rankings’, U.S. News (online), 16 May 2008 <https://www.usnews.com/news/national/articles/2008/05/16/the-birth-of-college-rankings>.
10. Dan Piepenbring, ‘The Heretical Things Statistics Tell Us About Fiction’, The New Yorker (online), 27 July 2017 <http://www.newyorker.com/books/page-turner/the-surprising-things-statistics-tell-us-about-fiction>.
Data, after all, is maths, and maths is objective: two plus two equals four whether you’re David Duke or Gandhi. Big data implies the possibility of turning human organisation from an art into a science. But the idea that big data is free from human bias is simply incorrect. The calculations in the examples above are powered by algorithms, and these algorithms are written by humans. Far from treating their work as a science, the data scientists who write these algorithms refer to the process as “the ‘art’ of data mining”.13

The process begins by selecting a ‘target variable’ that the algorithm is trying to calculate. The target variable is rarely simple, which means that often the data scientist will be attempting to express an amorphous, real-world problem – for example, will this person commit a violent crime? – as a maths question. They must select the numbers that correlate with the target variable – for crime: low income, number of past crimes, and so on – and build these ‘proxies’ into a mathematical model of a violent criminal. When applied to an individual, the model spits out a number telling us how statistically similar that individual is to previous violent offenders, and, therefore, how likely they are to commit violent crime.

These mathematical models are, by definition, simplifications; “no model can include all of the real world’s complexity.”14 Besides, nobody can give a universal explanation of why people commit violent crimes: but big data is not interested in causation, only results.15 These ‘results’ are essentially just an elaborate form of sorting, and when data scientists build what have been called mathematical “toy versions” of the world, they make subjective choices about how best to divide the world into categories.16 Big data algorithms necessarily engage in statistical discrimination: the sorting of people into groups with others who are statistically similar.17 The danger of this statistical discrimination crossing moral boundaries is always present.

11. Emily Steel and Julia Angwin, ‘On the Web’s Cutting Edge, Anonymity in Name Only’, The Wall Street Journal (online), 4 August 2010 <https://www.wsj.com/news/articles/SB10001424052748703294904575385532109190198>.
12. Charles Duhigg, ‘How Companies Learn Your Secrets’, The New York Times Magazine (online), 16 February 2012 <http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?_r=2&pagewanted=all>.
13. Solon Barocas and Andrew D. Selbst, ‘Big Data’s Disparate Impact’ (2016) 104 California Law Review 671, 678.
14. Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Penguin Random House, 2016), Loc 289.
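The model-building process just described – pick a target variable, choose proxies, score individuals by statistical similarity to past cases – can be made concrete in a few lines of Python. This is a deliberately crude sketch: the proxy features, the data, and the similarity measure are all invented for illustration, not drawn from any real system, but the shape is the same, and it shows how a proxy like income can drive the supposedly objective result.

```python
# Hypothetical sketch of proxy-based statistical discrimination.
# The true target variable ("will this person commit a violent crime?")
# cannot be measured directly, so the modeller picks proxies -- here,
# invented features: (income bracket 1-5, prior arrests, age at first arrest).
import math

# Past offenders, each reduced to a vector of proxy values (all invented).
PAST_OFFENDERS = [
    (1, 4, 16),
    (2, 3, 17),
    (1, 5, 15),
]

def risk_score(person):
    """Score a person by closeness to the nearest past offender.

    The choices hidden in this function -- which proxies to use, how to
    measure 'similarity', how to squash a distance into a 0-1 score --
    are all subjective modelling decisions, not objective facts.
    """
    nearest = min(math.dist(person, offender) for offender in PAST_OFFENDERS)
    return 1.0 / (1.0 + nearest)  # closer to an offender => higher score

# Two people identical in every respect except income bracket: the
# low-income person scores as 'riskier' purely because the income proxy
# resembles the historical data, not because of anything they have done.
print(risk_score((1, 1, 30)))  # low income bracket
print(risk_score((5, 1, 30)))  # high income bracket -- lower score
```

The point of the sketch is that the discrimination is baked in before any individual is ever scored: it lives in the choice of `PAST_OFFENDERS` and of the proxy features themselves.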