Should We Treat Data as Labor? Moving Beyond “Free”

By IMANOL ARRIETA IBARRA, LEONARD GOFF, DIEGO JIMÉNEZ HERNÁNDEZ, JARON LANIER, AND E. GLEN WEYL ∗

∗ Arrieta: Department of Management Science and Engineering, School of Engineering, Stanford University, Huang Engineering Center, 475 Via Ortega Avenue, Stanford, CA 94305. Goff: Department of Economics, 1022 International Affairs Building, 420 West 118th Street, New York, NY 10027. Jiménez: Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305. Lanier: Office of the Chief Technology Officer, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052. Weyl: Microsoft Research, One Memorial Drive, Cambridge, MA 02142 and Yale University Department of Economics and Law School. We are grateful to many colleagues for comments, but especially to Microsoft business leaders Satya Nadella and Kevin Scott for their encouragement. All errors are our own.

PAPERS AND PROCEEDINGS, MAY 2018, VOL. 1 NO. 1

In the previous paper in this session and in a forthcoming book (Posner and Weyl, 2018), one of us argues that by creating or strengthening absent markets, we can simultaneously address the inequality, stagnation and sociopolitical conflict afflicting developed countries. He calls such cases “radical markets” because of their transformative emancipatory potential. A promising example was suggested years earlier by another of us, who wrote a book (Lanier, 2013) highlighting the social problems with the culture of “free” online, in which users are neither paid for their data contributions to digital services nor pay directly for the value they receive from these services. While free data for free services is a barter, he argued that the lack of targeting of incentives undermines market principles of evaluation, skews the distribution of financial returns from the data economy and stops users from developing themselves into “first-class digital citizens”.

In this paper we explore whether and how treating the market for data like a labor market could serve as a radical market that is practical in the near term.

I. The High Cost of Free Data

The digital economy is perhaps the leading source of innovation today, delivers massive surplus to users (Brynjolfsson et al., 2017) and is “free” (at point of use) to users. Despite these benefits, popular anxiety and backlash are rising.

The most common concern is employment and income distribution. Many fear that artificial intelligence (AI) systems will replace human workers. Economists rightly respond that greater technological disruptions in the past, while causing shifts in employment, have largely left labor’s share of income constant or even growing (Autor, 2015). Yet recent secular declines in labor’s share (Karabarbounis and Neiman, 2014) belie its universal stability.

Furthermore, the employment numbers of leading technology companies give little cause for optimism. The market capitalization and value-added of firms like Facebook and Google are similar to or greater than those of a firm like Walmart, yet they employ 1-2 orders of magnitude fewer workers, and our primitive attempts to estimate the labor income shares of these companies from publicly available statistics suggest they are a small fraction of the traditional average of 60-70%. The “future” such firms represent would validate Piketty’s (2013) foreboding of high capital shares.

Simultaneously, the lack of payment to users for data may drag on the contributions of AI to productivity growth. Despite the widespread hype about AI, its contributions to productivity seem to have been limited thus far (Gordon, 2016; Nadella, 2017). A potential explanation relates to the role of data. The first generation of AI systems largely failed to achieve their goals because they relied too heavily on hard-coding by engineers. The new generation of AI uses statistical methods called “machine learning” (ML), which adapt to patterns in examples of humans performing similar tasks (“big data”).

Yet the free data model has made productivity-related data much less accessible than consumption-oriented data. Workers who expect to be compensated are the primary performers of productivity-related tasks, and these often occur within firms unwilling to surrender their proprietary internal data to AI companies for free.

More broadly, many AI systems depend on active participation by humans to generate relevant data. This ranges from users granting permission to access data naturally created in the course of consumption experiences, through users that go out of their way to provide examples of translations or feedback on translations generated by AIs as they use these systems, to the sort of active labeling and analysis tasks currently supplied in digital labor markets such as Amazon’s Mechanical Turk or Mighty AI (Gray and Suri, 2017) and even to the creative content displayed on blogs and video sharing sites.

However, these systems seem inefficient as they generally do not reward those with the greatest expertise and context (usually those producing the data that others currently label in the first place), either reassigning tasks to those with little context or coaxing those with context to provide feedback for free as part of accessing online services (as in the case of DuoLingo or reCAPTCHA). They appear to be workarounds to avoid directly paying those best able to supply high-quality data rather than efficient procurement practices. A purely free data economy acts as a drag on productivity growth that continues to lag worldwide (Byrne et al., 2016) despite bold hopes for AI’s potential.

Finally, recent anxiety about employment and the digital economy goes beyond the purely economic. On the one hand, increasing numbers of workers, especially away from cosmopolitan and high-tech cities, are disillusioned with and disenfranchised by technological and economic progress. Many believe these feelings helped stimulate populist movements of the left and right throughout the developed world.

Simultaneously, young people spend increasing time on and have developed increasing expertise in digital interactions such as social media and video games (Perrin, 2015; Aguiar et al., 2017). Because such activities are overwhelmingly framed as consumption rather than production, these growing online lives are widely seen as running contrary to or undermining the dignity provided by work. Many of these young people seem to have become involved with antisocial activities (such as cyberbullying and hate speech) or to have declining self-esteem. Thinkers promoting the idea of a “universal basic income (UBI)” have even suggested that dignity based on work is becoming outdated and that, as AI replaces humans, leisure may be a growing source of identity (Parijs and Vanderborght, 2017). Whatever the promise of this idea, for the medium term treating online experiences as purely consumption holds risks for the social and political fabric of developed countries.

II. Capital or Labor?

We contend that the key aspect of the current political economy of data that causes these problems is treating data as capital rather than as labor. While it might seem that assets either are one or the other, and that treatment is irrelevant, transitions in the social attitude towards assets across these categories have played important roles in history. Slavery and, to a lesser extent, feudalism treated (largely agricultural) work as a possession of a master or lord, while liberal and labor reform worked to give recognition and its marginal economic product to labor. To understand what we are trying to accomplish, it is useful to contrast several attitudes towards data at present under the “Data as Capital (DaC)” paradigm to those appropriate in a world where we see data as labor (DaL); we summarize these in Table 1.

DaC treats data as natural exhaust from consumption to be collected by firms, while DaL treats them as user possessions that should primarily benefit their owners. DaC channels payoffs from data to AI companies and platforms to encourage entrepreneurship and innovation, while DaL channels them to individual users to encourage increased quality and quantity of data. DaC prepares for AI to displace workers either by supporting UBI or reserving spheres of work where AI will fail for humans, while DaL sees ML as just another production technology enhancing labor productivity and creating a new class of “data jobs”. DaC encourages workers to find dignity in leisure or in human interactions outside the digital economy, while DaL views data work as a new source of “digital dignity”. DaC sees the online social contract as free services in exchange for prevalent surveillance, while DaL sees the need for large-scale institutions to check the ability of data platforms

Issue                  Data as Capital              Data as Labor
Ownership              Corporate                    Individual
Incentives             Entrepreneurship             “Ordinary” contributions
Future of work         Universal Basic Income       Data work
Source of self-esteem  Beyond work                  Digital dignity
Social contract        Free services for free data  Countervailing power to create data labor market

TABLE 1: LEADING CHARACTERISTICS OF THE “DATA AS CAPITAL” VERSUS “DATA AS LABOR” PERSPECTIVES.

to exploit monopsony power over data providers and ensure a fair and vibrant market for data labor.

Describing DaL versus DaC as a binary is obviously too simplistic and extreme. Production functions for data and the AI systems built on top of them are certainly more continuous: data, capital (e.g. computational power), skilled labor (e.g. programmers), entrepreneurial talent and “land” (e.g. rents on network effects) all matter, and these different inputs can likely be substituted reasonably smoothly. The socially optimal shares of each factor depend on as-yet-unmeasured details of production functions, and data themselves are not purely created by users: they require firms to track, record and organize user behavior.

Yet we doubt the optimal (viz. competitive) share of user data contributions is a negligible fraction of the total value of the digital economy. While the marginal value of data in estimating any finite dimensional quantity eventually steeply declines, the power of the latest generation of ML has been its ability to tackle increasingly sophisticated tasks as the quality and quantity of data improve. Many of these more sophisticated tasks are impossible to even get started on without ample data, as the neural networks and other learning algorithms required cannot learn the right representations of complex phenomena without many training examples. This suggests that the returns to data may decline only gradually or there may even be increasing returns to data if more sophisticated tasks are disproportionately more valuable. This is consistent with the empirically observed dominance of the data economy by a few large firms.

Luckily, the production function for AI may be easier to measure than other production functions because the relevant ML algorithms and their performance at different times and for different data sets are usually well-documented, at least internally to companies. Combining these with advances in ML that allow estimation of the marginal effect of new data on predictions (Koh and Liang, 2017) suggests a promising avenue for valuing data (and one we are pursuing at Microsoft), though there are many conceptual and computational challenges still to be overcome.

Whatever the precise balance, the only “third way” out of the DaL-DaC spectrum we see is the failure of AI: if AI proves to be relatively unproductive or irrelevant, neither DaL nor DaC will much matter. But if AI lives up to even a part of its hype, failure to move towards DaL will leave us trapped in the problems we highlight with DaC.

III. How Did We Get Here?

If treating data purely as capital is economically and socially irrational, how have we ended up in the present equilibrium? As in the nineteenth century labor struggles, the usual culprits are a combination of prejudice (viz. the weight of precedent created by historical accidents) and privilege (viz. entrenched interests that derive rents from the inefficient equilibrium). In the present setting, user expectations of “lightweight” online experiences have conspired with the monopsony power of the technology giants (what one of us has called “siren servers”) to maintain the status quo.

The internet economy largely began with a venture-capital fueled bubble that chased usage with little sense for a business model. The social movement for “free software” collided with a counter-cultural streak in Silicon Valley that declared information wants to be free and built users’ expectations of digital services being offered freely. Searching for a way to monetize this activity, Google and then Facebook turned to advertising targeted using user data. This accustomed users to surrendering data in exchange for free services (Carrascal et al., 2013), expectations that have persisted as the value of such data to broader AI services has risen. Few users are even aware of the productive value of their data or the role they play in enabling ML.

Yet historical accidents have not only entrenched expectations and norms, they also have created powerful interests in maintaining the status quo. The largest siren servers, especially Facebook and Google, but also Microsoft and others, benefit from the free or extremely cheap availability to them of data. While the total value created by data might be much larger in a DaL world, users aware of the value of their data would likely demand compensation in a range of settings, dramatically reducing the share of value that could be captured by the siren servers as profits. This is just an extreme version of the standard logic of monopsony: while a usual monopsonist just depresses wages, the historical background we explain above has made it attractive for siren servers to maintain a DaC equilibrium where users are not even aware of the value their data daily create for siren servers.

Recent evidence suggests significant monopsony power in online task labor markets. Dube et al. (2018) use randomly varied wages on Amazon Mechanical Turk to find elasticities of the labor supply curve facing a task-poster that are well below unity. These small task-posters almost certainly have more elastic residual labor supply than does a siren server, suggesting extreme monopsony power in the latter case: a question we have been investigating in ongoing work with Microsoft data. In ongoing work using a large Microsoft program that pays users in loyalty points for Bing searches, we estimate even smaller elasticities in the number of searches performed among active users of the program. This reinforces the idea that monopsony may be an important force blocking the potential productivity gains from DaL.

IV. Sources of Countervailing Power

The inefficient exploitation of labor by concentrated capital was a constant theme of political economy before the Cold War. Galbraith (1952) summarized various solutions to this problem as forms of “countervailing power” by large scale social institutions.

In the data economy, the first and most natural balancing factor is competition. While Facebook and Google rely heavily on DaC, other leading technology companies (e.g. Amazon and Apple) mostly follow different business models, and a productivity-oriented company like Microsoft might even benefit from users perceiving themselves more as producers online. These other companies also lag Facebook and Google in the data race to train ML systems. Returning more of the gains to data laborers might help them compete in creating AI systems. Smaller companies or start-ups could also make a difference, and many (e.g. Meeco) have been formed around DaL-related ideas. Yet we doubt, given the economies of scale related to data in producing AI systems, that a smaller player could succeed without a significant partnership with one of the largest technology companies.

Second, data laborers could organize a “data labor union” that would collectively bargain with siren servers. While no individual user has much bargaining power, a union that filters platform access to user data could credibly call a powerful strike. Such a union could be an access gateway, making a strike easy to enforce; on a social network, where users would be pressured by friends not to break a strike, this might be particularly effective. A union could also be useful in certifying data quality and guiding users to develop their earning potential.

Finally, governments can play an important role in helping facilitate DaL both on the positive and negative side. On the positive side, new regulatory frameworks such as the European General Data Protection Regulation are increasingly shifting ownership rights in data to the users that generate them. Data collectors increasingly must allow users to understand, withdraw and transfer their data across competitors. On the other hand, existing labor laws fit poorly with a world where much data labor may be done in the course of consumption experiences rather than as a dedicated activity. Adapting labor laws to defend workers against monopsony while allowing the flexibility that data work demands will require a combination of economic and technical sophistication that we hope labor economists can increasingly provide to support policy-makers.

V. A Radical Data Market

Ultimately, we believe all three of these factors must coordinate for DaL to succeed, just as in historical labor movements. Whatever the mix, however, building a market for data labor offers economists an exciting chance to design a market on a much broader scale than most work on market design in the past (Roth, 2015). For example, we are currently working to use regularized measures of the marginal value of data points to design and make transparent efficient payments for data workers. With studies projecting that AI might automate as many as 50% of jobs in the coming decades (Frey and Osborne, 2017), data labor has the potential to constitute a significant fraction of national income. At the same time, economists, in their roles as advisors to governments and technology companies, are likely to play a central role in defining the texture of these markets. A radical market in data labor offers a near-term opportunity for economists, in collaboration with the other social and computer scientists they regularly work with in the technology industry, to bring years of research in labor economics and market design to bear on a central social problem of our times.

REFERENCES

Aguiar, Mark, Mark Bils, Kerwin Kofi Charles, and Erik Hurst, “Leisure Luxuries and the Labor Supply of Young Men,” 2017. http://www.nber.org/papers/w23552.

Autor, David H., “Why Are There Still So Many Jobs? The History and Future of Workplace Automation,” Journal of Economic Perspectives, 2015, 29 (3), 3–30.

Brynjolfsson, Erik, Felix Eggers, and Avinash Gannamaneni, “Using Massive Online Choice Experiments to Measure Changes in Well-being,” 2017. Latest version available from authors.

Byrne, David M., John G. Fernald, and Marshall B. Reinsdorf, “Does the United States have a Productivity Slowdown or a Measurement Problem?,” Brookings Papers on Economic Activity, 2016, (Spring), 109–182.

Carrascal, Juan Pablo, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, and Rodrigo de Oliveira, “Your Browsing Behavior for a Big Mac: Economics of Personal Information Online,” in “Proceedings of the 22nd International Conference on World Wide Web,” WWW ’13, ACM, New York, NY, USA, 2013, pp. 189–200.

Dube, Arindrajit, Jeff Jacobs, Suresh Naidu, and Siddharth Suri, “Monopsony in Online Labor Markets,” 2018. This paper is under preparation. Contact Suresh Naidu for a copy.

Frey, Carl Benedikt and Michael A. Osborne, “The Future of Employment: How Susceptible are Jobs to Computerisation?,” Technological Forecasting and Social Change, 2017, 114, 254–280.

Galbraith, John Kenneth, American Capitalism, New York: Houghton Mifflin, 1952.

Gordon, Robert J., The Rise and Fall of American Growth: The U.S. Standard of Living since the Civil War, Princeton, NJ: Princeton University Press, 2016.

Gray, Mary L. and Siddharth Suri, “The Humans Working Behind the AI Curtain,” Harvard Business Review, January 9, 2017.

Karabarbounis, Loukas and Brent Neiman, “The Global Decline of the Labor Share,” Quarterly Journal of Economics, 2014, 129 (1), 61–103.

Koh, Pang Wei and Percy Liang, “Understanding Black-Box Predictions Via Influence Functions,” in “Proceedings of Machine Learning Research,” Vol. 70, 2017, pp. 1885–1894.

Lanier, Jaron, Who Owns the Future?, New York: Simon & Schuster, 2013.

Nadella, Satya, Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone, New York: Harper Business, 2017.

Parijs, Philippe Van and Yannick Vanderborght, Basic Income: A Radical Proposal for a Free Society and a Sane Economy, Cambridge, MA: Harvard University Press, 2017.

Perrin, Andrew, “Social Media Usage: 2005-2015,” Technical Report, Pew Research Center, 2015.

Piketty, Thomas, Le Capital au XXIe Siècle, Paris: Éditions du Seuil, 2013.

Posner, Eric A. and E. Glen Weyl, Radical Markets: Uprooting Capitalism and Democracy for a Just Society, Princeton, NJ: Princeton University Press, 2018.

Roth, Alvin E., Who Gets What – and Why: The New Economics of Matching and Market Design, New York: Houghton Mifflin Harcourt, 2015.