Running Experiments on Amazon Mechanical Turk

Judgment and Decision Making, Vol. 5, No. 5, August 2010, pp. XX–XX Running experiments on Amazon Mechanical Turk Gabriele Paolacci∗ Advanced School of Economics, Ca’ Foscari University of Venice Jesse Chandler Woodrow Wilson School of Public and International Affairs, Princeton University Panagiotis G. Ipeirotis Leonard N. Stern School of Business, New York University Abstract Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by subjects recruited from online labor markets. We address these potential concerns by presenting new demographic data about the Mechanical Turk subject population, reviewing the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and comparing the magnitude of effects obtained using Mechanical Turk and traditional subject pools. We further discuss some additional benefits such as the possibility of longitudinal, cross cultural and prescreening designs, and offer some advice on how to best manage a common subject pool. Keywords: experimentation, online research 1 Introduction demonstrate that the population of Mechanical Turk is at least as representative of the U.S. population as tra- Mechanical Turk started in 2005 as a service to “crowd- ditional subject pools. Further, we show that it is shift- source” labor intensive tasks and is now being used as a ing to include more international participants. In Section source of subjects for experimental research (e.g. Eriks- 3, we review the logic underlying concerns with collect- son & Simpson, 2010; Alter et al., in press). How- ing data using Mechanical Turk and present the strengths ever, a combination of unfamiliarity with what online and potentials of Mechanical Turk relative to other online labor markets are (and how to use them), uncertainty and offline methods of recruiting subjects. In Section 4, about the demographic characteristics of their partici- we present the results of a comparative study involving pants and concerns about data quality from this sample classic experiments in judgment and decision-making; may make some researchers wary of using Mechanical we found no differences in the magnitude of effects ob- Turk to collect data. To address these concerns we report tained using Mechanical Turk and using traditional sub- demographic characteristics of Mechanical Turk workers, ject pools. Section 5 concludes by offering some advice highlight some of the unique practical and methodologi- on payment of individual subjects. cal strengths of Mechanical Turk as a source of research subjects and compare classic judgment and decision making effects in this population and more traditional subject 2 Amazon Mechanical Turk populations. The article is organized as follows. In Section 2, 2.1 Mechanical Turk: The service we introduce the main features of Mechanical Turk and Mechanical Turk is a crowdsourcing web service that co- ordinates the supply and the demand of tasks that re- ∗The authors thank Massimo Warglien and contributors to the Ex- perimental Turk blog for useful discussions, and the editor Jonathan quire human intelligence to complete. Mechanical Turk Baron for helpful suggestions. Gabriele Paolacci receives financial sup- is named after an 18th century chess playing “automa- port by Fondazione Coin. Panagiotis G. Ipeirotis is supported by the Na- ton” that was in fact operated by a concealed person. It is tional Science Foundation under Grant No. IIS-0643846 and by a New an online labor market where employees (called workers) York University Research Challenge Fund. Address: Gabriele Paolacci, Advanced School of Economics, Ca’ Foscari University, Cannaregio are recruited by employers (called requesters) for the ex- 873, 30121 Venice, Italy. Email: [email protected]. ecution of tasks (called HITs, acronym for Human Intel- 411 Judgment and Decision Making, Vol. 5, No. 5, August 2010 Running experiments on Mechanical Turk 412 ligence Tasks) in exchange for a wage (called a reward). on average to complete. This resulted in an hourly aver- Both workers and requesters are anonymous although re- age wage of $1.66, which is superior to the median reser- sponses by a unique worker can be linked through an ID vation wage of $1.38/hour (Horton & Chilton, in press). provided by Amazon. Requesters post HITs that are vis- Participants from 66 countries responded. The plurality ible only to workers who meet predefined criteria (e.g., of workers was from the United States (47%), but with a country of residence or accuracy in previously completed significant number of workers from India (34%). We will tasks). When workers access the website, they find a list now present the demographics for American workers. A of tasks sortable according to various criteria, including more detailed breakdown of demographics (including ta- size of the reward and maximum time allotted for the bles and graphs and an analysis for India-based workers) completion. Workers can read brief descriptions and see is presented by Ipeirotis (2010). previews of the tasks before accepting to work on them. Gender and age distribution. Across U.S.-based work- Tasks are typically simple enough to require only a ers, there are significantly more females (64.85%) than few minutes to be completed such as image tagging, au- males (35.15%). The relative overabundance of women dio transcriptions, and survey completion. More com- is consistent with research on subjects recruited through plicated tasks are typically decomposed into series of the Internet (Gosling et al., 2004) and may reflect women smaller tasks including the checking and validation of having greater access to computers (either at home or at other workers’ HITs. Once a worker has completed a work) or gender differences in motivation. Workers who task, the requester who supplied that task can pay him. took part to our survey were 36.0 years old on average Rewards can be as low as $0.01, and rarely exceed $1. (min = 18, max = 81, median = 33) and thus slightly Translated into an hourly wage, the typical worker is will- younger then both the U.S. population as a whole and ing to work for about $1.40 an hour (Horton & Chilton, the population of Internet users. in press). Education and income level. In general, the (self- A requester can reward good work with bonuses and reported) educational level of U.S. workers is higher than punish poor quality work by refusing payment or even the general population. This is partially explained by blocking a worker from completing future tasks. Re- the younger age of Mechanical Turk users but may also questers who fail to provide sufficient justification for re- reflect higher education levels among early adopters of jecting a HIT can be filtered out by a worker, preventing technology. Despite being more educated, Mechanical future exploitation. Some may wonder about who is will- Turk workers report lower income. The shape of the ing to work for so low wages. With the goal of providing distribution roughly matches the income distribution in experimenters with a typology of the recruitable work- the general U.S. population. However, it is noticeable force in Mechanical Turk, we now present the results of that the income level of U.S. workers on Mechanical a demographic survey conducted in February, 2010. Turk is shifted towards lower income levels (U.S. Cen- sus, 2007). For example, while 45% of the U.S. Inter- 2.2 Demographics of Mechanical Turk net population earns below $60K/yr, the corresponding percentage across U.S.-based Mechanical Turk workers It is reasonable to assume that only those in poor coun- is 66.7%. (This finding is consistent with the earlier sur- tries would be willing to work for such low wages. How- veys that compared income levels on Mechanical Turk ever, until recently, Amazon.com was paying cash only workers with income level of the general U.S. population to workers that had a bank account in the U.S., with other (Ipeirotis, 2009). We should note that, despite the dif- workers paid with Amazon.com gift cards. This policy ferences with the general population, on all of these de- discouraged workers from other countries, and past demographic variables, Internet subject populations tend to mographic surveys found that 70–80% of workers were be closer to the U.S. population as a whole than subjects from the United States and that Mechanical Turk workers recruited from traditional university subject pools. were relatively representative of the population of U.S. Motivation. We asked respondents to report their mo- Internet users (Ipeirotis, 2009; Ross et al., 2010). Re- tivations to participate in Mechanical Turk by selecting cently, however, the population dynamics on Mechanical from a set of predefined options. Only 13.8% of the Turk have changed significantly, with a greater proportion U.S.-based workers reported that Mechanical Turk was of Indian subjects in recent experiments (e.g. Eriksson & their primary source of income. However, 61.4% re- Simpson, 2010), suggesting the need for a fresh survey of ported that earning additional money was an important the workers. driver of participation to the website. We should note We collected demographics of 1,000 Mechanical Turk though, that many workers also participate to Mechani- users. The survey was conducted over a period of three cal Turk for non-monetary reasons, such as entertainment weeks in February 2010. Each respondent was paid $0.10 (40.7%) and “killing time” (32.3%). In fact, 69.6% of the for participating in the survey, which required 3 minutes U.S.-based workers reported that they consider Mechani- Judgment and Decision Making, Vol. 5, No. 5, August 2010 Running experiments on Mechanical Turk 413 cal Turk is a fruitful way to spend free time (e.g., instead When designing a HIT, researchers can either use Me- of watching TV), a finding which is consistent with pre- chanical Turk’s rudimentary in-house survey platform, or vious results (Chandler & Kapelner, 2010; Horton, Rand provide a link to an external site for workers to follow.

Running Experiments on Amazon Mechanical Turk

A Data-Driven Analysis of Workers' Earnings on Amazon Mechanical Turk

Amazon Mechanical Turk Developer Guide API Version 2017-01-17 Amazon Mechanical Turk Developer Guide

Dspace Cover Page

Amazon Mechanical Turk Requester UI Guide Amazon Mechanical Turk Requester UI Guide

Privacy Experiences on Amazon Mechanical Turk

Crowdsourcing for Speech: Economic, Legal and Ethical Analysis Gilles Adda, Joseph Mariani, Laurent Besacier, Hadrien Gelas

Using Amazon Mechanical Turk and Other Compensated Crowd Sourcing Sites Gordon B

Employment and Labor Law in the Crowdsourcing Industry

Amazon AWS Certified Solutions Architect

Five Processes in the Platformisation of Cultural Production: Amazon and Its Publishing Ecosystem

Requester Best Practices Guide

PRINCE: Provider-Side Interpretability with Counterfactual Explanations in Recommender Systems Azin Ghazimatin, Oana Balalau, Rishiraj Saha, Gerhard Weikum