The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Adam Marcus, Eugene Wu, David Karger, Samuel Madden, and Robert Miller. 2011. Human-powered sorts and joins. Proc. VLDB Endow. 5, 1 (September 2011), 13-24.
As Published: http://www.vldb.org/pvldb/vol5.html
Publisher: VLDB Endowment
Version: Author's final manuscript
Citable link: http://hdl.handle.net/1721.1/73192
Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike 3.0
Detailed Terms: http://creativecommons.org/licenses/by-nc-sa/3.0/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 38th International Conference on Very Large Data Bases, August 27th-31st 2012, Istanbul, Turkey. Proceedings of the VLDB Endowment, Vol. 5, No. 0. Copyright 2011 VLDB Endowment 2150-8097/11/07... $10.00.

Human-powered Sorts and Joins

Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
{marcua,sirrice,karger,madden,rcm}@csail.mit.edu

ABSTRACT

Crowdsourcing marketplaces like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.

1. INTRODUCTION

Crowd-sourced marketplaces, like Amazon's Mechanical Turk (MTurk), make it possible to recruit large numbers of people to complete small tasks that are difficult for computers to do, such as transcribing an audio snippet or finding a person's phone number on the Internet. Employers submit jobs (Human Intelligence Tasks, or HITs in MTurk parlance) as HTML forms requesting some information or input from workers. Workers (called Turkers on MTurk) perform the tasks, input their answers, and receive a small payment (specified by the employer) in return (typically 1-5 cents).

These marketplaces are increasingly widely used. Crowdflower, a startup company that builds tools to help companies use MTurk and other crowdsourcing platforms, now claims to relay more than 1 million tasks per day to more than 1 million workers, and has raised $17M+ in venture capital. CastingWords, a transcription service, uses MTurk to automate audio transcription tasks. Novel academic projects include a word processor with crowdsourced editors [1] and a mobile phone application that enables crowd workers to identify items in images taken by blind users [2].

There are several reasons that systems like MTurk are of interest to database researchers. First, MTurk workflow developers often implement tasks that involve familiar database operations such as filtering, sorting, and joining datasets. For example, it is common for MTurk workflows to filter datasets to find images or audio on a specific subject, or rank such data based on workers' subjective opinion. Programmers currently waste considerable effort re-implementing these operations because reusable implementations do not exist. Furthermore, existing database implementations of these operators cannot be reused, because they are not designed to execute and optimize over crowd workers.

A second opportunity for database researchers is in query optimization. Human workers periodically introduce mistakes, require compensation or incentives for work, and take longer than traditional silicon-based operators. Currently, workflow designers perform ad-hoc parameter tuning when deciding how many assignments of each HIT to post in order to increase answer confidence, how much to pay for each task, and how to combine several human-powered operators (e.g., multiple filters) together into a single HIT. These parameters are amenable to cost-based optimization, and introduce an exciting new landscape for query optimization and execution research.

To address these opportunities, we have built Qurk [11], a declarative query processing system designed to run queries over a crowd of workers, with crowd-based filter, join, and sort operators that optimize for some of the parameters described above. Qurk's executor can choose the best implementation or user interface for different operators depending on the type of question or properties of the data. The executor combines human computation and traditional relational processing (e.g., filtering images by date before presenting them to the crowd). Qurk's declarative interface enables platform independence with respect to the crowd providing work. Finally, Qurk automatically translates queries into HITs and collects the answers in tabular form as they are completed by workers.

Several other groups, including Berkeley [5] and Stanford [13], have also proposed crowd-oriented database systems motivated by the advantages of a declarative approach. These initial proposals, including our own [11], presented basic system architectures and data models, and described some of the challenges of building such a crowd-sourced database. The proposals, however, did not explore the variety of possible implementations of relational operators as tasks on a crowd such as MTurk.

In this paper, we focus on the implementation of two of the most important database operators, joins and sorts, in Qurk. We believe we are the first to systematically study the implementation of these operators in a crowdsourced database. The human-powered versions of these operators are important because they appear everywhere. For example, information integration and deduplication can be stated as a join between two datasets, one with canonical identifiers for entities, and the other with alternate identifiers. Human-powered sorts are widespread as well. Each time a user provides a rating or product review, or votes on a user-generated content website, they are contributing to a human-powered ORDER BY.

Sorts and joins are challenging to implement because there are a variety of ways they can be implemented as HITs.
For example, to order a list of images, we might ask users to compare groups of images. Alternatively, we might ask users for numerical ratings for each image. We would then use the comparisons or scores to compute the order. The interfaces for sorting are quite different, require a different number of total HITs, and result in different answers. Similarly, we explore a variety of ways to issue HITs that compare objects for computing a join, and study the answer quality generated by different interfaces on a range of datasets.

Besides describing these implementation alternatives, we also explore optimizations to compute a result in a smaller number of HITs, which reduces query cost and sometimes latency. Specifically, we look at:

• Batching: We can issue HITs that ask users to process a variable number of records. Larger batches reduce the number of HITs, but may negatively impact answer quality or latency.

• Worker agreement: Workers make mistakes, disagree, and attempt to game the marketplace by doing a minimal amount of work. We evaluate several metrics to compute answer and worker quality, and inter-worker agreement.

• Join pre-filtering: There are often preconditions that must be true before two items can be joined. For example, two people are different if they have different genders. We introduce a way for users to specify such filters, which require a linear pass by the crowd over each table being joined, but allow us to avoid a full cross product when computing the join.

The basic data model is relational, with user-defined scalar and table functions (UDFs) used to retrieve data from the crowd. Rather than requiring users to implement these UDFs in terms of raw HTML forms or low-level declarative code, most UDFs are implemented using one of several pre-defined Task templates that specify information about how Qurk should present questions to the crowd.

To illustrate a simple example, suppose we have a table of celebrities, with schema celeb(name text, img url). We want the crowd to filter this table and find celebrities that are female. We would write:

    SELECT c.name
    FROM celeb AS c
    WHERE isFemale(c)

with isFemale defined as follows:

    TASK isFemale(field) TYPE Filter:
      Prompt: "<table><tr> \
        <td><img src='%s'></td> \
        <td>Is the person in the image a woman?</td> \
        </tr></table>", tuple[field]
      YesText: "Yes"
      NoText: "No"
      Combiner: MajorityVote
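The MajorityVote combiner admits a very small sketch. This is not Qurk's actual implementation; the function name and the Yes/No answer format are assumptions for illustration:

```python
from collections import Counter

def majority_vote(answers):
    """Combine redundant worker answers for a single tuple by majority
    vote: return the most common response among the assignments."""
    return Counter(answers).most_common(1)[0][0]

# Five workers answer the isFemale prompt for one image.
print(majority_vote(["Yes", "Yes", "No", "Yes", "No"]))  # prints Yes
```

In practice the number of assignments per HIT is odd so that a strict majority exists for binary questions.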
Tasks have types, which define the low-level implementation and interface that is generated. For the case of a Filter task, it takes tuples as input, and produces tuples that users indicate satisfy the question specified in the Prompt field.
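The two sort interfaces discussed in the introduction, pairwise comparisons versus per-item numerical ratings, can be contrasted in a short sketch. The function names are hypothetical; `compare` stands in for answers collected from comparison HITs and `ratings` for answers from rating HITs:

```python
from itertools import combinations

def order_by_comparisons(items, compare):
    """Comparison interface: one question per pair (O(n^2) questions).
    Rank each item by the number of pairwise 'wins' the crowd gives it."""
    wins = {x: 0 for x in items}
    for a, b in combinations(items, 2):
        wins[a if compare(a, b) else b] += 1
    return sorted(items, key=lambda x: wins[x], reverse=True)

def order_by_ratings(ratings):
    """Rating interface: one question per item (O(n) questions).
    Order items by their mean numerical rating."""
    return sorted(ratings,
                  key=lambda x: sum(ratings[x]) / len(ratings[x]),
                  reverse=True)

# Toy data: the "crowd" prefers alphabetically earlier strings.
print(order_by_comparisons(["b", "a", "c"], lambda x, y: x < y))
print(order_by_ratings({"a": [5, 4], "b": [3, 3], "c": [1, 2]}))
```

Both calls print ['a', 'b', 'c'] here, but on real data the two interfaces need very different numbers of HITs and can disagree, which is exactly the trade-off the paper studies.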
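Similarly, the benefit of join pre-filtering can be seen with a back-of-the-envelope HIT count. The cost model below (one comparison question per candidate pair, one labeling question per row) and the gender attribute are illustrative assumptions, not the paper's exact accounting:

```python
from itertools import product

def join_hits(R, S, prefilter_key=None):
    """Count the crowd questions needed to join tables R and S.

    Without a pre-filter, every pair in R x S gets a comparison
    question. With one, the crowd first labels each row (one linear
    pass per table), and only pairs that agree on the label are
    compared."""
    if prefilter_key is None:
        return len(R) * len(S)
    filter_cost = len(R) + len(S)
    compare_cost = sum(
        1 for r, s in product(R, S) if r[prefilter_key] == s[prefilter_key]
    )
    return filter_cost + compare_cost

R = [{"gender": "F"}] * 5 + [{"gender": "M"}] * 5
S = [{"gender": "F"}] * 5 + [{"gender": "M"}] * 5
print(join_hits(R, S))            # naive cross product: 100
print(join_hits(R, S, "gender"))  # 10 + 10 labels, then 50 comparisons: 70
```

The savings grow as the pre-filter attribute partitions the tables more finely, since the cross product shrinks while the labeling cost stays linear.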