Crowdsourced Data Management: Overview and Challenges

Guoliang Li (Tsinghua University), Yudian Zheng (Hong Kong University), Ju Fan (Renmin University), Jiannan Wang (SFU), Reynold Cheng (Hong Kong University)

Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

2 Crowdsourcing: Motivation ◦ A new computation model – Coordinating the crowd (workers) to do micro-tasks in order to solve computer-hard problems. ◦ Examples – Categorize the products and create product taxonomies from the user's standpoint. – An example question – Select the product category of Samsung S7 – Phone – TV – Movie 3 Crowdsourcing: Applications

◦ Wikipedia – Collaborative knowledge ◦ reCAPTCHA – Digitizing newspapers ◦ Foldit – Fold the structures of selected proteins ◦ App Testing – Test apps

4 Crowdsourcing: Popular Tasks o Sentiment Analysis – Understand conversation: positive/negative o Search Relevance – Return relevant results on the first search o Content Moderation – Keep the best, lose the worst o Data Collection – Verify and enrich your business data o Data Categorization – Organize your data o Transcription – Turn images and audio into useful data 5 Crowdsourcing Space Granularity

(Granularity axis: Macro vs. Micro; Incentive axis: Money, Entertainment, Hidden, Volunteer)

Examples: ESP Game (Luis von Ahn) – Object: image labeling – Human task: online game, two players guessing one common item; reCAPTCHA

6 Crowdsourcing Category ◦ Game vs Payment – Simple tasks • Both payment and game can achieve high quality – Complex tasks • Game has better quality (players achieved almost 18% higher response quality than paid contributors on a complex annotation task, while the difference was not significant on a simple image-annotation task) – Quality is rather important!

7 [Excerpt shown on the slide: the study's discussion, ANOVA table, and Figures 9–10 comparing the game and payment incentives — players had significantly higher response quality than paid contributors on the complex web-fragment annotation task (an increase of almost 18%), while no significant difference was found for the simpler image-annotation task; task complexity itself also had a significant effect on response quality.]

Crowdsourcing: Workflow

◦ Requester – Submit tasks – Collect answers

◦ Platforms – Task management – Publish tasks

◦ Workers – Find interested tasks – Work on tasks – Return answers

8 Crowdsourcing Requester:Workflow

◦ Design Tasks • Task Type • Design Strategies – UI, API, Coding ◦ Upload Data ◦ Set Tasks • Price • Time • Quality ◦ Publish Task • Pay • Monitor 9 Crowdsourcing Requester:Task Type

◦ Task Type

Single Choice: Please choose the brand of the phone – Apple – Samsung – Blackberry – Other
Multiple Choice: What are the common features? – Same brand – Same color – Similar price – Same size

Collection: Please submit a picture of a phone with the same size as the left one.
Fill-in-blank: Please fill in the attributes of the product – Brand – Price – Size – Camera

10 Crowdsourcing Requester: Task Design

◦ UI Choose the best category for the image Kitchen Bath Living Bed ◦ API

◦ Coding (Your own Server)
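To make the API route concrete, below is a minimal, hypothetical sketch of publishing a single-choice micro-task through AMT's requester API with the boto3 MTurk client; the title, reward, question HTML, and sandbox endpoint are illustrative values, not settings taken from this tutorial.

```python
# Hypothetical sketch: publishing one micro-task (HIT) on AMT via its API.
# All concrete values (title, reward, question HTML) are illustrative only.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint for testing; drop it to post to the real marketplace.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# A simple single-choice question wrapped in AMT's HTMLQuestion XML format.
question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <form name="mturk_form" method="post"
            action="https://workersandbox.mturk.com/mturk/externalSubmit">
        <input type="hidden" name="assignmentId" value="">
        <p>Please choose the brand of the phone.</p>
        <input type="radio" name="brand" value="Apple"> Apple
        <input type="radio" name="brand" value="Samsung"> Samsung
        <input type="submit" value="Submit">
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>400</FrameHeight>
</HTMLQuestion>
"""

hit = mturk.create_hit(
    Title="Choose the brand of a phone",
    Description="Pick the correct brand for the shown product",
    Reward="0.02",                  # price per assignment (a task setting)
    MaxAssignments=3,               # redundancy: 3 workers answer the same task
    LifetimeInSeconds=24 * 3600,    # how long the HIT stays on the platform
    AssignmentDurationInSeconds=600,
    Question=question_xml,
)
print("HIT id:", hit["HIT"]["HITId"])
```

The UI and Coding options cover the same ground: the platform-hosted editor generates such a form for you, while an External HIT lets you serve the form from your own server.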

11 Crowdsourcing Requester: Task Setting

◦ HIT – A group of micro-tasks (e.g., 5) ◦ Price, Assignment, Time

12 Crowdsourcing Requester: Task Setting

◦ Quality Control – Qualification test - Quiz Create some test questions to enable a quiz that workers must pass to work on your task.

– Hidden test - Training Add some questions with ground truths in your task so workers who get them wrong will be eliminated.

– Worker selection Ensure high-quality results by eliminating workers who repeatedly fail test questions in your task

13 Crowdsourcing Requester: Publish ◦ Prepay cost for workers + cost for platform + cost for test

◦ Monitor

14 Crowdsourcing: Workers

◦ Task Selection ◦ Task Completion
◦ Workers are not free (Cost) – Make money
◦ Workers are not oracles (Quality) – Make errors – Malicious workers
◦ Workers are dynamic (Latency) – Hard to predict

15 Crowdsourcing: Platforms ◦ Amazon Mechanical Turk (AMT)

– Requesters – HIT (k tasks) – Workers

more than 500,000 workers from 190 countries

16 Crowdsourcing:Platforms ◦ CrowdFlower

– Requesters – HIT (k tasks) – Workers

17 AMT vs CrowdFlower

                            | AMT       | CrowdFlower
Task Design: UI             | √         | √
Task Design: API            | √         | √
Task Design: Coding         | √         | ×
Quality: Qualification Test | √         | √
Quality: Hidden Test        | ×         | √
Quality: Worker Selection   | √         | √
Task Types                  | All Types | All Types

18 AMT Task Statistics

http://www.mturk-tracker.com 19 Other Crowdsourcing Platforms

◦ Macrotask – Upwork • https://www.upwork.com – Zhubajie • http://www.zbj.com ◦ Microtask – ChinaCrowds (covers all features of AMT and CrowdFlower) • http://www.chinacrowds.com

iOS and Android apps available

20 Crowdsourcing: Challenges

◦ Cost – Crowd is not free – Reduce monetary cost
◦ Quality – Crowd may return incorrect answers – Improve quality
◦ Latency – Crowd is not real-time – Reduce time

21 Crowdsourced Data Management

◦ A crowd-powered database
– Users are required to write code to utilize crowdsourcing platforms
– Encapsulates the complexities of interacting with the crowd
– Makes the DB more powerful
◦ Crowd-powered interface
– Data model: relational, with an open-world assumption (some attributes of a tuple, or even entire tuples, can be crowdsourced on demand)
– Query language: SQL extended with crowdsourcing features, e.g., CrowdDB's CROWD keyword in CREATE TABLE, Deco's fetch and resolution rules, Qurk's crowd-based UDFs, CrowdOp's ByPass keyword
◦ Crowd-powered operators
– e.g., CrowdSelect, CrowdJoin, CrowdCollect/CrowdFill, CrowdSort/CrowdTopK, CrowdCount/CrowdMax/CrowdMin
◦ Crowdsourcing optimization
– As in a traditional database, a query optimizer parses the SQL-like query into a tree-structured plan and applies optimization strategies; the difference is that the plan nodes are crowd-powered operators

[Figure 1: Architecture of crowdsourcing DB systems (e.g., CrowdDB, Qurk, Deco, CrowdOP): the requester issues a SQL-like query, the query optimizer produces an optimized plan of crowd-powered operators, tasks are published to the crowdsourcing platform, and the collected answers are returned.]

22 Tutorial Outline

◦ Fundamental Optimization
– Quality Control
– Cost Control
– Latency Control
◦ Crowd-powered Database
◦ Crowd-powered Operators
– Selection/Join/Group
– Topk/Sort
– Collection/Fill
◦ Challenges

[Figure 1: Overview of crowdsourced data management — crowdsourced operators (selection, collection, join, topk/sort, aggregation, categorize, skyline, planning, schema matching, mining, spatial crowdsourcing) are built on three fundamental techniques: quality control (worker modeling, worker elimination, answer aggregation, task assignment), cost control (pruning, task selection, answer deduction, sampling, task design, miscellaneous), and latency control (task pricing, latency modeling, round model, statistical model); task design covers task types (single choice, multiple choice, fill-in-blank, collection) and task settings (pricing, timing, quality); beneath sits the crowdsourcing platform connecting requesters (publish/monitor tasks, collect answers) and workers (browse/select tasks, answer tasks).]

23 Existing Works

[Bar chart: number of crowdsourcing papers per year, 2011–2017, broken down by venue (SIGMOD, VLDB, ICDE) together with their sum; y-axis 0–16.]

24 Existing Works

[Bar chart: number of papers by topic — quality, cost, latency, system, operation, mining, spatial; y-axis 0–35.]

25 Differences with Existing Tutorials

◦ VLDB’16 – Human factors involved in task assignment and completion. ◦ VLDB’15 – Truth inference in quality control ◦ ICDE’15 – Individual crowdsourcing operators, crowdsourced data mining and social applications ◦ VLDB’12 – Crowdsourcing platforms and Design principles ◦ Our Tutorial – Control quality, cost and latency – Design Crowdsourced Database 26 Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

27 Why Quality Control? ◦ Huge Amount of Crowdsourced Data

Statistics in AMT: Over 500K workers Over 1M tasks

◦ Inevitable noise & error

◦ Goal: Obtain reliable information in Crowdsourced Data 28 Crowdsourcing Workflow

◦ Requester deploys tasks and budget on crowdsourcing platform (e.g., AMT) ◦ Workers interact with platform (2 phases) (1) when a worker comes to the platform, the worker will be assigned to a set of tasks (task assignment); (2) when a worker accomplishes tasks, the platform will collect answers from the worker (truth inference).

29 Outline of Quality Control ◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Differences in Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

30 Part I. Truth Inference ◦ An Example Task What is the current affiliation for Michael Franklin ?

A. University of California, Berkeley B. University of Chicago

I support A. UCB !

31 Principle: Redundancy

◦ Collect Answers from Multiple Workers What is the current affiliation for Michael Franklin ?

A. University of California, Berkeley B. University of Chicago

Worker 1: "I think B!"  Worker 2: "I choose B!"  Worker 3: "I support A!"  Worker 4: "I vote B!"

How to infer the truth of the task ? 32 Outline of Quality Control ◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Differences in Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

33 Truth Inference Definition Given different tasks’ answers collected from workers, the target is to infer the truth of each task.

[Figure: workers provide answers to the tasks; truth inference takes the collected answers and outputs the truth of each task.]

34 A Simple Solution ◦ Majority Voting Take the answer that is voted by the majority (or most) of workers.

◦ Limitation Treat each worker equally, neglecting the diverse quality for each worker.

Worker types: Expert, Good at Search, Spammer, Random Answer
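A minimal sketch of the majority-voting baseline just described; the `answers` dictionary (task id → labels collected from workers) is illustrative data.

```python
from collections import Counter

# Illustrative input: task id -> list of labels collected from workers.
answers = {
    "t1": ["B", "B", "A"],
    "t2": ["B", "A", "B"],
}

def majority_vote(labels):
    """Return the label chosen by most workers (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

truth = {task: majority_vote(labels) for task, labels in answers.items()}
print(truth)  # {'t1': 'B', 't2': 'B'}
```

Every vote carries the same weight here, which is exactly the limitation noted above: an expert and a spammer influence the result equally.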

35 The Key to Truth Inference

◦ The key is to know each worker’s quality

Suppose the quality of the 4 workers is known

36 How to know worker’s quality ?

◦ 1. If a small set of tasks with ground truth are known in advance (e.g., refer to experts)

We can estimate each worker’s quality based on the answering performance for the tasks with known truth

◦ 2. If no ground truth is known in advance

The only way is to estimate each worker’s quality based on the collected answers from all workers for all tasks 37 Outline

◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

38 1. A Small Set of Ground Truth is Known

◦ Qualification Test (like an “exam”)

Assign the tasks (with known truth) to the worker when the worker comes for the first time e.g., if the worker answers 8 over 10 tasks correctly, then the quality is 0.8

◦ Hidden Test (like a "landmine") Embed the tasks (with known truth) in all the tasks assigned to the worker e.g., each time 10 tasks are assigned to a worker, the 10 tasks are composed of 9 real tasks (with unknown truth) and 1 task with known truth 39 1. A Small Set of Ground Truth is Known

◦ Limitations of two approaches

(1) need to know ground truth (may refer to experts);

(2) waste of money because workers need to answer these “extra” tasks;

(3) as reported (Zheng et al. VLDB'17), these techniques may not improve quality much.

Thus the assumption of “no ground truth is known” is widely adopted by existing works

40 Outline

◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

41 2. If No Ground Truth is Known

◦ How to know each worker's quality given the collected answers for all tasks?

Task 1: Current affiliation? A. UCB  B. Chicago — worker 1: B, worker 2: B, worker 3: A
Task 2: Current affiliation? A. Google  B. Recruit.ai — worker 1: B, worker 2: A, worker 3: B

42 Unified Framework in Existing Works

◦ Input: Workers’ answers for all tasks

◦ Algorithm Framework:
Initialize the quality for each worker
while (not converged) {
  Quality for each worker → Truth for each task;
  Truth for each task → Quality for each worker;
}

◦ Output: Quality for each worker and Truth for each task
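A minimal sketch of this iterative framework under the simplest worker model (a single worker probability, in the spirit of ZC-style methods rather than a faithful reimplementation of any particular paper); the answers are the running example's, and the initial quality of 0.8 is an arbitrary starting point.

```python
from collections import Counter

# Illustrative input: task id -> {worker id: answer}.
answers = {
    "t1": {"w1": "B", "w2": "B", "w3": "A"},
    "t2": {"w1": "B", "w2": "A", "w3": "B"},
}

def infer_truth(answers, iterations=20):
    workers = {w for votes in answers.values() for w in votes}
    quality = {w: 0.8 for w in workers}   # initialize quality for each worker
    truth = {}
    for _ in range(iterations):
        # Quality for each worker -> truth for each task (weighted voting).
        for task, votes in answers.items():
            weight = Counter()
            for w, label in votes.items():
                weight[label] += quality[w]
            truth[task] = weight.most_common(1)[0][0]
        # Truth for each task -> quality for each worker
        # (fraction of the worker's answers that match the inferred truth).
        for w in workers:
            answered = [(t, votes[w]) for t, votes in answers.items() if w in votes]
            correct = sum(1 for t, label in answered if label == truth[t])
            quality[w] = correct / len(answered) if answered else 0.5
    return truth, quality

truth, quality = infer_truth(answers)
print(truth)    # {'t1': 'B', 't2': 'B'}
print(quality)  # {'w1': 1.0, 'w2': 0.5, 'w3': 0.5} (order may vary)
```

On these answers the loop reproduces the numbers worked out on the next two slides: truth B for both tasks, worker 1 at quality 1.0 and workers 2 and 3 at 0.5. Richer models (confusion matrices, task difficulty, diverse domains) slot into the same two alternating steps.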

43 Inherent Relationship 1

◦ 1. Quality for each worker → Truth for each task

Suppose each worker's quality is 1.0.
Task 1 (Current affiliation? A. UCB, B. Chicago): A gets weight 1.0 (from worker 3), B gets weight 1.0 + 1.0 (from workers 1 & 2) → truth: B
Task 2 (Current affiliation? A. Google, B. Recruit.ai): A gets weight 1.0 (from worker 2), B gets weight 1.0 + 1.0 (from workers 1 & 3) → truth: B

44 Inherent Relationship 2

◦ 2. Truth for each task → Quality for each worker

Given the truths (Task 1: B, Task 2: B):
Worker 1: correct on 2/2 tasks → quality 1.0
Worker 2: correct on 1/2 tasks → quality 0.5
Worker 3: correct on 1/2 tasks → quality 0.5

45 Outline

◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

46 Existing works ◦ Classic Method D&S [Dawid and Skene. JRSS 1979] ◦ Recent Methods (1) Database Community: CATD [Li et al. VLDB14], PM [Li et al. SIGMOD14], iCrowd [Fan et al. SIGMOD15], DOCS [Zheng et al. VLDB17] (2) Data Mining Community: ZC [Demartini et al. WWW12], Multi [Welinder et al. NIPS 2010], CBCC [Venanzi et al. WWW14] (3) Machine Learning Community: GLAD [Whitehill et al. NIPS09], Minimax [Zhou et al. NIPS12], BCC [Kim et al. AISTATS12], LFC [Raykar et al. JLMR10], KOS [Karger et al. NIPS11], VI-BP [Liu et al. NIPS12], VI-MF [Liu et al. NIPS12], LFC_N [Raykar et al. JLMR10] 47 Differences in Existing works

Tasks ◦ Different Task Types What type of tasks they focus on ? E.g., single-label tasks …

◦ Different Task Models How they model each task ? E.g., task difficulty …

Workers

◦ Different Worker Models How they model each worker ? E.g., worker probability (a value) …

48 Tasks: Different Task Types

◦ Decision-Making Tasks (yes/no task): e.g., "Is Bill Gates currently the CEO of Microsoft? Yes / No" — e.g., Demartini et al. WWW12, Whitehill et al. NIPS09, Kim et al. AISTATS12, Venanzi et al. WWW14, Raykar et al. JLMR10
◦ Single-Label Tasks (multiple choices): e.g., "Identify the sentiment of the tweet: …… Pos / Neu / Neg" — e.g., Li et al. VLDB14, Li et al. SIGMOD14, Demartini et al. WWW12, Whitehill et al. NIPS09, Kim et al. AISTATS12
◦ Numeric Tasks (answer with numeric values): e.g., "What is the height of Mount Everest? ___ m" — e.g., Li et al. VLDB14, Li et al. SIGMOD14

49 Tasks: Different Task Models

◦ Task Difficulty: a value If a task receives many contradicting (or ambiguous) answers, then it is regarded as a difficult task. e.g., Welinder et al. NIPS 2010, Ma et al. KDD16

◦ Diverse Domains: a vector

Sports Politics Entertainment

Did Michael Jordan win more NBA championships than Kobe Bryant? (Sports)

Is there a name for the song that FC Barcelona is known for? (Sports & Entertainment)

50 Tasks: Different Task Models (cont'd) ◦ Diverse Domains (cont'd) To obtain each task's model: (1) Use machine learning approaches, e.g., LDA [Blei et al. JMLR03], TwitterLDA [Zhao et al. ECIR11]. (2) Use entity linking (map entities to knowledge bases).

Did Michael Jordan win more NBA championships than Kobe Bryant?

51 Workers: Different Worker Models

◦ Worker Probability: a value p ∈ [0,1] The probability that the worker answers tasks correctly e.g., a worker answers 8 over 10 tasks correctly, then the worker probability is 0.8. e.g., Demartini et al. WWW12, Whitehill et al. NIPS09 ◦ Confidence Interval: a range [p − ε, p + ε] ε is related to the number of tasks answered => the more answers collected, the smaller ε is. e.g., two workers answer 8 over 10 tasks and 40 over 50 tasks correctly, then the latter worker has a smaller ε. e.g., Li et al. VLDB14 52 Workers: Different Worker Models (cont'd)

◦ Confusion Matrix: a matrix capturing a worker's answer distribution over the choices (Pos / Neu / Neg) given a specific truth, e.g., given that the truth of a task is "Neu", the probability that the worker answers "Pos" is 0.3. e.g., Kim et al. AISTATS12, Venanzi et al. WWW14 ◦ Bias τ & Variance σ: numeric task. The answer follows a Gaussian distribution: ans ~ N(t + τ, σ) e.g., Raykar et al. JLMR10 53 Workers: Different Worker Models (cont'd)

◦ Quality Across Diverse Domains: a vector

Sports Politics Entertainment

How to decide the scope of domains ? Idea: Use domains from Knowledge Bases

e.g., Ma et al. KDD16, Zheng et al. VLDB17 54 Summary of Truth Inference Methods

Method | Task Type | Task Model | Worker Model
Majority Voting | Decision-Making Task, Single-Choice Task | No | No
Mean / Median | Numeric Task | No | No
ZC [Demartini et al. WWW12] | Decision-Making Task, Single-Choice Task | No | Worker Probability
GLAD [Whitehill et al. NIPS09] | Decision-Making Task, Single-Choice Task | Task Difficulty | Worker Probability
D&S [Dawid and Skene. JRSS 1979] | Decision-Making Task, Single-Choice Task | No | Confusion Matrix
Minimax [Zhou et al. NIPS12] | Decision-Making Task, Single-Choice Task | No | Diverse Domains
BCC [Kim et al. AISTATS12] | Decision-Making Task, Single-Choice Task | No | Confusion Matrix
CBCC [Venanzi et al. WWW14] | Decision-Making Task, Single-Choice Task | No | Confusion Matrix
LFC [Raykar et al. JLMR10] | Decision-Making Task, Single-Choice Task | No | Confusion Matrix
CATD [Li et al. VLDB14] | Decision-Making Task, Single-Choice Task, Numeric Task | No | Worker Probability, Confidence

55 Summary of Truth Inference Methods (cont'd)

Method | Task Type | Task Model | Worker Model
PM [Li et al. SIGMOD14] | Decision-Making Task, Single-Choice Task, Numeric Task | No | Worker Probability
Multi [Welinder et al. NIPS 2010] | Decision-Making Task | Diverse Domains | Diverse Domains, Worker Bias, Worker Variance
KOS [Karger et al. NIPS11] | Decision-Making Task | No | Worker Probability
VI-BP [Liu et al. NIPS12] | Decision-Making Task | No | Confusion Matrix
VI-MF [Liu et al. NIPS12] | Decision-Making Task | No | Confusion Matrix
LFC_N [Raykar et al. JLMR10] | Numeric Task | No | Worker Variance
iCrowd [Fan et al. SIGMOD15] | Decision-Making Task, Single-Choice Task | Diverse Domains | Diverse Domains
FaitCrowd [Ma et al. KDD16] | Decision-Making Task, Single-Choice Task | Diverse Domains | Diverse Domains
DOCS [Zheng et al. VLDB17] | Decision-Making Task, Single-Choice Task | Diverse Domains | Diverse Domains

56 Outline

◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Differences in Existing Works

57 Experimental Results (Zheng et al. VLDB17)

◦ Statistics of Datasets

Dataset | # Tasks | # Answers Per Task | # Workers | Description
Sentiment Analysis [Zheng et al. VLDB17] | 1000 | 20 | 185 | Given a tweet, the worker identifies the sentiment of the tweet
Duck [Welinder et al. NIPS10] | 108 | 39 | 39 | Given an image, the worker identifies whether the image contains a duck
Product [Wang et al. VLDB12] | 8315 | 3 | 85 | Given a pair of products, the worker identifies whether or not they refer to the same product

58 Experimental Results ◦ Observations (Sentiment Analysis)

#workers' answers conform to the long-tail phenomenon (Li et al. VLDB14); not all workers are of very high quality 59 Experimental Results (cont'd)

◦ Change of Quality vs. #Answers (Sentiment Analysis) Observations:

1. The quality increases with #answers;

2. The quality improvement is significant with few answers, and is marginal with more answers;

3. Most methods are similar, except for Majority Voting (in pink color).

60 Experimental Results (cont’d) ◦ Performance on more datasets

[Charts: inference quality vs. #answers on the "Duck" and "Product" datasets.]

61 Which method is the best ?

◦ Decision-Making & Single-Label Tasks – “Majority Voting” if sufficient data is given (each task collects more than 20 answers); – “D&S [Dawid and Skene JRSS 1979]” if limited data is given (a robust method); – “Minimax [Zhou et al. NIPS12]” and “Multi [Welinder et al. NIPS 2010]” as advanced techniques.

◦ Numeric Tasks – “Mean” since it is robust in practice; – “PM [Li et al. SIGMOD14]” as advanced techniques.

62 Take-Away for Truth Inference

◦ The key to truth is to compute each worker’s quality

◦ if some truth is known:

qualification test and hidden test;

◦ if no truth is known:

(1) relationships between “quality for each worker” and “truth for each task”

(2) different task types & models and worker models

63 Crowdsourcing Workflow

◦ Requester deploys tasks and budget on crowdsourcing platform (e.g., Amazon Mechanical Turk) ◦ Workers interact with platform (2 phases) (1) when a worker comes to the platform, the worker will be assigned to a set of tasks (task assignment); (2) when a worker accomplishes tasks, the platform will collect answers from the worker (truth inference).

64 Part II. Task Assignment

◦ Existing platforms support online task assignment

“External HIT”

◦ Intuition: requesters want to wisely use the budgets

Requester: "I am a requester, and I want to use my budgets very well!"  Workers: "We are workers!"

How to allocate suitable tasks to workers? 65 Task Assignment Problem

Given a pool of n tasks, which set of the k tasks should be batched in a HIT and assigned to the worker?

Example: HIT Suppose we have n=4 tasks, and each time HIT k=2 tasks are assigned as a HIT.

66 This problem is complex!

◦ Simple enumeration: “n choose k” combinations

(n = 100, k = 5) → about 75 million combinations

◦ Need efficient (online) assignment

Fast response to worker’s request

◦ Develop efficient heuristics

Assignment time linear in #tasks: O(n)
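An illustrative sketch (not taken from any particular system) of such a heuristic: score every task by how uncertain its current answers are and batch the k highest-scoring tasks into the next HIT — one pass over the n tasks plus a small heap. The task ids and vote counts are made up.

```python
import heapq
import math

# Illustrative state: task id -> current (yes, no) vote counts.
votes = {"t1": (0, 3), "t2": (1, 2), "t3": (2, 1), "t4": (3, 0)}

def uncertainty(yes, no):
    """Shannon entropy of the current answer distribution (0 = unanimous)."""
    total = yes + no
    if total == 0:
        return 1.0  # not answered yet: treat as maximally uncertain
    h = 0.0
    for count in (yes, no):
        p = count / total
        if p > 0:
            h -= p * math.log(p)
    return h

def next_hit(votes, k=2):
    """Batch the k most uncertain tasks into the next HIT."""
    return heapq.nlargest(k, votes, key=lambda t: uncertainty(*votes[t]))

print(next_hit(votes))  # ['t2', 't3'] -- the tasks with the most balanced votes
```

The same skeleton accommodates the other two factors discussed next: multiply in a worker-quality term and replace the score with the requester's own objective.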

67 Outline

◦ Part I. Truth Inference – Problem Definition – Condition 1: with ground truth • Qualification Test & Hidden Test – Condition 2: without ground truth • Unified Framework • Existing Works • Experimental Results

◦ Part II. Task Assignment – Problem Definition – Existing Works

68 Main Idea

Most suitable tasks

3 factors for characterizing a suitable task: Answer uncertainty Worker quality Requesters’ objectives

69 Factor 1: Answer Uncertainty

◦ Consider a decision-making task (yes/no)

Example tasks with collected answers: (0 yes, 3 no), (1 yes, 2 no), (2 yes, 1 no), (3 yes, 0 no)

◦ Select a task whose answers are the most uncertain or inconsistent

e.g., Liu et al. VLDB12, Roim et al. ICDE12

70 Factor 1: Answer Uncertainty

◦ Entropy (Zheng et al. SIGMOD15) Given c choices for a task and the distribution of answers for the task p = (p1, p2, ..., pc), the task's entropy is H(p) = −∑_{i=1}^{c} p_i log p_i. e.g., a task receives 1 "yes" and 2 "no", then the distribution is (1/3, 2/3), and the entropy is 0.637.

◦ Expected change of entropy (Roim et al. ICDE12) (1/3, 2/3) should be more uncertain than (10/30, 20/30): E[H(p′)] − H(p) 71 Factor 2: Worker Quality

◦ Assign tasks to the worker with the suitable expertise

Sports Politics Entertainment

◦ Uncertainty: consider the matching domains in tasks and the worker e.g., Ho et al. AAAI12, Zheng et al. VLDB17

72 Factor 3: Objectives of Requesters

◦ Requesters may have different objectives (aka “evaluation metric”) for different applications

Application | Evaluation Metric
Sentiment Analysis | Accuracy
Entity Resolution | F-score (on the "equal" label)

73 Factor 3: Objectives of Requesters ◦ Solution in QASCA (Zheng et al. SIGMOD15) (1) Leverage the answers collected from workers to create a “distribution matrix”; (2) leverage the “distribution matrix” to estimate the quality improvement for a specific set of selected tasks. ◦ Idea: Select the best set of tasks with highest quality improvement in the specified evaluation metric.

e.g., estimated quality improvement: 9% for one candidate task set vs. 6% for another → assign the set with 9%

74 Factor 3: Objectives of Requesters ◦ Other Objectives

(1) Threshold on entropy (e.g., Li et al. WSDM17) e.g., in the final state, each task should have constraint that its entropy ≥ 0.6.

(2) Threshold on worker quality (e.g., Fan et al. SIGMOD15) e.g., in the final state, each task should have overall aggregated worker quality ≥ 2.0.

(3) Maximize total utility (e.g., Ho et al. AAAI12) e.g., after the answer is given, the requester receives some utility related to worker quality, and the goal is to assign tasks that maximize the total utility.

75 Task Assignment

Method | Factor 1: Answer Uncertainty | Factor 2: Worker Quality | Factor 3: Requesters' Objectives
OTA [Ho et al. AAAI12] | Majority | Worker probability | Maximize total utility
CDAS [Liu et al. VLDB12] | Majority | Worker probability | A threshold on confidence + early termination of confident tasks
iCrowd [Fan et al. SIGMOD15] | Majority | Diverse domains | Maximize overall worker quality
AskIt! [Roim et al. ICDE12] | Entropy-based | No | No
QASCA [Zheng et al. SIGMOD15] | Maximize specified quality | Confusion matrix | Maximize specified quality
DOCS [Zheng et al. VLDB17] | Expected change of entropy | Diverse domains | No
CrowdPOI [Hu et al. ICDE16] | Expected change of accuracy | Worker probability | No
Opt-KG [Li et al. WSDM17] | Majority | No | ≥ threshold on entropy

76 Take-Away for Task Assignment

◦ Require online and efficient heuristics

◦ Key idea: assign the most suitable task to worker, based on:

(1) uncertainty of collected answers; (2) worker quality; and (3) requesters' objectives.

77 Public Datasets & Codes

◦ Public crowdsourcing datasets (http://i.cs.hku.hk/~ydzheng2/crowd_survey/datasets.html).

◦ Implementations of truth inference algorithms (https://github.com/TsinghuaDatabaseGroup/crowdsourcing/tree/master/truth/src/methods).

◦ Implementations of task assignment algorithms (https://github.com/TsinghuaDatabaseGroup/CrowdOTA).

78 Reference – Truth Inference

[1] ZenCrowd: G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In WWW, pages 469–478, 2012.
[2] EM: A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, 39(1):1–38, 1977.
[3] Most Traditional Work (D&S): A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Statist., 28(1):20–28, 1979.
[4] iCrowd: J. Fan, G. Li, B. C. Ooi, K. Tan, and J. Feng. iCrowd: An adaptive crowdsourcing framework. In SIGMOD, pages 1015–1030, 2015.
[5] J. Gao, Q. Li, B. Zhao, W. Fan, and J. Han. Truth discovery and crowdsourcing aggregation: A unified perspective. VLDB, 8(12):2048–2049, 2015.
[6] CrowdPOI: H. Hu, Y. Zheng, Z. Bao, G. Li, and J. Feng. Crowdsourced POI labelling: Location-aware result inference and task assignment. In ICDE, 2016.
[7] P. Ipeirotis, F. Provost, and J. Wang. Quality management on Amazon Mechanical Turk. In SIGKDD Workshop, pages 64–67, 2010.
[8] M. Joglekar, H. Garcia-Molina, and A. G. Parameswaran. Evaluating the crowd with confidence. In SIGKDD, pages 686–694, 2013.
[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin. Crowdsourced data management: A survey. TKDE, 28(9):2296–2319, 2016.
[10] CATD: Q. Li, Y. Li, J. Gao, L. Su, B. Zhao, M. Demirbas, W. Fan, and J. Han. A confidence-aware approach for truth discovery on long-tail data. PVLDB, 8(4):425–436, 2014.
[11] PM: Q. Li, Y. Li, J. Gao, B. Zhao, W. Fan, and J. Han. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In SIGMOD, pages 1187–1198, 2014.
[12] KOS / VI-BP / VI-MF: Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing. In NIPS, pages 701–709, 2012.
[13] CDAS: X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040–1051, 2012.
79 Reference – Truth Inference (cont'd)

[14] FaitCrowd: F. Ma, Y. Li, Q. Li, M. Qiu, J. Gao, S. Zhi, L. Su, B. Zhao, H. Ji, and J. Han. FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation. In KDD, pages 745–754. ACM, 2015.
[15] V. C. Raykar and S. Yu. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research, 13:491–518, 2012.
[16] V. C. Raykar, S. Yu, L. H. Zhao, A. K. Jerebko, C. Florin, G. H. Valadez, L. Bogoni, and L. Moy. Supervised learning from multiple experts: whom to trust when everyone lies a bit. In ICML, pages 889–896, 2009.
[17] LFC: V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. JMLR, 11(Apr):1297–1322, 2010.
[18] Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved? VLDB 2017.
[19] DOCS: Yudian Zheng, Guoliang Li, Reynold Cheng. DOCS: A Domain-Aware Crowdsourcing System Using Knowledge Bases. VLDB 2017.
[20] CBCC: M. Venanzi, J. Guiver, G. Kazai, P. Kohli, and M. Shokouhi. Community-based Bayesian aggregation models for crowdsourcing. In WWW, pages 155–164, 2014.
[21] Minimax: D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In NIPS, pages 2195–2203, 2012.
[22] P. Smyth, U. M. Fayyad, M. C. Burl, P. Perona, and P. Baldi. Inferring ground truth from subjective labelling of Venus images. In NIPS, pages 1085–1092, 1994.
[23] Multi: P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The multidimensional wisdom of crowds. In NIPS, pages 2424–2432, 2010.
[24] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. R. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, pages 2035–2043, 2009.
[25] BCC: H.-C. Kim and Z. Ghahramani. Bayesian classifier combination. In AISTATS, pages 619–627, 2012.
[26] Aditya Parameswaran. Human-Powered Data Management. http://msrvideo.vo.msecnd.net/rmcvideos/185336/dl/185336.pdf
80 Reference – Truth Inference (cont'd)

[27] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
[28] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing twitter and traditional media using topic models. In ECIR, pages 338–349, 2011.
[29] X. L. Dong, B. Saha, and D. Srivastava. Less is more: Selecting sources wisely for integration. PVLDB, 6(2):37–48, 2012.
[30] X. Liu, X. L. Dong, B. C. Ooi, and D. Srivastava. Online data fusion. PVLDB, 4(11):932–943, 2011.

81 Reference – Task Assignment

[1] CDAS: X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040–1051, 2012.
[2] OTA: C.-J. Ho and J. W. Vaughan. Online task assignment in crowdsourcing markets. In AAAI, 2012.
[3] QASCA: Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng. QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications. SIGMOD 2015.
[4] C.-J. Ho, S. Jabbari, and J. W. Vaughan. Adaptive task assignment for crowdsourced classification. In ICML, pages 534–542, 2013.
[5] CrowdPOI: H. Hu, Y. Zheng, Z. Bao, G. Li, and J. Feng. Crowdsourced POI labelling: Location-aware result inference and task assignment. In ICDE, 2016.
[6] DOCS: Yudian Zheng, Guoliang Li, Reynold Cheng. DOCS: A Domain-Aware Crowdsourcing System Using Knowledge Bases. VLDB 2017.
[7] AskIt: R. Boim, O. Greenshpan, T. Milo, S. Novgorodov, N. Polyzotis, and W. C. Tan. Asking the right questions in crowd data sourcing. In ICDE, 2012.
[8] iCrowd: J. Fan, G. Li, B. C. Ooi, K. Tan, and J. Feng. iCrowd: An adaptive crowdsourcing framework. In SIGMOD, pages 1015–1030, 2015.
[9] Opt-KG: Qi Li, Fenglong Ma, Jing Gao, Lu Su, and Christopher J. Quinn. Crowdsourcing High Quality Labels with a Tight Budget. WSDM 2016.
[10] Jing Gao, Qi Li, Bo Zhao, Wei Fan, and Jiawei Han. Enabling the Discovery of Reliable Information from Passively and Actively Crowdsourced Data. KDD'16 tutorial.
82 Outline

82 Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

83 Cost Control o Goal – How to reduce monetary cost? o Cost = n × c – n: number of tasks – c: cost of each task o Challenges – How to reduce n? – How to reduce c?

84 Classification of Existing Techniques o How to reduce n? – Task Pruning – Answer Deduction – Task Selection – Sampling o How to reduce c? – Task Design

85 Task Pruning o Key Idea – Prune the tasks that machines can do well

o Easy Task vs. Hard Task

Easy: Are they the same? IPHONE 6 = iphone 6
Hard: Are they the same? IBM = Big Blue

o How to quantify "difficulty" – Similarity value – Match probability

• Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng: CrowdER: Crowdsourcing Entity Resolution. VLDB 2012 • Steven Euijong Whang, Peter Lofgren, Hector Garcia-Molina: Question Selection for Crowd Entity Resolution. VLDB 2013 86 Task Pruning (cont'd) o Workflow (non-iterative) 1. Rank tasks based on "difficulty" 2. Prune the tasks whose difficulty ≤ threshold o Pros – Supports a large variety of applications o Cons – Only works for easy tasks (i.e., the ones that machines can do well)
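A small illustrative sketch of this pruning workflow using token-set Jaccard similarity as the "difficulty" proxy: pairs the machine can resolve confidently are pruned, and only the uncertain middle band is sent to the crowd. The record pairs and thresholds are made up.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Illustrative candidate pairs for entity resolution.
pairs = [("IPHONE 6", "iphone 6"), ("IBM", "Big Blue"), ("iPad 2", "iPad two")]

LOW, HIGH = 0.2, 0.9       # assumed pruning thresholds
crowd_tasks = []
for a, b in pairs:
    s = jaccard(a, b)
    if s >= HIGH:
        print(f"pruned as match:     {a} / {b} (sim={s:.2f})")
    elif s <= LOW:
        print(f"pruned as non-match: {a} / {b} (sim={s:.2f})")
    else:
        crowd_tasks.append((a, b))   # the machine is unsure: ask the crowd

print("send to the crowd:", crowd_tasks)
```

Note that the hard pair (IBM, Big Blue) gets wrongly pruned as a non-match — precisely the limitation stated above: pruning only helps where machines already do well.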

87 Classification of Existing Techniques o How to reduce n? – Task Pruning – Answer Deduction – Task Selection – Sampling o How to reduce c? – Task Design

88 Answer Deduction o Key Idea – Prune the tasks whose answers can be deduced from existing crowdsourced tasks o Example: Transitivity

e.g., if the crowd answers A = B and B = C, then A = C can be deduced by transitivity

• Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng: Leveraging transitive relations for crowdsourced joins. SIGMOD 2013 • Donatella Firmani, Barna Saha, Divesh Srivastava: Online Entity Resolution Using an Oracle. PVLDB 2016 89 Answer Deduction (cont’d) o Workflow (iterative) 1. Pick up some tasks from a task pool 2. Collect answers of the tasks from the Crowd 3. Remove the tasks whose answers can be deduced

[Figure: iterative loop — Step 1: pick some tasks from the task pool; Step 2: collect their answers from the crowd; Step 3: remove the tasks whose answers can now be deduced.]

90 Answer Deduction (cont'd) o Pros – Works for both easy and hard tasks

o Cons – Human errors can be amplified (a single wrong answer can propagate through many deduced answers)
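A minimal sketch (illustrative, not the algorithm of any particular paper) of positive transitivity with a union-find structure: crowd answers of the form "A = B" merge clusters, and a pair whose two sides already share a cluster is deduced instead of being crowdsourced.

```python
class UnionFind:
    """Tracks clusters of records the crowd has judged equal."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

uf = UnionFind()
# Crowd answers already collected: these pairs were judged "equal".
uf.union("US", "United States")
uf.union("US", "America")

def resolve(a, b):
    """Deduce the answer by transitivity if possible, otherwise ask the crowd."""
    if uf.find(a) == uf.find(b):
        return "deduced: equal"
    return "ask the crowd"

print(resolve("United States", "America"))  # deduced: equal
print(resolve("America", "UK"))             # ask the crowd
```

This is the flip side of the con above: a single wrong "equal" answer merges two clusters and the error propagates to every pair deduced from them.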

91 Classification of Existing Techniques o How to reduce n? – Task Pruning – Answer Deduction – Task Selection – Sampling o How to reduce c? – Task Design

92 Task Selection o Key Idea – Select the most beneficial tasks to crowdsource o Example 1: Active Learning – Most beneficial for training a model [Figure: supervised learning vs. active learning]

• Mozafari et al. Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning. PVLDB 2014 • Gokhale et al. Corleone: hands-off crowdsourcing for entity matching. SIGMOD 2014 93 Task Selection o Key Idea – Select the most beneficial tasks to crowdsource

o Example 2: Top-k – Most beneficial for getting the top-k results Which picture best visualizes the SFU campus?

Rank by computers

The most beneficial task: a head-to-head comparison ("vs.") between two candidate pictures

Xiaohang Zhang, Guoliang Li, Jianhua Feng: Crowdsourced Top-k Algorithms: An Experimental Evaluation. PVLDB 2016 94 Task Selection (cont’d) o Workflow (iterative) 1. Select a set of most beneficial tasks 2. Collect their answers from the Crowd 3. Update models and results o Pros – Allow for a flexible quality/cost trade-off o Cons – Hurt latency (since only a small number of tasks can be crowdsourced at each iteration)
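A minimal sketch of the iterative select–crowdsource–update loop using uncertainty sampling with a scikit-learn classifier; the synthetic data and the `ask_crowd` stub stand in for a real dataset and a real crowdsourcing platform.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # synthetic feature vectors
true_labels = (X[:, 0] + X[:, 1] > 0).astype(int)

def ask_crowd(indices):
    """Stub standing in for the crowd: here it just reveals the true labels."""
    return [int(true_labels[i]) for i in indices]

# Small seed set that contains both classes.
labeled = [int(i) for i in np.where(true_labels == 0)[0][:5]] + \
          [int(i) for i in np.where(true_labels == 1)[0][:5]]
y = {i: int(true_labels[i]) for i in labeled}

model = LogisticRegression()
for _ in range(5):                               # five crowdsourcing rounds
    model.fit(X[labeled], [y[i] for i in labeled])
    unlabeled = [i for i in range(len(X)) if i not in y]
    proba = model.predict_proba(X[unlabeled])[:, 1]
    # Most beneficial tasks: the ones the current model is least certain about.
    batch = [unlabeled[j] for j in np.argsort(np.abs(proba - 0.5))[:5]]
    for i, label in zip(batch, ask_crowd(batch)):
        y[i] = label
        labeled.append(i)

print("labels bought from the crowd:", len(y) - 10)
```

Each round buys only a handful of the most informative labels, which is the cost/quality trade-off noted above, at the price of extra latency from the many small rounds.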

95 Classification of Existing Techniques o How to reduce n? – Task Pruning – Answer Deduction – Task Selection – Sampling o How to reduce c? – Task Design

96 Sampling o Key Idea – Ask the crowd to work on sample data o Example: SampleClean Who published more?

[Chart: estimation quality (0.5–1.0) vs. sample size (50–1250) for the "Who published more?" example; candidate counts 211, 255, and 173 are shown.]
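An illustrative sketch of the sampling idea (in the spirit of SampleClean, but not its implementation): the crowd checks only a random sample of the records, and the full-data answer is estimated with a confidence interval. The record set, error rate, and `crowd_says_valid` stub are all made up.

```python
import math
import random

random.seed(7)
N = 1250                                               # total (dirty) records for one author
is_error = [random.random() < 0.17 for _ in range(N)]  # hidden ground truth: duplicates/errors

def crowd_says_valid(i):
    """Stub for one crowd task: is record i a valid (non-duplicate) publication?"""
    return not is_error[i]

n = 300                                                # crowd budget: sample size
sample = random.sample(range(N), n)
valid = sum(crowd_says_valid(i) for i in sample)

p_hat = valid / n
estimate = N * p_hat
margin = 1.96 * N * math.sqrt(p_hat * (1 - p_hat) / n)   # 95% confidence interval
print(f"estimated paper count: {estimate:.0f} ± {margin:.0f}")
```

Quadrupling the sample roughly halves the ± margin, which is the quality-vs-sample-size curve above; aggregates such as max cannot be bounded this way, which is the limitation on the next slide.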

Jiannan Wang, Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo: A sample-and-clean framework for fast and accurate query processing on dirty data. SIGMOD Conference 2014: 469-480 97 Sampling (Cont'd) o Workflow (iterative) 1. Generate tasks based on a sample 2. Collect the task answers from the Crowd 3. Infer the results of the full data o Pros – Provable bounds for quality (e.g., the paper count is 211±5 with 95% probability) o Cons – Limited to certain applications (e.g., it does not work for max) 98 Classification of Existing Techniques o How to reduce n? – Task Pruning – Answer Deduction – Task Selection – Sampling o How to reduce c? – Task Design

99 Task Design (Cont'd) o Key Idea – Optimize User Interface o Example 1: Count

How many are female? 2

Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, Sewoong Oh: Counting with the Crowd. PVLDB 2012 100 Task Design (Cont’d) o Key Idea – Optimize User Interface o Example 2: Image Labeling

Luis von Ahn, Laura Dabbish: Labeling images with a computer game. CHI 2004: 319-326 101 Summary of Cost Control o Two directions – How to reduce n? (DB) – How to reduce c? (HCI) o DB and HCI should work together o Non-iterative and iterative workflows are both widely used

102 Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

103 Latency Control o Goal – How to reduce latency? o Latency = n × t? ✗ – n: number of tasks – t: latency of each task o Latency = the completion time of the last task

104 Classification of Latency Control 1. Single Task

– Reduce the latency of a single task

2. Single Batch – Reduce the latency of a batch of tasks

3. Multiple Batches – Reduce the latency of multiple batches of tasks

Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin: CLAMShell: Speeding up Crowds for Low-latency Data Labeling. PVLDB 2015 105 Single-Task Latency Control • Latency consists of – Phase 1: Recruitment Time – Phase 2: Qualification and Training Time – Phase 3: Work Time • Improve Phase 1 – See the next slide • Improve Phase 2 – Remove this phase by applying other quality control techniques (e.g., worker elimination) • Improve Phase 3 – Better User Interfaces 106 Reduce Recruitment Time o Retainer Pool – Pre-recruit a pool of crowd workers

Workers sign up in advance and are alerted when a task is ready.
Each retained worker gets paid 0.5 cent per minute and waits at most 5 minutes.

Michael S. Bernstein, Joel Brandt, Robert C. Miller, David R. Karger: Crowds in two seconds: enabling realtime crowd-powered interfaces. UIST 2011 107 Classification of Latency Control 1. Single Task

– Reduce the latency of a single task

2. Single Batch – Reduce the latency of a batch of tasks

3. Multiple Batches – Reduce the latency of multiple batches of tasks

Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin: CLAMShell: Speeding up Crowds for Low-latency Data Labeling. PVLDB 2015 108 Single-Batch Latency Control o Idea 1: Pricing Model – Model the relationship between task price and completion time

o Predict worker behaviors [1,2] – Recruitment Time – Work Time o Set task price – Fixed Pricing [2] – Dynamic Pricing [3]

[1]. Wang et al. Estimating the completion time of crowdsourced tasks using survival analysis models. CSDM 2011 [2]. S. Faradani, B. Hartmann, and P. G. Ipeirotis. What’s the right price? pricing tasks for finishing on time. In AAAI Workshop, 2011. [3]. Y. Gao and A. G. Parameswaran. Finish them!: Pricing algorithms for human computation. PVLDB 2014. 109 Single-Batch Latency Control o Idea 2: Straggler Mitigation – Replicate a task to multiple workers and return the result of the fastest worker

[Figure: the same yes/no task is replicated to several workers and the fastest answer is returned — straggler mitigation, as in MapReduce and Spark.]
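A minimal sketch of straggler mitigation with Python threads: the same task is replicated to several simulated workers and the first answer to arrive wins. The random latencies and answers stand in for real crowd behavior.

```python
import concurrent.futures
import random
import time

def worker(worker_id, task):
    """Simulated worker: random latency, then a yes/no answer (task text ignored)."""
    time.sleep(random.uniform(0.5, 3.0))      # stand-in for human work time
    return worker_id, random.choice(["Y", "N"])

def replicate(task, replicas=3, timeout=10):
    """Send the same task to several workers and take the fastest answer."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=replicas) as pool:
        futures = [pool.submit(worker, w, task) for w in range(replicas)]
        done, _ = concurrent.futures.wait(
            futures, timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED)
        # In this toy the pool still waits for the slower simulated workers on
        # shutdown; a real platform would simply discard their late answers.
        return next(iter(done)).result()

print(replicate("Is this tweet positive?"))
```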

Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin: CLAMShell: Speeding up Crowds for Low-latency Data Labeling. PVLDB 2015 110 Classification of Latency Control 1. Single Task

– Reduce the latency of a single task

2. Single Batch – Reduce the latency of a batch of tasks

3. Multiple Batches – Reduce the latency of multiple batches of tasks

Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin: CLAMShell: Speeding up Crowds for Low-latency Data Labeling. PVLDB 2015 111 Multiple-Batches Latency Control o Why multiple batches? – To save cost • Answer Deduction (e.g., leverage transitivity) • Task Selection (e.g., active learning)


112 Multiple-Batches Latency Control

o Two extreme cases – Single task per batch: high latency – All tasks in one batch: high cost

o Idea 1 – Choose the maximum batch size that does not hurt cost [1,2] o Idea 2 – Model as a latency budget allocation problem [3]
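The tradeoff can be sketched in a few lines of Python (the batch size and the number of answers deduced per round are made-up parameters, not values from [1-3]):

    def rounds_and_cost(num_tasks, batch_size, deduced_per_round=5):
        # Latency = number of rounds; cost = number of tasks actually sent to the crowd.
        asked, remaining, rounds = 0, num_tasks, 0
        while remaining > 0:
            batch = min(batch_size, remaining)
            asked += batch
            rounds += 1
            remaining -= batch
            remaining -= min(deduced_per_round, remaining)   # answers deduced for free (e.g., transitivity)
        return rounds, asked

    for b in (1, 10, 100):
        rounds, asked = rounds_and_cost(num_tasks=100, batch_size=b)
        print(f"batch size {b:3d}: latency = {rounds:3d} rounds, cost = {asked:3d} tasks")

Smaller batches leave more room for deduction between rounds (lower cost) but need more rounds (higher latency).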

1. Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng: Leveraging transitive relations for crowdsourced joins. SIGMOD 2013 2. A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and A. Y. Halevy. Crowd-powered find algorithms. ICDE 2014. 3. V. Verroios, P. Lofgren, and H. Garcia-Molina: tdp: An optimal-latency budget allocation strategy for crowdsourced MAXIMUM operations. SIGMOD 2015. 113 Summary of Latency Control o Latency – The completion time of the last task o Classification of Latency Control – Single-Task • Retainer Pool • Better UIs – Single-Batch • Pricing Model • Straggler Mitigation – Multiple-Batches

• Batch size 114 Two Take-Aways o There is no free lunch – Cost control • Trades off quality (and/or latency) for cost – Latency control • Trades off quality (and/or cost) for latency o Learn from other communities – Task Design (from HCI) – Straggler Mitigation (from Distributed Systems)

115 Reference – Cost Control

1. Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. Oassis: query driven crowd mining. In SIGMOD, pages 589–600. ACM, 2014 2. X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In WSDM, pages 193–202, 2013 3. G. Demartini, D. E. Difallah, and P. Cudre-Mauroux. Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In WWW, pages 469–478, 2012. 4. B. Eriksson. Learning to top-k search using pairwise comparisons. In AISTATS, pages 265–273, 2013. 5. C. Gokhale, S. Das, A. Doan, J. F. Naughton, N. Rampalli, J. W. Shavlik, and X. Zhu. Corleone: hands-off crowdsourcing for entity matching. In SIGMOD, pages 601–612, 2014. 6. A. Gruenheid, D. Kossmann, S. Ramesh, and F. Widmer. Crowdsourcing entity resolution: When is A=B? Technical report, ETH Zurich. 7. S. Guo, A. G. Parameswaran, and H. Garcia-Molina. So who won?: dynamic max discovery with the crowd. In SIGMOD, pages 385–396, 2012. 8. H. Heikinheimo and A. Ukkonen. The crowd-median algorithm. In HCOMP, 2013. 9. S. R. Jeffery, M. J. Franklin, and A. Y. Halevy. Pay-as-you-go user feedback for dataspace systems. In SIGMOD, pages 847–860, 2008. 10. H. Kaplan, I. Lotosh, T. Milo, and S. Novgorodov. Answering planning queries with the crowd. PVLDB, 6(9):697–708, 2013. 11. A. R. Khan and H. Garcia-Molina. Hybrid strategies for finding the max with the crowd. Technical report, 2014. 12. A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012. 13. B. Mozafari, P. Sarkar, M. Franklin, M. Jordan, and S. Madden. Scaling up crowd-sourcing to very large datasets: a case for active learning. PVLDB, 8(2):125–136, 2014. 14. A. G. Parameswaran, A. D. Sarma, H. Garcia-Molina, N. Polyzotis, and J. Widom. Human-assisted graph search: it’s okay to ask questions. PVLDB, 4(5):267–278, 2011. 116 Reference – Cost Control

15. T. Pfeiffer, X. A. Gao, Y. Chen, A. Mao, and D. G. Rand. Adaptive polling for information aggregation. In AAAI, 2012. 16. B. Trushkowsky, T. Kraska, M. J. Franklin, and P. Sarkar. Crowdsourced enumeration queries. In ICDE, pages 673–684, 2013. 17. V. Verroios and H. Garcia-Molina. Entity resolution with crowd errors. In ICDE, pages 219–230, 2015. 18. N. Vesdapunt, K. Bellare, and N. N. Dalvi. Crowdsourcing algorithms for entity resolution. PVLDB, 7(12):1071–1082, 2014. 19. J. Wang, T. Kraska, M. J. Franklin, and J. Feng. CrowdER: crowdsourcing entity resolution. PVLDB, 5(11):1483–1494, 2012. 20. J. Wang, S. Krishnan, M. J. Franklin, K. Goldberg, T. Kraska, and T. Milo. A sample-and-clean framework for fast and accurate query processing on dirty data. In SIGMOD, pages 469–480, 2014. 21. J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng. Leveraging transitive relations for crowdsourced joins. In SIGMOD, 2013. 22. S. Wang, X. Xiao, and C. Lee. Crowd-based deduplication: An adaptive approach. In SIGMOD, pages 1263–1277, 2015. 23. S. E. Whang, P. Lofgren, and H. Garcia-Molina. Question selection for crowd entity resolution. PVLDB, 6(6):349–360, 2013. 24. T. Yan, V. Kumar, and D. Ganesan. Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In MobiSys, pages 77–90, 2010. 25. P. Ye and D. Doermann. Combining preference and absolute judgements in a crowdsourced setting. In ICML Workshop, 2013. 26. C. J. Zhang, Y. Tong, and L. Chen. Where to: Crowd-aided path selection. PVLDB, 7(14):2005–2016, 2014.

117 Reference – Latency Control

1. J. P. Bigham et al. VizWiz: nearly real-time answers to visual questions. UIST, 2010. 2. M. S. Bernstein, J. Brandt, R. C. Miller, and D. R. Karger. Crowds in two seconds: enabling realtime crowd-powered interfaces. UIST, 2011. 3. M. S. Bernstein, D. R. Karger, R. C. Miller, and J. Brandt. Analytic Methods for Optimizing Realtime Crowdsourcing. Collective Intelligence, 2012. 4. Y. Gao and A. G. Parameswaran. Finish them!: Pricing algorithms for human computation. PVLDB, 7(14):1965–1976, 2014 5. S. Faradani, B. Hartmann, and P. G. Ipeirotis. What’s the right price? pricing tasks for finishing on time. In AAAI Workshop, 2011. 6. D. Haas, J. Wang, E. Wu, and M. J. Franklin. Clamshell: Speeding up crowds for low-latency data labeling. PVLDB, 9(4):372–383, 2015 7. A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and A. Y. Halevy. Crowd-powered find algorithms. In ICDE, pages 964–975, 2014 8. V. Verroios, P. Lofgren, and H. Garcia-Molina. tdp: An optimal-latency budget allocation strategy for crowdsourced MAXIMUM operations. In SIGMOD, pages 1047–1062, 2015. 9. T. Yan, V. Kumar, and D. Ganesan. Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In MobiSys, pages 77–90, 2010.

118 Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

119 Why Crowdsourcing DB Systems
o Limitations of Traditional DB Systems
Table: car (make, model, body_style, price): Volvo S80 Sedan $10K; Volvo XC60 SUV $20K; BMW X5 SUV $25K; ? Prius Sedan $15K

SELECT * FROM car WHERE make = "Toyota"  →  # of rows: 0

Problem: Closed-world assumption

120 Why Crowdsourcing DB Systems
o Limitations of Traditional DB Systems
(Figure: example tables — R1: Review (review, make, model, sentiment) with reviews r1 "...The 2014 Volvo S80 is the flagship model for the brand...", r2 "...S80 is a Volvo model having problems in oil pump...", r3 "...The BMW X5 is surprisingly agile for a big SUV..."; R2: Automobile (id, make, model, style) with a1 Volvo S80 Sedan, a2 Toyota Avalon Sedan, a3 Volvo XC60 SUV, a4 Toyota Corolla Sedan, a5 BMW X5 SUV, a6 Toyota Camry Sedan; R3: Image (image, make, model, style, color, quality) with images m1–m6.)
Tables: car, car_image

SELECT * FROM car C, car_image M WHERE M.make = C.make AND M.model = C.model AND M.color = "red"  →  # of rows: 0

Problem: Machine-hard tasks

121 Crowdsourcing DB Systems
o Integrating crowd functionality into the DB – Closed world → Open world – Processing DB-hard queries

Task 1: Collect Car Info — fill in Make, Model, Body Style
Task 2: Filter Red Cars — Yes / No
Task 3: Join Car Images — Is it "BMW X5"? Yes / No

122 Existing Crowd DB Systems o CrowdDB – UC Berkeley & ETH Zurich o Qurk – MIT o Deco – Stanford o CDAS – NUS o CDB – Tsinghua

123 System Architecture

(Figure: system architecture) A requester issues an SQL-like crowdsourcing query. The query parser produces an initial plan; the query optimizer applies crowdsourcing query optimization and crowdsourcing operator design to produce an optimized plan; the query executor runs the plan (e.g., selections and joins over relations R and S) and publishes HITs to crowdsourcing platforms through UI templates, parameters, and a HIT manager; the query result is returned to the requester. The stack spans the relational data model, crowd data interaction, and crowdsourcing platforms.

124 Running Example
Tables: car_review (review, make, model, sentiment) with reviews r1 "...The 2014 Volvo S80 is the flagship model for the brand...", r2 "...S80 is a Volvo model having problems in oil pump...", r3 "...The BMW X5 is surprisingly agile for a big SUV..."; car (id, make, model, style) with a1 Volvo S80 Sedan, a2 Toyota Avalon Sedan, a3 Volvo XC60 SUV, a4 Toyota Corolla Sedan, a5 BMW X5 SUV, a6 Toyota Camry Sedan; car_image (image, make, model, style, color, quality) with images m1–m6.
Example Query: Find black cars with high-quality images and positive reviews

125 Crowdsourcing DB Systems
o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB
o Operator Design • Design Principles

126 CrowdDB Query Language o CrowdSQL: Crowdsource missing data

Missing Columns (e.g., for a review of the Volvo S80, make, model, and sentiment are unknown) vs. Missing Tuples (e.g., new cars)

CREATE TABLE car_review (
  review STRING,
  make CROWD STRING,
  model CROWD STRING,
  sentiment CROWD STRING
);

CREATE CROWD TABLE car (
  make STRING,
  model STRING,
  color STRING,
  style STRING,
  PRIMARY KEY (make, model)
);

127 CrowdDB Query Language o CrowdSQL: Crowdsource DB-hard tasks

Crowd-powered Filtering:
SELECT review FROM car_review WHERE sentiment ~= "pos";
(Task: "The Volvo S80 is the flagship model of this brand…" — Is the review positive?)

Crowd-powered Ordering:
SELECT image FROM car_image WHERE subject = "Volvo S60" ORDER BY CROWDORDER("clarity");
(Task: Which one is better?)

128 CrowdDB Query Processing ◦ Crowd operators for data missing

SELECT * FROM car C, car_review R WHERE C.make = R.make AND C.model = R.model AND C.make = “Volvo”

(Query plan figure:) A CrowdProbe operator fills out the missing car data (make, model, color, style) for the crowd table car C; a selection σ make = "Volvo" is applied; then a CrowdJoin on C.make = R.make AND C.model = R.model fills out the missing car reviews (review, make, model, sentiment) from car_review R.

129 CrowdDB Query Processing o Crowd operators for DB-hard tasks

Crowd-powered comparison:
SELECT * FROM company C1, company C2 WHERE C1.name ~= C2.name
Crowd-powered ordering:
SELECT * FROM image M ORDER BY CROWDORDER ("clarity")

CrowdCompare

130 CrowdDB Query Optimization
o Strategy: Rule-based optimizer – Pushing down selections (e.g., σ make = "Volvo" onto car C, below the join with car_review R) – Determining join orders
(Query plan figure: the same plan as before, with CrowdProbe filling the car data, the selection pushed down onto car C, and CrowdJoin filling the missing car reviews.)

131 Crowdsourcing DB Systems
o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB
o Operator Design • Design Principles

132 Qurk Query Language
o SQL with User-Defined Functions (UDFs)
SELECT i.image FROM car_image i WHERE isBlack(i)

TASK isBlack(field) TYPE Filter:
  Prompt: "Is the car in black color?", tuple[field]
  YesText: "Yes"
  NoText: "No"
  Combiner: MajorityVote

(Task UI: the car image with the question "Is the car in black color?" and Yes / No buttons)
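A minimal Python sketch of what a MajorityVote-style combiner does with the workers' Yes/No answers (an illustration, not Qurk's actual code):

    from collections import Counter

    def majority_vote(answers):
        # Return the most common answer; a tie is broken arbitrarily by Counter.
        return Counter(answers).most_common(1)[0][0]

    answers_for_one_image = ["Yes", "No", "Yes", "Yes", "No"]   # hypothetical worker answers
    print(majority_vote(answers_for_one_image))                  # "Yes" -> the tuple passes isBlack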

133 Qurk Query Processing o Designing crowd-powered operators – Crowd Join: Designing better interfaces

(Join interfaces: Simple Join asks about one pair per HIT; Naïve Batching puts several pairs in one HIT; Smart Batching shows two groups of items and asks the worker to match them.)

134 Qurk Query Processing
o Designing crowd-powered operators – Crowd Sort: Designing better interfaces
(Sort interfaces: a Rating-Based Interface asks "Rate the visualization of the image" on a 1–6 scale from worst to best; a Comparing-Based Interface asks "Which one visualizes better?" with the choices "A is better" / "B is better".)
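A rough way to see why interface design matters is to count HITs (a sketch with an assumed batch size of five questions per HIT; quality differences between the interfaces are not modeled):

    def comparing_based_hits(n, pairs_per_hit=5):
        pairs = n * (n - 1) // 2
        return -(-pairs // pairs_per_hit)          # ceiling division

    def rating_based_hits(n, items_per_hit=5):
        return -(-n // items_per_hit)

    n = 50
    print("comparing-based:", comparing_based_hits(n), "HITs")   # 245 HITs for 1225 pairs
    print("rating-based:   ", rating_based_hits(n), "HITs")      # 10 HITs for 50 ratings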

135 Qurk Query Optimization o Join: Feature filtering optimization

SELECT * FROM car_image M1 JOIN car_image M2 ON sameCar(M1.img, M2.img) AND POSSIBLY make(M1.img) = make(M2.img) AND POSSIBLY style(M1.img) = style(M2.img)
Filtering out pairs with different makes & colors
o Is feature filtering always helpful? – Filtering cost vs. join cost • What if all cars have the same style? – Causing false negatives, e.g., on color – Disagreement among the crowd
136 Crowdsourcing DB Systems o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB o Operator Design • Design Principles
137 Deco Query Language o Conceptual Relation Car (make, model, [door-num], [style])

Anchor Attributes Dependent Attribute-groups o Raw Schema CarA ( make, model) // Anchor table CarD1 ( make, model, door-num) //Dependent table CarD2 ( make, model, style) // Dependent table o Fetch Rules: How to collect data

∅ ⇒ make, model        // Ask for a new car
make, model ⇒ door-num // Ask for the door-num of a given car
make, model ⇒ style    // Ask for the style of a given car

138 Deco Query Language o Resolution rules

make, model ⇒ style: majority-of-3   // majority vote
∅ ⇒ make, model: dupElim             // eliminate duplicates
o Query – Collecting the door-num and style of at least 8 SUV cars – SQL Query:

SELECT make, model, door-num, style FROM Car WHERE style = “SUV” MINTUPLES 8 • Standard SQL Syntax and Semantics • New keyword: MINTUPLES

139 Deco Query Processing o Crowd Operator: Fetch

Fetch [∅⇒ma,mo]: Collect a new car (Make, Model)
Fetch [ma,mo⇒st]: Collect the style of a given car (e.g., Make: Volvo, Model: S80 → Style)
Fetch [ma,mo⇒dn]: Collect the door-num of a given car (e.g., Make: Volvo, Model: S80 → Door-Num)

o Machine Operators – Scan: insert a collected tuple into the raw table – Resolve: e.g., majority-of-3, dupElim – DLOJoin: traditional join

140 Deco Query Optimization
o Example – Current status of the database:
CarA (make, model): Volvo S80; Toyota Corolla; BMW X5; Volvo XC60
CarD2 (make, model, style): Volvo XC60 SUV; BMW X5 SUV; Volvo S80 Sedan
– Selectivity of [style='SUV'] = 0.1 – Selectivity of dupElim = 1.0 – Each fetch incurs $0.05
o How will the query be evaluated?

141 Deco Query Processing

(Query plan figure, bottom-up: Scan [CarA] and Fetch [∅⇒ma,mo] feed Resolve [dupElim]; Scan [CarD2] and Fetch [ma,mo⇒st] feed Resolve [maj3]; a DLOJoin [ma,mo] combines the two and Filter [st="SUV"] is applied; a second DLOJoin [ma,mo] adds door-num via Resolve [maj3] over Scan [CarD1] and Fetch [ma,mo⇒dn]; MinTuples [8] sits at the top.)

142 Deco Query Optimization
o Cost Estimation – Let us consider a simple case: the subplan MinTuples [8] over Filter [st="SUV"] over Resolve [dupElim] over Scan [CarA] / Fetch [∅⇒ma,mo]
– Resolve [dupElim] • Target: 8 SUV cars • DB: 2 SUV cars, 1 Sedan car, and 1 unknown car • Estimated: 2.1 SUV cars
– Fetch • Target: (8 – 2.1) SUV cars • Sel [style='SUV'] = 0.1 • Fetch 59 cars – Cost: 59 * $0.05 = $2.95

143 Deco Query Optimization o Better Plan: Reverse Query Plan ……
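A minimal Python sketch of this cost estimate, using only the numbers stated in the example (the reverse-plan figure of 6 fetches is a derivation under the same assumptions, not a number from the slides):

    import math

    target, known_suv, unknown_cars = 8, 2, 1
    sel_suv, price_per_fetch = 0.1, 0.05

    estimated_suv = known_suv + unknown_cars * sel_suv     # 2.1 SUVs expected to be in the DB already
    missing = target - estimated_suv                        # 5.9 SUVs still to collect

    forward = math.ceil(missing / sel_suv)                  # fetch random cars: only ~10% are SUVs
    reverse = math.ceil(missing)                            # fetch by style (st => ma,mo): every car is an SUV
    print(f"forward plan: {forward} fetches, ${forward * price_per_fetch:.2f}")   # 59 fetches, $2.95
    print(f"reverse plan: {reverse} fetches, ${reverse * price_per_fetch:.2f}")   # 6 fetches, $0.30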

(Reverse query plan figure: the fetch rule on CarA becomes Fetch [st⇒ma,mo] — "Collect a new car of a given style" (Style: SUV → Make, Model) — so every fetched car is an SUV; the rest of the plan (Resolve [dupElim], Resolve [maj3] over Scan [CarD2] / Fetch [ma,mo⇒st], Filter [st="SUV"], DLOJoin [ma,mo] with Resolve [maj3] over Scan [CarD1] / Fetch [ma,mo⇒dn]) is unchanged.)
The reverse plan incurs less cost for this query.

144 Crowdsourcing DB Systems
o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB
o Operator Design • Design Principles

145 CDAS Query Language
o SQL with Crowdsourcing on demand – Crowdsourcing when columns are unknown

SELECT c.*, i.image, r.review FROM car_image i, car_review r WHERE r.sentiment = “pos” AND i.color = “black” AND r.make = i.make AND r.model = i.model

Crowdsourced questions for this query: Does the review match the image? ("The Volvo S80 is the flagship model of this brand…") Is the review positive? Is the car black?

146 CDAS Query Processing
o Designing Crowd Operators
– CrowdFill: filling missing values, e.g., CFill operator oF(R3, R3.color)
– CrowdSelect: filtering items, e.g., CSelect operator oS(R3, {C1,C2,C3})
– CrowdJoin: matching items from multiple sources, e.g., CJoin operator oJ(R3, R1, {C1,C2})
(Task UIs: Select Images — conditions C1: make=..., C2: model=..., C3: style=..., with a Yes/No choice; Join Image and Review — an image shown next to the review "...The 2014 Volvo S80 is the flagship model for the brand...", with the choice "Yes, it does" / "No, it doesn't"; Fill Car Attributes — "color of the car in the image: 1: black, 2: red, 3: blue".)

147 CDAS Query Processing o Performance metrics – Monetary cost: Unit price * # of HITs – Latency: # of crowdsourcing rounds o Optimization Objectives: – Cost Minimization: finding a query plan minimizing the monetary cost – Cost Bounded Latency Minimization: finding a query plan with bounded cost and the minimum latency o Key Optimization Idea – Cost-based query optimization – Balance the tradeoff between cost and latency

148 CDAS Query Processing

(Query plan figure:)
1) Optimizing selections: over R3, CSelect oS1 [color = "black"], then CFill oF1 [make], then CSelect oS2 [quality = "high"]; over R1, CSelect oS3 [sentiment = "pos"].
2) CFill-CJoin for joins: CJoin oJ1 [R1.make = R2.make AND R1.model = R2.model] joins the R1 branch with R2.
3) Determining join orders: CJoin oJ2 [R3.make = R2.make AND R3.model = R2.model] then joins in the R3 branch.

149 CDAS Query Optimization o Cost-Latency Tradeoff

(Figure: two plans for the conditions C1: color = "black", C2: quality = "high", C3: make = "Volvo", C4: style = "Sedan". Left: a chain of CSelect oS1–oS4 over R3, one condition per round. Right: a single CSelect oS5 over R3 that asks C1 AND C2 AND C3 AND C4 at once.)

Less cost, higher latency (chained) vs. more cost, lower latency (single conjunctive task)
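One way to quantify the two plans (a sketch that charges one unit per item-condition question and uses made-up selectivities — an assumption, not the exact CDAS cost model):

    def chained_plan(n_items, selectivities):
        # One condition per round; survivors of round i move on to round i+1.
        questions, remaining, rounds = 0, n_items, 0
        for sel in selectivities:
            questions += remaining
            remaining = int(remaining * sel)
            rounds += 1
        return questions, rounds

    def fused_plan(n_items, selectivities):
        # All conditions asked together: every item is asked about every condition, in one round.
        return n_items * len(selectivities), 1

    sels = [0.5, 0.4, 0.6, 0.3]                              # hypothetical selectivities of C1..C4
    print("chained (oS1..oS4):", chained_plan(1000, sels))   # (1820, 4): cheaper, slower
    print("fused   (oS5)     :", fused_plan(1000, sels))     # (4000, 1): pricier, faster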

How to balance the cost-latency tradeoff? 150 CDAS Query Optimization o How to implement Join – CJoin: Compare every pair – CFill: Fill missing join attributes o A Hybrid CFill-CJoin Optimization SELECT * FROM car R2, car_image R3 WHERE R2.make = R3.make AND R2.model = R3.model

(Figure: hybrid plan for this join. CFill oF3 [make] and CFill oF4 [style] fill the missing join attributes, partitioning the tuples of R2 and R3 into a tree: the root N0 splits by make (Volvo, BMW, Toyota) and a branch further splits by style (Sedan, SUV). CJoin oJ1 [R2.make = R3.make AND R2.model = R3.model] then only compares tuples that fall into the same partition.)

151 CDAS Query Optimization o Complex query optimization – The latency constraint allocation problem

(Figure: the plan has an overall latency constraint L; the top CJoin oJ2 is allocated latency constraint l, and its child subplan — CJoin oJ1 over CSelect oS2, CSelect oS1, and R3 — is allocated the remaining constraint L − l.)

152 Crowdsourcing DB Systems
o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB
o Operator Design • Design Principles

153 CDB Query Language
o Collect Semantics
– Fill Semantics: FILL car_image.color WHERE car_image.make = "Volvo";
– Collect Semantics:

COLLECT car.make, car.model WHERE car.style = "SUV"; o Query Semantics SELECT * FROM car_image M, car C, car_review R WHERE M.(make,model) CROWDJOIN C.(make,model) AND R.(make,model) CROWDJOIN C.(make,model) AND M.color CROWDEQUAL "red" 154 CDB Query Processing o Graph-Based Query Model – Computing matching probabilities for each CROWDJOIN – Building a query graph that connects tuple pairs with matching probabilities larger than a threshold
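A minimal Python sketch of the graph-based model (hypothetical tuples and probabilities; not CDB's data structures): prune candidate pairs below a threshold, and treat a result as a path whose edges the crowd confirms.

    THRESHOLD = 0.3

    review_car = {("r1", "a1"): 0.9, ("r2", "a1"): 0.6, ("r3", "a5"): 0.8, ("r1", "a2"): 0.1}
    car_image  = {("a1", "m1"): 0.7, ("a5", "m5"): 0.9, ("a1", "m6"): 0.2}
    image_red  = {"m1": 0.8, "m5": 0.4}               # probability that the image shows a red car

    def keep(prob_map):
        return {key for key, p in prob_map.items() if p >= THRESHOLD}

    rc, ci, red = keep(review_car), keep(car_image), keep(image_red)

    # Each surviving edge becomes a Yes/No task; a result tuple is a path
    # review -> car -> image -> "red" in which every edge was answered Yes.
    paths = [(r, c, m) for (r, c) in rc for (c2, m) in ci if c2 == c and m in red]
    print(sorted(paths))    # [('r1', 'a1', 'm1'), ('r2', 'a1', 'm1'), ('r3', 'a5', 'm5')]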

(Figure: a query graph whose node groups are the tuples of car_review, car, and car_image, plus a node for the constant "red"; edges connect candidate matching pairs.)

155 CDB Query Processing o Graph-Based Query Model – Crowdsource all edges (Yes/No tasks) – Color edges by the crowd answers – Result tuple: a path containing all CROWDJOINs
(Figure: the same graph with edges colored by the crowd answers; each result is a path from a car_review tuple through a car tuple to a car_image tuple that also passes the "red" check.)

156 CDB Query Optimization

o Monetary cost control – Traditional goal: finding an optimal join order – CDB goal: selecting minimum number of edges

(Figure: the query graph over car_review, car, car_image, and the constant "red".)

Traditional join ordering: 2 tasks + 5 tasks + 1 task = 8 tasks

157 CDB Query Optimization

o Monetary cost control – Traditional goal: finding an optimal join order – CDB goal: selecting minimum number of edges

(Figure: the same query graph.)

Traditional join ordering: 2 tasks + 5 tasks + 1 task = 8 tasks; CDB: 5 tasks. Selecting the minimum number of edges is NP-hard → various heuristics. 158 CDB Query Optimization o Latency control – Partitioning the graph into connected components – Crowdsourcing each component in parallel
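A minimal sketch of the latency-control step (hypothetical edge list; plain BFS, no CDB internals): each connected component can be crowdsourced in its own parallel round.

    from collections import defaultdict, deque

    edges = [("r1", "a1"), ("a1", "m1"), ("r2", "a3"), ("a3", "m4"), ("r3", "a5")]

    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:                         # BFS over one connected component
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adj[node] - comp)
        seen |= comp
        components.append(sorted(comp))

    print(components)    # e.g. [['a1', 'm1', 'r1'], ['a3', 'm4', 'r2'], ['a5', 'r3']]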

159 CDB Query Optimization o Quality control – Probabilistic truth inference model

– Entropy-based task assignment model o Other Task Types – Single-choice & Multi-choice tasks – Fill-in-blank tasks – Collection tasks 160 Take-Away for System Design o Data Model – Relational model – Open world assumption o Query Language – Extending SQL – Supporting interactions with the crowd o Query Processing – Tree-based vs. Graph-based – Crowd-powered operators – Optimization: Quality, Cost, and Latency

161 Crowdsourcing DB Systems o System Overview • CrowdDB • Qurk • Deco • CDAS • CDB o Operator Design • Design Principles 162 Design Principles o Leveraging crowdsourcing techniques – Quality Control • Truth Inference: inferring correct answers • Task Assignment: assigning tasks judiciously – Cost Control • Answer Deduction: avoiding unnecessary costs • Task Selection: selecting the most beneficial tasks – Latency Control • Round Reduction: reducing # of rounds – Task Design • Interface Design: interacting with the crowd wisely

163 Crowdsourced Selection o Objective – Identifying items satisfying some conditions o Key Idea – Task Assignment: cost vs. quality

Find all images containing SUV cars from an image set

For each image, keep the state (x, y): x YES answers and y NO answers so far (a grid with axes "# of YES answers" and "# of NO answers").
o Truth Inference • Output PASS? • Output FAIL?
o Task Assignment • Ask one more?
A. G. Parameswaran et al.: CrowdScreen: algorithms for filtering data with humans. SIGMOD Conference 2012: 361-372
164 Crowdsourced Selection o Key Idea – Latency Control: cost vs. latency

Find 2 images with SUV cars from 100 images

(Figure, with cost C and latency L in rounds:)
Sequential — one image per round: ✅ ❌ ❌ ✅ — C: 4, L: 4
Parallel — all 100 images in one round: C: 100, L: 1
Hybrid — a few images per round: ✅ ❌ | ❌ ✅ — C: 4, L: 3
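A minimal simulation of the three strategies (randomly generated images and an arbitrary hybrid batch size — illustrative numbers only):

    import random

    random.seed(7)
    images = [random.random() < 0.3 for _ in range(100)]    # True = the image contains an SUV
    k = 2

    def sequential(items, k):
        cost = 0
        for has_suv in items:          # one image per round
            cost += 1
            k -= has_suv
            if k == 0:
                return cost, cost      # (cost, rounds)
        return cost, cost

    def parallel(items, k):
        return len(items), 1           # everything in a single round

    def hybrid(items, k, batch=2):
        cost = rounds = found = 0
        for i in range(0, len(items), batch):
            chunk = items[i:i + batch]
            cost += len(chunk)
            rounds += 1
            found += sum(chunk)
            if found >= k:
                break
        return cost, rounds

    for name, strategy in (("sequential", sequential), ("parallel", parallel), ("hybrid", hybrid)):
        print(f"{name:10s} cost, rounds = {strategy(images, k)}")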

A. D. Sarma et al.: Crowd-powered find algorithms. ICDE 2014: 964-975 165 Crowdsourced Join o Objective – Identifying record pairs referring to same entity o Key Idea – Answer Deduction, e.g., using Transitivity

Task Pool

• Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng: Leveraging transitive relations for crowdsourced joins. SIGMOD 2013 • Donatella Firmani, Barna Saha, Divesh Srivastava: Online Entity Resolution Using an Oracle. PVLDB 2016 166 Crowdsourced Join o Key Idea – Task Selection, e.g., selecting beneficial tasks

(Figure: depending on which pair is selected, either one more task's answer can be deduced via transitivity, or no task is deduced.)
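A minimal sketch of the deduction bookkeeping (union-find plus a set of known non-matches; a simplification — stale cluster roots after later merges are not re-canonicalized):

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    non_match = set()                            # pairs of cluster roots the crowd said differ

    def record_answer(a, b, is_match):
        if is_match:
            union(a, b)
        else:
            non_match.add(frozenset((find(a), find(b))))

    def needs_crowd(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                         # deduced: match (transitivity)
        if frozenset((ra, rb)) in non_match:
            return False                         # deduced: non-match (A=B and B!=C implies A!=C)
        return True

    record_answer("iPhone 4", "iPhone 4th Gen", True)
    record_answer("iPhone 4th Gen", "iPad 2", False)
    print(needs_crowd("iPhone 4", "iPad 2"))     # False: no crowd task needed for this pair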

• Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng: Leveraging transitive relations for crowdsourced joins. SIGMOD 2013 • S. E. Whang, P. Lofgren, H. Garcia-Molina: Question Selection for Crowd Entity Resolution. PVLDB 6(6): 349-360 (2013) 167 Crowdsourced TopK/Sort o Objective – Finding top-k items (or a ranked list) wrt. Criterion o Key Idea – Truth Inference: Resolve conflicts among crowd Which picture visualizes the best SFU Campus?

(Figure: four candidate pictures A, B, C, D, and a pairwise-vote graph with conflicting votes, e.g., some workers vote A > B while others vote B > A.)
o Ranking Inference over pairwise voting conflicts among the crowd • Maximum Likelihood Inference • NP-hard

S. Guo, et al. : So who won?: dynamic max discovery with the crowd. SIGMOD Conference 2012: 385-396 168 Crowdsourced TopK/Sort o Key Idea – Task Selection: Most beneficial for getting the top-k results

What are the top-2 pictures that best visualize the SFU Campus?

Rank by computers

The most beneficial task: a comparison (VS.) that is difficult for computers

Xiaohang Zhang, Guoliang Li, Jianhua Feng: Crowdsourced Top-k Algorithms: An Experimental Evaluation. PVLDB 2016 169 Crowdsourced Collection o Objective – Collecting a set of new items o Key Idea – Truth Inference: Inferring item coverage

o Species Estimation Algo. • Observing the rate at which new species are identified over time • inferring how close to the true number of species you are
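A minimal sketch of this idea using a Chao1-style estimator over the submission frequencies (the actual paper builds on a related Chao92 estimator and accounts for worker behavior; the answers below are made up):

    from collections import Counter

    answers = ["Toyota", "BMW", "Volvo", "Toyota", "Honda", "BMW", "Ford", "Toyota", "Kia"]

    freq = Counter(answers)
    observed = len(freq)                                  # distinct items seen so far
    f1 = sum(1 for c in freq.values() if c == 1)          # items seen exactly once
    f2 = sum(1 for c in freq.values() if c == 2)          # items seen exactly twice

    if f2 > 0:
        estimated_total = observed + f1 * f1 / (2 * f2)   # Chao1 estimator
    else:
        estimated_total = observed + f1 * (f1 - 1) / 2    # bias-corrected variant when f2 == 0

    print(f"observed {observed} items, estimated total about {estimated_total:.1f}")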

B. Trushkowsky et al.: Crowdsourced enumeration queries. ICDE 2013: 673-684 170 Crowdsourced Collection o Key Idea – Task Assignment: satisfying result distribution

o Diverse distributions among workers • E.g., collecting movies by publishing decade
(Figure: Worker 1 is old-fashioned — most contributed movies are from the earlier decades; Worker 2 is new-fashioned — most contributions are from the 1990s–2000s; the x-axis ranges over the decades 1950s–2000s, the y-axis over the fraction of contributions.)
J. Fan et al.: Distribution-Aware Crowdsourced Entity Collection. TKDE 2017
171 Crowdsourced Fill o Objective – Filling missing cells in a table o Key Idea: Task Design – Microtask vs. partially-filled table with voting – Real-time collaboration for concurrent workers – Compensation scheme with budget

172 Crowdsourced Count o Objective – Estimating number of certain items o Key Idea – Task Design: Leveraging crowd to estimate

(Example task: a photo of a group of people with the question "How many are female?"; a worker answers 2.)

Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, Sewoong Oh: Counting with the Crowd. PVLDB 2012 173 Take-Away for Crowd Operators

                   CrowdSelect  CrowdJoin  CrowdSort  CrowdCollect  CrowdFill  CrowdCount
Truth Inference        √            √          √           √            ×          ×
Task Assignment        √            ×          √           √            ×          ×
Answer Deduction       ×            √          ×           ×            ×          ×
Task Selection         ×            √          √           ×            ×          ×
Round Reduction        √            √          ×           ×            ×          ×
Interface Design       ×            √          √           ×            √          √

174 System Comparison

CrowdDB Qurk Deco CDAS CDB

CrowdSelect √ √ √ √ √

CrowdJoin √ √ √ √ √

CrowdSort √ √ × × √

CrowdTopK √ √ × × √
CrowdMax √ √ × × √
CrowdMin √ √ × × √
(The rows CrowdSelect through CrowdFill form the Crowd-Powered Operators group.)

CrowdCount × × × × √

CrowdCollect √ × √ × √

CrowdFill √ × √ √ √

175 System Comparison

CrowdDB Qurk Deco CDAS CDB

Optimization Objectives:
Cost √ √ √ √ √
Latency × × × √ √
Quality √ √ √ √ √

Truth Inference √ √ √ √ √

Task Assignment × × × × √

Answer Reasoning × × × × √

Task Design √ √ √ √ √

Latency Reduction × × × √ √
(The rows Truth Inference through Latency Reduction form the Design Techniques group.)

176 Reference

1. M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. Crowddb: answering queries with crowdsourcing. In SIGMOD, pages 61–72, 2011. 2. A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211–214, 2011. 3. H. Park, R. Pang, A. G. Parameswaran, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: A system for declarative crowdsourcing. PVLDB, 2012. 4. J. Fan, M. Zhang, S. Kok , M. Lu, and B. C. Ooi. Crowdop: Query optimization for declarative crowdsourcing systems. IEEE Trans. Knowl. Data Eng., 27(8):2078–2092, 2015. 5. G. Li, C. Chai, J. Fan, X. Weng, J. Li, Y. Zheng, Y. Li, X. Yu, X. Zhang, H. Yuan. CDB: Optimizing Queries with Crowd-Based Selections and Joins. in SIGMOD, 2017. 6. A. G. Parameswaran et al.: CrowdScreen: algorithms for filtering data with humans. SIGMOD Conference 2012: 361-372. 7. A. D. Sarma et al.: Crowd-powered find algorithms. ICDE 2014: 964-975. 8. Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng: Leveraging transitive relations for crowdsourced joins. SIGMOD 2013. 9. Donatella Firmani, Barna Saha, Divesh Srivastava: Online Entity Resolution Using an Oracle. PVLDB 2016. 10. S. E. Whang, P. Lofgren, H. Garcia-Molina: Question Selection for Crowd Entity Resolution. PVLDB 6(6): 349- 360 (2013). 11. S. Guo, et al. : So who won?: dynamic max discovery with the crowd. SIGMOD Conference 2012: 385-396. 12. Xiaohang Zhang, Guoliang Li, Jianhua Feng: Crowdsourced Top-k Algorithms: An Experimental Evaluation. PVLDB 2016. 13. B. Trushkowsky et al.: Crowdsourced enumeration queries. ICDE 2013: 673-684. 14. J. Fan et al.: Distribution-Aware Crowdsourced Entity Collection. TKDE 2017. 15. H. Park, J. Widom: CrowdFill: collecting structured data from the crowd. SIGMOD Conference 2014: 577-588. 16. Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, Sewoong Oh: Counting with the Crowd. PVLDB 2012.

177 Outline

◦ Crowdsourcing Overview (30min) – Motivation (5min) – Workflow (15min) – Platforms (5min) Part 1 – Difference from Other Tutorials (5min) ◦ Fundamental Techniques (100min) – Quality Control (60min) – Cost Control (20min) – Latency Control (20min) ◦ Crowdsourced Database Management (40min) – Crowdsourced Databases (20min) Part 2 – Crowdsourced Optimizations (10min) – Crowdsourced Operators (10min) ◦ Challenges (10min)

178 The 6 Crowdsourcing Challenges

◦ Benchmarking ◦ Scalability ◦ Truth Inference ◦ Privacy ◦ Macro-Tasks ◦ Mobile Crowdsourcing

179 1. Benchmarking

◦ Database Benchmarks

TPC-C, TPC-H, TPC-DI,…

◦ Crowdsourcing: No standard benchmarks

◦ Existing public datasets (link) are inadequate

180 1. Benchmarking

◦ Existing public datasets are inadequate, because:

◦ Each task often receives 5 or fewer answers ◦ Most tasks are single-label tasks ◦ Very few numeric tasks ◦ Lack of ground truth ◦ Expensive to get ground truth for 10K tasks

181 2. Scalability

◦ Hard to scale crowdsourcing to tackle the 3Vs of big data?

◦ (1) workers are expensive; (2) answers can be erroneous; (3) existing works focus on specific problems, e.g., active learning (Mozafari et al. VLDB14), entity matching (Gokhale et al. SIGMOD14).

182 2. Scalability: Query Optimization

◦ Query Processing in Traditional RDBMS

(Figure: Parser → Logical Query Plan → Rewriter → Query Plan → Optimization → Physical Query Plan)

183 2. Scalability: Query Optimization

◦ Query optimization in crowdsourcing is challenging:

(1) it must handle three optimization objectives (quality, cost, and latency)

(2) humans are more unpredictable than machines

184 3. Truth Inference

◦ Not fully solved (Zheng et al. VLDB17)

◦ We have surveyed 20+ methods:

(1) No best method;

(2) The oldest method (Dawid & Skene, JRSS 1979) is the most robust;

(3) No robust method for numeric tasks (the baseline “Mean” performs the best !)

185 4. Privacy

◦ (1) Requester

Wants to protect the privacy of their tasks from workers

e.g., tasks may contain sensitive attributes, e.g., medical data.

186 4. Privacy

◦ (2) Workers

Want to have privacy-preserving requirements & worker profiles

e.g., personal info of workers can be inferred from the worker’s answers, e.g., location, gender, etc.

187 5. Macro-Tasks

◦ Existing works focus on simple micro-tasks

Is Bill Gates currently the CEO of Microsoft? Yes / No        Identify the sentiment of the tweet: …… Pos / Neu / Neg

◦ Hard to perform big and complex tasks, e.g., writing an essay

(1) macro-tasks are hard to split into pieces that multiple workers can accomplish; (2) workers may not be interested in performing a time-consuming macro-task. 188 6. Mobile Crowdsourcing ◦ Emerging mobile crowdsourcing platforms, e.g., gMission (HKUST), ChinaCrowd (Tsinghua) ◦ Challenges (1) Other factors (e.g., spatial distance, mobile user interface) affect workers' latency and quality; (2) Different task-assignment mechanisms: on traditional crowdsourcing platforms, workers request tasks from the platform;

on a mobile crowdsourcing platform, only workers close to the crowdsourcing task can be selected.

189 Thanks ! Q & A

Guoliang Li Yudian Zheng Ju Fan Jiannan Wang Reynold Cheng Tsinghua Hong Kong Renmin SFU Hong Kong University University University University