The Jabberwocky Programming Environment for Structured Social Computing
Salman Ahmad, Alexis Battle, Zahan Malkani, Sepandar D. Kamvar

ABSTRACT
We present Jabberwocky, a social computing stack that consists of three components: a human and machine resource management system called Dormouse, a parallel programming framework for human and machine computation called ManReduce, and a high-level programming language on top of ManReduce called Dog. Dormouse is designed to enable cross-platform programming languages for social computation, so, for example, programs written for Mechanical Turk can also run on other crowdsourcing platforms. Dormouse also enables a programmer to easily combine crowdsourcing platforms or create new ones. Further, machines and people are both first-class citizens in Dormouse, allowing for natural parallelization and control flows for a broad range of data-intensive applications. Finally, and importantly, Dormouse includes notions of real identity, heterogeneity, and social structure. We show that the unique properties of Dormouse enable elegant programming models for complex and useful problems, and we propose two such frameworks. ManReduce is a framework for combining human and machine computation into an intuitive parallel data flow that goes beyond existing frameworks in several important ways, such as enabling functions on arbitrary communication graphs between human and machine clusters. Dog is a high-level procedural language written on top of ManReduce that focuses on expressivity and reuse. We explore two applications written in Dog: bootstrapping product recommendations without purchase data, and expert labeling of medical images.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces. - Graphical user interfaces.

General terms: Languages, Human Factors

Keywords: social computing, crowdsourcing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UIST'11, October 16-19, 2011, Santa Barbara, CA, USA. Copyright 2011 ACM 978-1-4503-0716-1/11/10...$10.00.

Figure 1: Overview of Jabberwocky

INTRODUCTION
In the last few years, there has been a heightened interest in human computation, where tasks that are difficult for computers (such as image labeling or transcription) are split into microtasks and dispatched to people. Human computation has been used to address large-scale goals ranging from labeling images [23], to finding 3-D protein structures [3], to creating a crowdsourced illustrated book [8], to classifying galaxies in Hubble images [1].

In existing paradigms, human workers are often treated as homogeneous and interchangeable, which is useful in handling issues of scale and availability. However, the limited notions of identity, reputation, expertise, and social relationships limit the scope of tasks that can be addressed with these systems. Incorporating real identities, social structure, and expertise modeling has proven valuable in a range of applications, for example, in question-answering with Aardvark [11]. Building general frameworks for human computation that include these notions will enable complex applications to be built more easily.

A second drawback of existing platforms is that each defines a stand-alone system with rigid structure and requirements, and thus demands significant work in order to integrate human computation into larger applications. Each new application may require building a pipeline from the ground up, and in many cases, a new community. Particularly for complex applications, which may involve several steps of human computation using different crowdsourcing platforms interleaved with machine computation, constructing such a pipeline can be a tedious effort. In practice, complex systems are discouraged, and most uses of human computation avoid multiple interleaved processing steps.

To address these issues, we designed Jabberwocky, a social computing stack that consists of Dormouse, ManReduce, and Dog. Dormouse is the "virtual machine" layer of the Jabberwocky stack, consisting of low-level software libraries that interact with both people and traditional computing machines. Dormouse maintains real identities, rich user profiles, and social relationships for the people who comprise the system, and allows end users to define arbitrary person-level properties and social structures in the system. Further, Dormouse allows programmers to interact with several different crowdsourcing platforms using the same primitives. This enables the development of cross-platform programming languages for social computing. Finally, because Dormouse defines communication protocols for both people and machines, programmers can interact naturally with both in unified control flows, even for complex parallel processing tasks.

On top of Dormouse, we built ManReduce, a programming framework inspired by MapReduce [4] (and related to CrowdForge [13]) to facilitate complex data processing tasks. ManReduce, like MapReduce, gives the programmer the ability to specify map and reduce steps, but allows either step to be powered by human or machine computation. The data flow, resource allocation, and parallelization necessary for each step are handled by ManReduce with no onus on the programmer. In addition to combining machine and human computation, ManReduce also provides the ability to choose particular types of people to complete each task (based on Dormouse), and allows arbitrary dependencies between multiple map and reduce steps. Many interesting social computing applications fit naturally into this paradigm, as they frequently involve the need for parallelization of subtasks across people or machines, and subsequent aggregation such as writing a summary or averaging ratings. As a simple example, conducting a survey and tabulating summary statistics for each question (breaking down according to a variety of demographics) can be expressed using a human map step that sends the survey in parallel to many people, and one or more machine reduce steps on the output that aggregate the responses keyed by question and/or user demographic.

While ManReduce offers flexibility and power, it can be too low-level for several classes of applications. In many cases, it is useful to trade some flexibility for expressivity, maintainability, and reuse. To that end, we designed Dog, a high-level scripting language that compiles into ManReduce. Inspired by the Pig [19] and Sawzall [21] languages for data mining, Dog defines a small but powerful set of primitives for requesting computational work from either people or machines, and for interfacing with workers (and their properties) through Dormouse. In addition,

DORMOUSE
The lowest level of the Jabberwocky stack is Dormouse, a "virtual machine" layer that enables cross-platform social computation. Similar to process virtual machines (such as the Java Virtual Machine) in traditional computing, Dormouse sits on top of existing crowdsourcing platforms, providing a platform-independent programming environment that abstracts away the details of the underlying crowdsourcing platform. Importantly, Dormouse also enables programmers to seamlessly create new crowdsourcing communities and add social features (such as worker profiles and relationships) to existing ones.

Design Goals
Our design goals for Dormouse are to:

Support cross-platform programming languages for social computing. Programming languages run on Dormouse can work with any crowdsourcing platform that hooks into Dormouse, and can support execution across several platforms in the same program. For example, a programmer may, in one step, route tasks to a large number of inexpensive workers from one crowdsourcing platform, and in the next step, route tasks to a smaller number of vetted experts from another platform, without needing to learn the separate (and often complex) API calls from multiple platforms.

Make it easy for programmers to build new crowdsourcing communities. In addition to being able to reside on top of existing crowdsourcing platforms, Dormouse makes it easy for programmers to create new worker communities given a set of e-mail addresses.

Enable rich personal profiles and social structures. Programming languages run on Dormouse allow the programmer to route tasks based on personal properties such as expertise and demographics, and also to set and modify expertise levels based on task performance. Further, programmers may route tasks based on social structure (for example, an application that matches technical papers to reviewers may route papers to computer science graduate students to review, and then to their advisors to validate the review).

Combine human and machine computation into a single parallel computing framework. Programming languages run on Dormouse allow the programmer to allocate machines and people in similar ways, leading to more natural control flows