The Jabberwocky Programming Environment for Structured Social Computing

Salman Ahmad, Alexis Battle, Zahan Malkani, Sepandar D. Kamvar

ABSTRACT
We present Jabberwocky, a social computing stack that consists of three components: a human and machine resource management system called Dormouse, a parallel programming framework for human and machine computation called ManReduce, and a high-level programming language on top of ManReduce called Dog. Dormouse is designed to enable cross-platform programming languages for social computation, so, for example, programs written for Mechanical Turk can also run on other crowdsourcing platforms. Dormouse also enables a programmer to easily combine crowdsourcing platforms or create new ones. Further, machines and people are both first-class citizens in Dormouse, allowing for natural parallelization and control flows for a broad range of data-intensive applications. Finally, and importantly, Dormouse includes notions of real identity, heterogeneity, and social structure. We show that the unique properties of Dormouse enable elegant programming models for complex and useful problems, and we propose two such frameworks. ManReduce is a framework for combining human and machine computation into an intuitive parallel data flow that goes beyond existing frameworks in several important ways, such as enabling functions on arbitrary communication graphs between human and machine clusters. Dog is a high-level procedural language written on top of ManReduce that focuses on expressivity and reuse. We explore two applications written in Dog: bootstrapping product recommendations without purchase data, and expert labeling of medical images.

Figure 1: Overview of Jabberwocky. [Diagram of the three layers Dog, ManReduce, and Dormouse; labeled elements include Dog Script, Dog Compiler, Deploy, ManReduce Script, ManReduce Runtime, ManReduce Library, User-Defined Functions, API, Dormouse Master, Dormouse Library, Compute Clusters, and Crowd Workers.]

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces.
General terms: Languages, Human Factors
Keywords: social computing, crowdsourcing

INTRODUCTION
In the last few years, there has been a heightened interest in human computation, where tasks that are difficult for computers (such as image labeling or transcription) are split into microtasks and dispatched to people. Human computation has been used to address large-scale goals ranging from labeling images [23], to finding 3-D protein structures [3], to creating a crowdsourced illustrated book [8], to classifying galaxies in Hubble images [1].

In existing paradigms, human workers are often treated as homogeneous and interchangeable, which is useful in handling issues of scale and availability. However, limited notions of identity, reputation, expertise, and social relationships restrict the scope of tasks that can be addressed with these systems. Incorporating real identities, social structure, and expertise modeling has proven valuable in a range of applications, for example, in question-answering with Aardvark [11]. Building general frameworks for human computation that include these notions will enable complex applications to be built more easily.

A second drawback of existing platforms is that each defines a stand-alone system with rigid structure and requirements, and thus demands significant work to integrate human computation into larger applications. Each new application may require building a pipeline from the ground up and, in many cases, a new community. Particularly for complex applications, which may involve several steps of human computation on different crowdsourcing platforms interleaved with machine computation, constructing such a pipeline can be tedious. In practice, complex systems are discouraged, and most uses of human computation avoid multiple interleaved processing steps.
To address these issues, we designed Jabberwocky, a social computing stack that consists of Dormouse, ManReduce, and Dog. Dormouse is the "virtual machine" layer of the Jabberwocky stack, consisting of low-level software libraries that interact with both people and traditional computing machines. Dormouse maintains real identities, rich user profiles, and social relationships for the people who comprise the system, and allows end users to define arbitrary person-level properties and social structures in the system. Further, Dormouse allows programmers to interact with several different crowdsourcing platforms using the same primitives. This enables the development of cross-platform programming languages for social computing. Finally, because Dormouse defines communication protocols for both people and machines, programmers can naturally interact with both in unified control flows, even for complex parallel processing tasks.

On top of Dormouse, we built ManReduce, a programming framework inspired by MapReduce [4] (and related to CrowdForge [13]) to facilitate complex data processing tasks. ManReduce, like MapReduce, lets the programmer specify map and reduce steps, but allows either step to be powered by human or machine computation. The data flow, resource allocation, and parallelization necessary for each step are handled by ManReduce with no onus on the programmer. In addition to combining machine and human computation, ManReduce also provides the ability to choose particular types of people to complete each task (based on Dormouse), and allows arbitrary dependencies between multiple map and reduce steps. Many interesting social computing applications fit naturally into this paradigm, as they frequently involve parallelizing subtasks across people or machines and then aggregating the results, for example by writing a summary or averaging ratings. As a simple example, conducting a survey and tabulating summary statistics for each question (broken down according to a variety of demographics) can be expressed using a human map step that sends the survey in parallel to many people, and one or more machine reduce steps on the output that aggregate the responses keyed by question and/or user demographic.

While ManReduce offers flexibility and power, it can be too low-level for several classes of applications. In many cases, it is useful to trade some flexibility for expressivity, maintainability, and reuse. To that end, we designed Dog, a high-level scripting language that compiles into ManReduce. Inspired by the Pig [19] and Sawzall [21] languages for data mining, Dog defines a small but powerful set of primitives for requesting computational work from either people or machines, and for interfacing with workers (and their properties) through Dormouse. In addition, it defines simple interfaces for using, creating, and sharing functions (human or machine) and microtask templates, making it easy to quickly implement a wide range of applications.

Together, the components of the Jabberwocky stack allow programmers to implement, in a few lines of code, applications that would otherwise require large amounts of ad hoc infrastructure. We explore several such applications in this paper, including a journal transcription application that maintains the privacy of the journal-writer, a recommender system that requires no pre-existing co-occurrence data, and a medical image tagging application that allows experts to leverage low-cost generalists (and generalists to learn from experts).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UIST'11, October 16-19, 2011, Santa Barbara, CA, USA. Copyright 2011 ACM 978-1-4503-0716-1/11/10...$10.00.

DORMOUSE
The lowest level of the Jabberwocky stack is Dormouse, a "virtual machine" layer that enables cross-platform social computation. Similar to process virtual machines (such as the Java Virtual Machine) in traditional computing, Dormouse sits on top of existing crowdsourcing platforms, providing a platform-independent programming environment that abstracts away the details of the underlying crowdsourcing platform. Importantly, Dormouse also enables programmers to seamlessly create new crowdsourcing communities and add social features (such as worker profiles and relationships) to existing ones.

Design Goals
Our design goals for Dormouse are to:

Support cross-platform programming languages for social computing. Programming languages that run on Dormouse can work with any crowdsourcing platform that hooks into Dormouse, and can support execution across several platforms in the same program. For example, a programmer may, in one step, route tasks to a large number of inexpensive workers from one crowdsourcing platform, and in the next step, route tasks to a smaller number of vetted experts from another platform, without needing to learn the separate (and often complex) API calls of multiple platforms.

Make it easy for programmers to build new crowdsourcing communities. In addition to residing on top of existing crowdsourcing platforms, Dormouse makes it easy for programmers to create new worker communities given a set of e-mail addresses.

Enable rich personal profiles and social structures. Programming languages that run on Dormouse allow the programmer to route tasks based on personal properties such as expertise and demographics, and also to set and modify expertise levels based on task performance. Further, programmers may route tasks based on social structure (for example, an application that matches technical papers to reviewers may route papers to computer science graduate students to review, and then to their advisors to validate the reviews).

Combine human and machine computation into a single parallel computing framework. Programming languages that run on Dormouse allow the programmer to allocate machines and people in similar ways, leading to more natural control flows for parallelization.

Make it easy for programmers to create and reuse human tasks. Dormouse has straightforward mechanisms to create new templates for human tasks and, importantly, to reuse those created by others. This minimizes redundant work and allows programmers to focus on control flows rather than on creating and optimizing task templates.

Architecture
Dormouse is implemented as a set of software components that reside on a Dormouse master machine. These components communicate with one another, as well as with an external Dormouse machine cluster and a set of human workers. These software components are described below and shown in Figure 2.

Figure 2: Internal Architecture of Dormouse. [Diagram of components (A) Data Store, (B) Process Queue, (C) Task Pool, (D) API, (E) Dormouse Server, (F) Front End, (G) Standard Library, and (H) Service Adapters, connected to an Application Package, Compute Clusters, Dormouse Workers and Communities, Crowd Workers, and Crowdsourcing Platforms. Other labels include User Profiles, Processes, Application State, Tasks, Task Records, DeployApplication(), PerformTask(), and Process Deployment. Numbered arrows (1) to (13) correspond to the deployment steps described below.]

The Dormouse Data Store (A) is a set of distributed SQL databases that store user profiles, community information, and application-specific information and data for each running process.

The Dormouse Process Queue (B) schedules the top-level control flow of an application, as well as any computational steps that require the Dormouse machine cluster. All Dormouse applications are registered with the Process Queue.

The Dormouse Task Pool (C) maintains a set of tasks that are waiting to be performed by human workers. Each task in the pool is annotated with specifications for the type of workers that are eligible to complete the task. Worker specifications are predicates represented by a binary expression tree and encoded using JSON.

The Dormouse API (D) is a low-level API that contains functions to process and update Dormouse system resources (for example, to register a task, to update a profile, or to terminate an application).

The Dormouse Server (E) is a continuously running service that manages programs as they are executed, human tasks as they are dispatched, people as they join or leave, and so on, by calling Dormouse API functions when appropriate. For example, when the Dormouse Server gets a request to run a new Dormouse package, it calls add, which adds the package to the Process Queue.

The Dormouse Front End (F) is a Ruby on Rails web application that serves as a UI for both workers and developers. Workers use the Front End to manage profile information, view tasks for which they are eligible, and select and complete tasks. Developers can use the interface to manage their developer accounts and deploy their applications.

The Dormouse Standard Library (G) provides runtime hooks that communicate between an application and Dormouse, convenient data serialization routines, and a set of standard human tasks (e.g. Label and Compare) that can be called from any Dormouse application. A human task is an object that consists of a UI template (written in ERB, a template language used by Ruby on Rails), input parameters, and a return value.

The Dormouse Service Adapters (H) manage communication between Dormouse and other crowdsourcing platforms, such as Mechanical Turk. To hook into Dormouse, a crowdsourcing platform need only provide an API that, at a minimum, allows tasks to be posted. The Task Pool uses these adapters to invoke the target service's API to post tasks.

To show how these components interact with one another, we walk through deploying an application on the Dormouse architecture; Figure 2 illustrates the process.

The developer begins by writing an application that links to the Dormouse Standard Library (1), packaging the application using the Dormouse Command-line Utility (2), and deploying the package through the Dormouse Server (3).

The Dormouse Server unpacks the application, reads the manifest file to find the main executable (the program that contains the top-level control flow of the application), and sends that executable to the Process Queue (4).

The Process Queue begins running the executable (5), which may pause for one of two reasons: either it needs access to machine resources to perform a computational step, or it needs to ask human workers to perform a task. In either case, the executable automatically saves its state to the Dormouse Data Store using the Dormouse Standard Library serialization routines (7), and temporarily terminates.

The Process Queue exploits the parallelism in computational steps by running them over a compute cluster (6). It does this by copying the application package to each node in the cluster and invoking the runtime hooks provided by the Dormouse Standard Library to selectively execute a single step over a subset of key-value input parameters. Once the steps on all nodes are finished, the results are sent back to the Process Queue, and then to the executable, which starts to run again.

When the executable reaches a point where no further work can be done without human input, it saves its state, terminates, and outputs a list of human tasks using JSON. The executable will be started again once the human tasks are complete.

The Process Queue takes the JSON output and sends information to the Task Pool (8). The Task Pool creates a record that includes a link back to the process, worker specification information, and the name of the Service Adapter, if any, that should be used to post tasks (9). The Task Pool invokes the "submit" method on the specified Service Adapter (10), which performs the steps necessary to post each task on the appropriate crowdsourcing platform (11). If no Service Adapter is provided, the tasks will be available on Dormouse to existing users (12).

Once a task is completed, the Task Pool sends the answer back to the Process Queue (13). When all needed human tasks are finished, the process is re-executed, starting from where it left off. The process continues until it once again needs human input or it finishes running. When the process is finished, it outputs a return code signaling the Process Queue to mark it as done and sends the developer a notification, which includes a download link to a JSON file containing the results.

MANREDUCE
In order to make Dormouse readily usable for complex data-intensive applications, we specify a programming framework called ManReduce, based on the simple functional programming paradigm and resource management scheme used by MapReduce [4]. ManReduce shares some conceptual similarities with the independently conceived CrowdForge [13], but ManReduce has some important advantages over both MapReduce and CrowdForge that we discuss below.

Design Goals
With ManReduce, while providing power and flexibility, we aimed to maintain a conceptually simple design that can be rapidly understood and easily used. In addition, ManReduce has three key features that, in combination, allow us to develop a wide range of applications for social computation: the ability to dispatch jobs both to machines and to people, the ability to utilize social structure and choose the people to whom jobs are sent, and the ability to introduce arbitrary dependencies between multiple map and reduce steps. These features, in particular, are absent in both CrowdForge and MapReduce. We discuss each core contribution below.

Conceptual simplicity and ease of use. We begin with a clean conceptual foundation for our framework. As in MapReduce, a program is broken down into steps that are each written as a map or a reduce. A ManReduce map step consists of a set of small, equivalent chunks of work, performed in parallel when possible on independent inputs, producing outputs in the form of key-value pairs. A reduce step collects several input items (according to a shared key) and performs some computation over all of them to produce a final output. These two steps can be used to encode a wide range of parallelized computational applications.

In addition to its conceptual simplicity, our particular implementation is easy to use and allows a programmer to create full applications in a few lines of code. The ManReduce framework is written in Ruby (we are also working on Python and Java implementations). Internally, map and reduce are convenience wrappers that instantiate a Ruby Step object that accepts an anonymous function. A simple ManReduce script, shown in Figure 3, conducts a survey (utilizing the Dormouse function Survey). In this example, a map step survey_map sends the survey to human workers, and a reduce step avg_reduce takes the mean of their responses to each question. Notice that the user does not need to write any scaffolding code, such as parsing input or writing output. The appendix includes the full source code of Survey and Average.

    map :name => "survey_map" do |key, value|

      task = Survey.prepare :task_name => "Respond to survey",
                            :replication => 1000

      task.ask do |answer|
        for a in answer
          emit(a["question"], a["response"])
        end
      end
    end

    reduce :step => "Average", :name => "avg_reduce"

Figure 3: Survey ManReduce Script.

ManReduce scripts can be run simply on the command line, and are then deployed on the Dormouse server. Once all the tasks have been completed, ManReduce writes the output to a destination file and terminates the process. Note that ManReduce automatically serializes all worker answers, allowing us to re-run (and even debug) a script many times, automatically using the available answers to compute new (or corrected) statistics without re-doing human tasks. Thus, designing, coding, and executing ManReduce programs is straightforward.

MapReduce with both humans and machines. In the ManReduce framework, either people or machines can perform the work necessary for both map and reduce steps. This allows users to implement complex applications that interleave human and machine steps.

Many computation tasks fit naturally into this hybrid ManReduce paradigm. For example, ManReduce makes it easy to specify a human-assisted information extraction system. This application has a variety of uses for large corpora of documents, where automatic information extraction cuts down on human work significantly but alone does not always provide sufficient accuracy. A machine map step could specify automatic information extraction of facts from a set of scientific papers, such as genetic and environmental risk factors for a set of candidate diseases. Then, a human reduce step could aggregate the proposed facts about each disease, check their accuracy, and convert them into a summary. The ManReduce code for this example is shown in Figure 4.

     1  map :name => :extract_disease_facts do |key, value|
     2    facts = RiskExtractor.extract(value)
     3
     4    for fact in facts do
     5      emit(fact["disease"], fact["risk_factor"])
     6    end
     7
     8  end
     9
    10  reduce :name => :summarize do |key, values|
    11
    12    task = SummarizeFacts.prepare(
    13      :task_name => "Summarize disease risks: #{key}")
    14    task.facts = values
    15
    16    task.ask do |answer|
    17      emit(key, answer)
    18    end
    19
    20  end

Figure 4: Human-Assisted Information Extraction

In the example in Figure 4, the reduce step asks people to aggregate and summarize a set of facts. As shown on line 12, this is accomplished by instantiating a Dormouse Task object in Ruby and calling its ask method. Custom human map and reduce steps can include any of the human tasks available through the Dormouse libraries, or custom human functions. Likewise, machine map and reduce steps can call functions from existing Ruby libraries and custom libraries, as on line 2. In addition, the ManReduce libraries include a range of pre-defined map and reduce steps, including image labeling, ranking items from a list, and free text entry. The built-in libraries, together with the ability to define and share custom steps, provide a large and growing codebase with which to easily create new ManReduce scripts.

Utilization of social structure through Dormouse. ManReduce takes advantage of the social structure and worker profiles provided by Dormouse, finding people whose attributes match those needed for a particular task. This is a natural extension of MapReduce, which allows specification of machines by their properties, such as processor speed or memory. Using functionality from Dormouse, we can specify that certain map or reduce tasks be dispatched only to people with graduate degrees in biology, or expertise in computer science, or simply to people under 25.

Adding identities and relationships opens many possibilities in human computation. For instance, accurately tagging people in photographs is important for image search engines. Using current techniques, search engines can identify a set of images and candidate names associated with each, but many pictures contain multiple people and many names correspond to several real people. The people in these photos can, however, be identified quickly and accurately by their friends. By identifying the Facebook users corresponding to each name, we could define a map step that asks friends of each user to judge whether an image contains their friend.

A user specification is used in ManReduce by simply setting the worker predicate and namespace properties of a Task object before it is asked. The specification is created using a Predicate object, as shown in Figure 5.

    task = ImagePersonTask.prepare :task_name => "Tag person: #{key}",
                                   :replication => 5

    task.worker_namespace = "facebook"
    task.worker_predicate = Predicate.parse(["friends CONTAINS ?", key])

Figure 5: User Specification with Predicate

Complex ManReduce dependency graphs. Going beyond the original MapReduce model, ManReduce allows arbitrary chaining of map and reduce steps, similar to the Dryad framework [12] for machine computation. Not only can a programmer specify multiple map and reduce steps in arbitrary order, she can also specify several different reduce steps that operate on the output of a single map step, or have a map step follow another map step directly. The ability to define functions on a generalized graph, rather than a single map and reduce, holds particular importance in the social computing domain. For instance, even a simple case like two sequential map steps may not fit naturally into a single map step if a human step follows a machine step, or if one human step is followed by another with different worker expertise constraints.

In ManReduce, each Step (map or reduce) may have multiple parents and multiple children. A step receives inputs from its parents and sends its output to each of its children. By default, ManReduce infers dependencies from the order in which steps are defined: each Step's parent is assumed to be the Step immediately preceding it in the source code. To specify complex dependencies, we provide the from specifier in the map and reduce declarations, which gives the name of the step whose output to use as input, as shown in Figure 6.

    map :name => "collect_survey" do |key, value|
      ...
    end

    reduce :name => "sort_by_gender", :from => "collect_survey" do |key, values|
      ...
    end

    reduce :name => "sort_by_age", :from => "collect_survey" do |key, values|
      ...
    end

Figure 6: Complex ManReduce Dependencies

An interesting example with complex control flow is privacy-preserving journal transcription. Many researchers and designers keep handwritten notebooks, which would be useful to digitize and make searchable. Since OCR has limited accuracy on handwriting, we use human transcription, but split each page into small chunks to make the tasks manageable and reduce the likelihood of revealing sensitive content. A ManReduce program for this application begins with a machine map step that "shreds" each scanned page into small overlapping images, each of which contains just a few words. A human map step then classifies each snippet according to whether it contains text, an image, or an equation. The result is sent to three human transcription map steps: one that is public and operates on the text, and two that are sent to people who are proficient in LaTeX or Illustrator to transcribe equations and figures. A human reduce step then votes on the best transcription for each snippet, and a machine reduce step pastes the chunks from a single page back together into a PDF. Optionally, a final human reduce step, perhaps restricted to a trusted set of workers (e.g. people from your own institution, or even just yourself), is used to rapidly verify the complete output of several pages at a time and output the final document. The control flow for this application is shown in Figure 7.

Figure 7: Illustration of dependency graph in the Journal Transcription Application. [Diagram nodes: Shred uploaded scan; Classify as text snippet, figure, or equation; Transcribe text in snippets (Public); Transcribe figures and add captions (Qualified); Transcribe equations (Qualified); Vote on selection within accepted transcriptions (Public); Combine transcribed segments; and a step labeled Trusted. The key distinguishes Map and Reduce steps.]

Discussion
While ManReduce takes inspiration for its name from MapReduce, it takes inspiration for its design from a number of parallel programming frameworks, including MapReduce, Dryad [12], and GPU languages [22]. Each of these frameworks was developed in an environment characterized by data-intensive applications and the availability of parallel computing resources. This characterization also holds for many applications in human computation, for example, in image processing or machine learning.

At the same time, there are some applications for which ManReduce is not suitable, for example, using a single worker in the crowd to control a robot [14]. Data-processing applications are well suited to ManReduce, while real-time or single-worker sequential applications are not.

In fact, while we believe MapReduce-based models are compelling for a variety of human computation applications, we recognize that other paradigms may be more appropriate for certain tasks, or preferred by some programmers. Dormouse makes it possible to implement other such frameworks, and we hope to see a variety of paradigms for social computation implemented for use with Dormouse. These programming paradigms could take advantage not only of the features provided by Dormouse, but also of the wide array of human tasks that are written for Dormouse. (As an analogy, languages like Scala [18] and Clojure [10] are implemented for use with the Java Virtual Machine, allowing these languages to make use of the platform independence of the JVM as well as the function libraries written for Java.)

DOG
While ManReduce is flexible and powerful, one drawback is that it can be too low-level for many applications. ManReduce requires programmers to think in terms of explicit map and reduce steps, even in common cases where it is more natural to think in terms of reusing and assembling basic building blocks. It also requires some knowledge of advanced programming constructs (for example, ManReduce uses callback functions for human steps).

To address these issues, we developed Dog, a high-level procedural programming language that sits on top of ManReduce and focuses on reusability, maintainability, and ease of use. Our approach of writing a high-level language that compiles into ManReduce is inspired by similar techniques in the large-scale data-processing world. For example, Pig [19] and Sawzall [21] are high-level languages built on top of MapReduce, and Nebula [12] and DryadLINQ [24] are high-level languages built on top of Dryad.

Design Goals
In creating Dog, we had three main design goals. First, we wanted to make Dog a highly expressive language, so that even people with little knowledge of programming languages could understand and write a Dog program. This is an especially appropriate design goal for a social computing language, where many of the constructs involve specifying people and asking them to do something. Such constructs are understood even by non-programmers, and a language that reflects the natural way people express these requests would make social computation accessible to a broad audience.

Our second goal was reusability. In traditional MapReduce, many programs are written by combining pre-existing maps and reduces in ad hoc ways. In human computation, there are a large number of common patterns, for example, human verification, human voting, and machine summarization of human inputs. We wanted to make it very easy for Dog programmers to express and combine these common patterns.

Finally, we wanted to achieve these goals without diminishing the efficiency, power, and flexibility of ManReduce.

We achieve the first two goals as follows. We define a large number of library functions that express common human and machine functions, such as the human functions Vote, Label, Compare, Extract, and Answer, and the machine functions Histogram, Filter, Median, and Sort. Dog then contains a set of easily understandable primitives for (a) human and machine resource allocation and (b) parameterization and execution of these library functions.

This approach abstracts away not only the code for parallelization, but also the code for defining human or machine functions, and lets the programmer focus on defining the control flow. This feature of Dog will become apparent in the code samples later in this paper. Dog scripts are easy to read and write: even with just the functions included in the default Dog libraries, a programmer can write a large number of powerful programs simply and compactly.

To achieve our third goal of maintaining the power and flexibility of ManReduce, we allow programmers to write their own libraries of human and machine functions in ManReduce and import those libraries into Dog programs. In this sense, Dog retains the full power of ManReduce behind the scenes, while still achieving our goals of expressivity and reuse.

The Dog compiler is implemented as a recursive descent parser that parses Dog programs and generates ManReduce code. The Dog standard library functions are simply wrappers around mappers and reducers in the ManReduce standard library. The Dog command-line utility includes convenience methods to compile and deploy Dog programs in a single step.

Language Specification
At its core, Dog is organized around four high-level language primitives:

PEOPLE, which specifies the type of people to perform some function
ASK, which asks a group of people to perform some human function
FIND, which instantiates those people
COMPUTE, which asks a set of machines to perform some function

as well as default libraries that include a number of human tasks (such as Label) and computational steps (such as Histogram).

For example, a simple Dog program to review UIST submissions and compute a histogram of the words in the reviews can be written:

    students = PEOPLE FROM facebook WHERE university = 'mit' AND degree = 'computer science'
    reviews = ASK students TO Review ON uist_submissions USING payment = 0 and replication = 3
    words = COMPUTE Split ON reviews
    histogram = COMPUTE Histogram ON words

People
The PEOPLE command returns a specification of people. The common use case for the PEOPLE command is to specify a certain type of people to perform a given task. PEOPLE requires a FROM clause that specifies the Dormouse community or crowdsourcing service from which the people will be selected. For example:

    workers = PEOPLE FROM mechanical_turk;

will return a specification for Mechanical Turk workers.

it, and (optionally) a set of parameters for the human task, and (optionally) a data set on which the human task should operate.

So for example, a Dog programmer may write:

    labels = ASK workers TO Label ON image_data USING layout='game'

Each human function has default parameters, so unless a programmer wants to change these defaults, she can omit the USING clause. Further, programmers can omit the ON clause for human tasks that don't take input data.

Dog inherits a number of human tasks from Dormouse, for example: Vote, Label, Compare, and Answer. Programmers may also create libraries of other human functions for their own use and for reuse by others in Dormouse, and import them into Dog.

Compute
The COMPUTE command executes a machine function. COMPUTE takes as its arguments a machine function, a data set upon which the function acts, and (optionally) any additional parameters required by the function. For example:

    tag_cloud = COMPUTE TagCloud ON words USING color_scheme = 'random'

Like human tasks, the Dog standard libraries inherit a number of machine functions from Dormouse, including Histogram, Average, and Filter. Additionally, Dog programmers may create libraries of machine functions for their own use and for reuse by others using Dormouse or ManReduce.

Find
In some cases, a Dog programmer may want to instantiate a specification of people independently of the ASK function. For example, she may be interested in computing summary demographic information on a given Dormouse community. The FIND command does this. For example, the code snippet:

    workers = PEOPLE FROM gates WHERE expertise CONTAINS 'machine learning'
    ids = FIND workers

returns the Dormouse ids of machine learning experts in the gates community. FIND may also be used to return people who have successfully performed a task.
For example: workers = PEOPLE FROM facebook Each Dormouse community defines properties on the peo- labels = ASK workers TO Label ON data ple in the community, and these properties can be accessed workers_who_labeled = FIND PEOPLE FROM labels through the WHERE clause. For example, a Dog programmer may write: Data Model Dog is designed to support sequential transformations on workers = PEOPLE FROM gates WHERE expertise large-scale data, either by parallel human or machine func- CONTAINS 'theory' AND advisor='don knuth' tions. Dog is also designed to make it easy to express control flows that involve selecting people and specifying tasks for Note that the PEOPLE command doesn’t return actual per- them to perform. son ids; it returns a worker specification, stored internally as a predicate. The specification is instantiated when ASK or As such, Dog supports two primary data types: people spec- FIND is called on the specification. ifications and data maps. A people specification is returned by the PEOPLE command, and is stored internally as a pred- Ask icate4. The ASK command executes a human function. It takes as arguments a human task, a specification of people to perform 4For example, the Dog code PEOPLE FROM facebook A data map in Dog is expressed as a wrapper around a key- variable values, and the INSPECT command prints a small value store. Key-value stores lend themselves naturally to selection of large maps for debugging purposes. parallelism, and crowdsourcing is by its nature parallel. They also lend themselves well to serialization, which is an impor- A convenient feature of Dog is that the primitives are com- tant part of Dog, especially as human steps can be expensive posable, allowing Dog programmers to produce compact, to re-run, and intermediate steps are often too big to fit in readable code. For example, the following code snippet memory. 
And finally, having all data items be a key-value workers = PEOPLE FROM mechanical_turk store helps our goal of simplicity, as new human and ma- preferred_candidates = ASK workers TO Vote ON chine functions can be easily written in ManReduce, and the candidates output of any data transform can be used as the input to any other data transform. can be rewritten as: workers = ASK PEOPLE FROM mechanical_turk TO Vote An important design feature of Dog is that all data returned ON candidates by human functions retains information about who performed that function. This is a useful feature in a number of contexts, Composability can be arbitrarily complex, and entire pro- including reputation updating and routing data based on so- grams can be written in one line: cial relationships. COMPUTE ranking ON (ASK PEOPLE FROM mechanical_turk TO Vote ON candidates) Routing Tasks A key feature of Dog is the ability to route tasks to workers Debugging Environment based on expertise, demographic, or social structure. While A challenge when writing programs with human tasks is that FROM handles basic routing, some applications require rout- it can be expensive to test and debug, as human workers may ing based on properties of each individual task or data item. need to be paid, or may get frustrated by errors in their task The SUCH THAT clause enables such routing. In the follow- templates. This discourages programmers from iterative im- ing example, we show a program where students are asked to provements to their programs. A second challenge to de- review papers in their areas of expertise, and their advisors bugging is that many times, programs will need to run over are asked to validate them, matching the particular reviewer large amounts of input data (for example, labeling a large with her advisor for each paper. image corpus) that take time. 
What programmers will often students = PEOPLE FROM gates do is create separate truncated input data sets for debugging, reviews = ASK students TO Review ON which requires additional work. uist_submissions SUCH THAT uist_submission. topic IS IN student.areas_of_expertise The Dog command-line utility has a debug mode that ad- reviewers = FIND PEOPLE FROM reviews dresses both of these issues in simple but effective ways. In advisors = PEOPLE FROM gates WHERE advisees CONTAINS reviewers debug mode, Dog programs are deployed on a local version validated_reviews = ASK advisors TO Validate ON of Dormouse, which uses the programmer’s local machine reviews SUCH THAT review.reviewer = advisor. as the Dormouse master, and routes tasks to the programmer student or whomever the programmer specifies. Second, in debug mode, input data is automatically truncated to a reasonable Function Libraries and Input Data size based on a number of heuristics. To create a library, a programmer simply creates a directory with a .doghouse extension that contains the appropriate Dor- This allows programmers to easily test out their control flow mouse and ManReduce program files that define the human and human functions on a small set of people and data before and machine functions. To import a library, a Dog program- deploying. mer uses the REQUIRE command in the header, followed by a path to the Doghouse file, as well as an optional library EXAMPLES namespace. For example: Bootstrapping Recommendations As a case study of the Dormouse framework, we developed REQUIRE "/dog/lib/statistics.doghouse" AS stats a prototype pipeline for personalization of product offerings (such as Groupon deals). 
Based on evaluating demographic To import a data file, a programmer uses the IMPORT com- preferences for deals, this pipeline would enable significantly mand in the header: improved targeting of deals to people, but without the need IMPORT "/dog/lib/image_file.js" AS images for large amounts of proprietary usage data. Small compa- nies and brand-new services rarely have access to the volume Other Dog Features of usage data needed for standard personalization approaches Dog supports a number of other commands as well. Joining such as collaborative filtering, making this an exciting use of of communities and data is supported through the MERGE, human-powered computation. SHIFT, UNSHIFT, and CROSS commands. The PARAMETERS command is a convenience command that encapsulates pa- For this use case, we used the facebook community through rameters for the ASK function. The PRINT command prints Dormouse, allowing us to easily access demographic infor- mation (location, gender, and age) for each of the people con- WHERE gender = "female" is equivalent to the Ruby code: tributing preferences, in addition to the friendship graph for Condition.new(["facebook","gender"],"female","=") people. We collected a set of 500 Groupon deals, and paired bilities that an individual will agree with each of their demo- graphics. Also, we can can incorporate a feedback loop, in which deals with uncertain ranking are re-submitted for fur- ther user feedback. Additionally, and importantly, we can in- corporate the friend graph directly, asking users which deals they believe their friends would prefer, and up-weighting agreements5.

Male-preferred Female-preferred Artisan Cheese Experience ** Manicure or Deluxe Mani/Pedi ** Shooting range, Dinner, Drinks ** Paddleboard Rental and Lesson ** Chiropractic, Massage, or Allergy Keratin Treatment ** Treatment ** Panoramic Wall Mural ** RedAwning.com (Vacation)** Wine Tour and Tasting ** Custom Massage *

Figure 9: Deal preferences by gender. ** p < .01;*p < .05 by Fisher’s exact method.

Medical Image Analysis A second application of Jabberwocky is to use people to im- Figure 8: Deals Application prove the performance and evaluation of a machine learning framework for complex medical data. This case study ex- them randomly, creating 12,500 total pairs. We asked work- plores the role of expertise in human computation. ers from the facebook community to state which of the two deals in a pair they preferred, and why. Three hundred In medicine, a common method of clarifying the subcellular workers performed an average of 42 comparisons each. Note location of proteins is immunohistochemical staining, where that soliciting ratings of individual deals would be problem- flourescently tagged antibodies are introduced into a tissue, atic, as there is no absolute scale of quality or interest. So- binding with the protein the scientist wants to localize. The liciting preferences between pairs is a simple and robust way resulting images are then evaluated by pathologists. This around this problem. At the end, we collected a set of pref- technique is used in a wide range of applications, from ex- erences for any subset of people (of a certain age group, for ploring gene function in Parkinson’s [25] to diagnosing can- instance), and using ordering by the number of “wins” for cerous tumors. each item, produce an approximate ranking of all deals for There are large quantities of immunohistochemical stains that demographic. We showed ManReduce code in Figure 6 generated each year. Because of this, some researchers have illustrating components of this pipeline, so here we show a started to explore image-recognition techniques to analyze simple Dog script that could be used, requiring even fewer these stains. However, these techniques are not often used in lines of code (Figure 10). 
practice, because the idiosyncracies in how staining methods This pilot study demonstrated the convenience and simplic- work make it difficult for a single image-recognition algo- ity of the Jabberwocky framework. With just a few lines of rithm to work across many different stains. code and minimal setup, we specified the entire pipeline. In We used Dog to write a program that routes IHC stains to addition, we gained evidence that product targetting could generalists to localize the stains via a human image-segmentation benefit from such a system, as the preferences did indeed function in Dormouse (Figure 11), and then routes those non- vary noticeably by demographic. For example, the top five expert localizations to experts to perform a faster validation deals prefered by males over females, and the top five deals step. The expert validation feedback was then rerouted to the prefered by females over males, are shown in the table at original generalists, who could use the feedback to improve. the end of this section. Anecdotally, we noticed that the fe- The validated localizations can then be used as input into a male raters preferred many deals related to beauty, fitness, machine learning algorithm. An extension of this would be and home improvement, while the male raters often preferred that as the generalists got more right, they increased in repu- dining and hobbies. We also note that the male and female tation in the system. rankings showed low Spearman correlation (r =0.24), com- pared with correlation between rankings segregated by a ran- What is interesting is that what we originally built as a tool dom split (r =0.71). The difference between these empirical to aid machine learning ended up also aiding human learn- correlations is highly significant (p<1e 4). ing. 
One can imagine that such systems can get even more − nuanced, with active learning algorithms that have input as To target deals to a particular individual, we can combine the to what labeling tasks to route to which people. We call this preferences according to each relevant demographic (such as social machine learning, where social learning systems inter- “male” and “18-25” and “San Francisco”) using a probabilis- act with the machine learning systems, and both benefit from tic noisy-or model [5]. This would combine the probability of each demographic segment liking a deal with the proba- 5reminiscent of the Newlywed Game. 1 #!/usr/bin/env dog 2 3 IMPORT "deals.js" AS deal_pairs 4 REQUIRE "rank_deals.doghouse" 5 6 CONFIG title = "Compare Deals" 7 CONFIG description = "..." 8 9 answers = ASK PEOPLE TO RankDeals ON deal_pairs 10 11 age = COMPUTE Projection ON answers USING key_name = "age" AND value_name = "deals" 12 gender = COMPUTE Projection ON answers USING key_name = "gender" AND value_name = " deals" 13 ethnicity = COMPUTE Projection ON answers USING key_name = "ethnicity" AND value_name = "deals" 14 education = COMPUTE Projection ON answers USING key_name = "education" AND value_name = "deals" 15 16 age_ranking = COMPUTE PairwiseRank ON age 17 gender_ranking = COMPUTE PairwiseRank ON gender Figure 11: Immunohistochemical Staining Application 18 ethnicity_ranking = COMPUTE PairwiseRank ON ethnicity vice through its marketplace drivers. However, there are 19 education_ranking = COMPUTE PairwiseRank ON also many key differences; for example, hProc does not have education mechanisms for interleaving human and machine computa- Figure 10: Deals Application in Dog tion or abstracting away details of parallel processing. More substantively, since hProc has not been implemented, the de- the other. We believe that this will be an area with much sign decisions are described as high-level proposals. opportunity. 
Much of the Jabberwocky software stack has taken its inspi- RELATED WORK ration from related constructs in traditional and parallel com- A number of programming frameworks for human computa- puting. The Dormouse Virtual Machine is inspired in part tion have been introduced in recent years. Crowdforge [13], from the Java Virtual Machine [15], a platform-independent a MapReduce-inspired framework that was developed simul- execution environment that converts Java bytecode into ma- taneously but independently from ManReduce, defines par- chine language and executes it. In practice, Dormouse is not tition, map, and reduce steps, and allows nested map and a true virtual machine in that it operates on top of crowd- reduce steps. TurKit [16] is a toolkit for deploying itera- sourcing platforms rather than microprocessor architectures. tive tasks to Mechanical Turk that maintains a straighforward In this sense, it is perhaps more reminiscent to Google’s procedural model. Soylent [2], a word-processing interface Global Workqueue, or some of the cluster management pro- that calls Mechanical Turk workers to edit parts of a docu- tocols used in scientific computing such as Parallel Virtual ment on demand, introduces the Find-Fix-Verify crowd pro- Machine [15] or MPI [7]. ManReduce takes its inspira- gramming pattern, which splits tasks into a series of gener- tion from MapReduce [4], Dryad [12], and GPU shader lan- ation and review stages. Each of these specifies and imple- guages [22]. And Dog takes its inspiration from Pig [19], ments a design pattern rather than building a full stack, and Sawzall [21], Nebula [12] and DryadLINQ [24]. The popu- are platform dependent. larity of these existing tools suggests that the paradigms we present here will be useful and natural for many program- Recently, a trio of declarative query languages for human mers. computation have been proposed: hQuery [20], CrowdDB [6], and Qurk [17]. 
These languages view crowdsourcing ser- CONCLUSION vices as databases where facts are computed by human pro- To date, the programming frameworks for crowd comput- cessors. These languages are different from Dog in that ing have been single-platform frameworks. Further, the pro- they are declarative rather than imperative. The imperative gramming frameworks for crowdsourcing have viewed the model of Dog is particularly important for social computa- crowd as a collection of largely independent and interchange- tion, where we want the programmer to be able to specify able workers, rather than an ecosystem of connected, hetero- the type of person who will compute the result, not just the geneous people. And finally, despite a clearly differentiated desired outcome. domain, no domain-specific programming languages have been developed for social computing, requiring programmers Heymann et. al propose hProc [9], a programming environ- to define control flows for people in languages designed for ment that focuses on modularity and reuse. It shares with computers. Dormouse the notions of easy reuse of human function tem- plates, and of abstracting out the specific crowdsourcing ser- The Jabberwocky software stack represents a step forward in the tools available to programmers for social computation. 19. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. A programmer may deploy a non-trivial application in Dog Pig Latin: A Not-So-Foreign Language for Data Processing. In Proc. SIGMOD (2008). without having to build labor-intensive sociotechnical infras- tructure, allowing developers to easily tap into people and 20. A. Parameswaran and N Polyzotis. Answering Queries using their heterogeneous skillsets in an organized manner. Jab- Humans, Algorithms and Databases. Technical report. berwocky puts a wide range of possibilities for data-intensive 21. R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. 
Interpret- applications within reach of a broad class of developers, and ing the data: Parallel analysis with Sawzall. Scientific Pro- we believe it holds the potential to change the way program- gramming, October 2005. mers interact with people using code. 22. D. Tarditi, S. Puri, and J. Oglesby. Accelerator: using data par- allelism to program GPUs for general-purpose uses. SIGOPS REFERENCES Operating Systems Review, October 2006. 1. S Bamford and et al. Galaxy Zoo. 23. L. von Ahn. Games with a Purpose. Computer, June 2006. 2. M. Bernstein, G. Little, R.. Miller, B. Hartmann, M. Acker- man, D. Karger, D. Crowell, and K. Panovich. Soylent: a 24. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U.´ Erlingsson, word processor with a crowd inside. In Proc. UIST (2010). P. Gunda, and J. Currey. DryadLINQ: a system for general- purpose distributed data-parallel computing using a high-level 3. S Cooper, F. Khatib, A. Treuille, J. Barbero, J. Lee, M. Bee- language. In Proc. OSDI (2008). nen, A. Leaver-Fay, D. Baker, and Z. Popovic. Predicting pro- tein structures with a multiplayer online game. Nature, June 25. L. Zhang, M. Shimoji, B. Thomas, D. Moore, S. Yu, 2010. N. Marupudi, R. Torp, I. Torgner, O. Ottersen, T. Dawson, and V. Dawson. Mitochondrial localization of the Parkinson’s dis- 4. J. Dean and S. Ghemawat. MapReduce: simplified data pro- ease related protein DJ-1. Human Molecular Genetics, June cessing on large clusters. Communications ACM, January 2005. 2008. Appendix 5. FJ Diez. Parameter adjustment in bayes networks. the gener- alized noisy or-gate. In Proc. UAI (1993). The source code of Survey and Average from Figure 3. 6. Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti survey.rb: Ramesh, and Reynold Xin. CrowdDB: answering queries with crowdsourcing. In Proc. SIGMOD (2011), pages 61–72. 1 class Survey < ManReduce::Task 2 def render 7. William Gropp, Ewing Lusk, Nathan Doss, and Anthony 3 include_file("survey.erb") Skjellum. 
A high-performance, portable implementation of 4 end the MPI message passing interface standard. Parallel Com- 5 puting, 1996. 6 def process_response(response) 7 answers = [] 8. B. Hartmann. Amazing but True Cat Stories. 8 answers << {"rating" => response["rating"]} http://bjoern.org/projects/catbook/, April 2009. 9 answers << {"length" => response["length"]} 10 return answers 9. P. Heymann and H. Garcia-Molina. Human processing. Tech- 11 end nical report. 12 end 10. R. Hickey. The clojure programming language. In Proc. DLS (2008). survey.erb: 11. D. Horowitz and S.D. Kamvar. The anatomy of a large-scale 1 12. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: 4 distributed data-parallel programs from sequential building 5 blocks. SIGOPS Operating Systems Review, March 2007. 6 10 14. W. Lasecki, K. Murray, S. White, R. Miller, and F. Bigham. Legion: closed-loop crowd control of existing interfaces. In Proc. UIST (2011). average.rb: 15. T. Lindholm and F. Yellin. Java Virtual Machine Specification. 1 class Average < ManReduce::Reduce Addison-Wesley Longman Publishing Co., Inc., 2nd edition, 2 def reduce(key, values) 1999. 3 count = values.length 4 sum = values.inject(0) {|sum, x| sum += x} 16. G. Little, L. Chilton, M. Goldman, and R. Miller. TurKit: 5 if count == 0 then Tools for Iterative Tasks on Mechanical Turk. In Proc. 6 emit(key, 0) HCOMP (2009). 7 else 8 emit(key, sum / count) 17. A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowd- 9 end sourced Databases: Query Processing with People. In Proc. 10 end CIDR (2011). 11 end 18. M. Odersky, P. Altherr, V. Cremet, B. Emir, S. Maneth, S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman, and M. Zenger. An overview of the Scala programming language. Technical report, EPFL Lausanne, Switzerland.
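Footnote 4 states that a people specification is stored as a predicate, written in Ruby as Condition.new(["facebook","gender"],"female","="), and the People section notes that the specification is only instantiated when ASK or FIND is called. The Ruby sketch below illustrates that lazy-predicate idea under stated assumptions: only the three-argument Condition constructor comes from the paper; the matches? method, the operator table, the person-record layout, and the find helper are hypothetical and are not Dormouse's actual API.

```ruby
# Hypothetical sketch of a lazily evaluated people specification.
# Only the Condition.new(path, value, op) call shape comes from footnote 4;
# everything else here is an illustrative assumption.
class Condition
  OPERATORS = {
    "=" => ->(actual, expected) { actual == expected },
    "CONTAINS" => ->(actual, expected) { Array(actual).include?(expected) }
  }.freeze

  def initialize(path, value, op)
    @community, @property = path
    @value = value
    @test = OPERATORS.fetch(op) # raises KeyError for unsupported operators
  end

  # Assumed person layout: { id: ..., "community" => { "property" => value } }
  def matches?(person)
    attributes = person[@community]
    return false if attributes.nil?

    @test.call(attributes[@property], @value)
  end
end

# FIND-like instantiation: the predicate is only evaluated at this point.
def find(people, condition)
  people.select { |person| condition.matches?(person) }.map { |person| person[:id] }
end

people = [
  { id: 1, "facebook" => { "gender" => "female" } },
  { id: 2, "facebook" => { "gender" => "male" } },
  { id: 3, "gates" => { "expertise" => ["theory", "machine learning"] } }
]

female = Condition.new(["facebook", "gender"], "female", "=")
ml_experts = Condition.new(["gates", "expertise"], "machine learning", "CONTAINS")

p find(people, female)     # => [1]
p find(people, ml_experts) # => [3]
```

Keeping the specification as an unevaluated object means the same predicate can be handed to an ASK-like or FIND-like step later, which matches the deferred instantiation the People section describes.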
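The Data Model section notes that data returned by human functions retains who performed each function, which is what allows FIND PEOPLE FROM labels in the Find section to recover the workers behind a result. A minimal sketch, assuming a hypothetical record type that pairs each output value with a worker id; Dormouse's real storage format is not described in the paper.

```ruby
# Hypothetical record type: each value produced by a human function keeps
# the id of the worker who produced it (the layout is an assumption).
Labeled = Struct.new(:value, :worker_id)

# Sketch of "FIND PEOPLE FROM labels": collect the distinct workers
# recorded anywhere in a data map of { key => [Labeled, ...] }.
def find_people_from(data_map)
  data_map.each_value.flat_map { |records| records.map(&:worker_id) }.uniq
end

labels = {
  "img1.jpg" => [Labeled.new("cat", 17), Labeled.new("cat", 42)],
  "img2.jpg" => [Labeled.new("dog", 17)]
}

p find_people_from(labels) # => [17, 42]
```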
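The Routing Tasks section uses SUCH THAT to assign each submission only to students whose areas of expertise include its topic. The sketch below shows that per-item matching in plain Ruby; the Student and Submission structs, the route_such_that name, and the sample data are hypothetical, and the real Dog compiler presumably emits ManReduce code rather than anything like this.

```ruby
Student = Struct.new(:id, :areas_of_expertise)
Submission = Struct.new(:id, :topic)

# Sketch of per-item routing in the spirit of
#   SUCH THAT uist_submission.topic IS IN student.areas_of_expertise
# Returns { submission_id => [ids of students eligible to review it] }.
def route_such_that(submissions, students)
  submissions.each_with_object({}) do |submission, routes|
    eligible = students.select { |s| s.areas_of_expertise.include?(submission.topic) }
    routes[submission.id] = eligible.map(&:id)
  end
end

students = [
  Student.new("alice", ["hci", "crowdsourcing"]),
  Student.new("bob", ["graphics"])
]
submissions = [
  Submission.new(101, "crowdsourcing"),
  Submission.new(102, "graphics"),
  Submission.new(103, "security")
]

p route_such_that(submissions, students)
```

A submission whose topic matches nobody (103 here) is mapped to an empty list rather than dropped, so a caller can detect unroutable items explicitly.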
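Figure 10 calls Projection and PairwiseRank, and the Bootstrapping Recommendations text describes ranking deals by their number of pairwise "wins" within each demographic. The sketch below is one plausible reading of those two steps, not the code in rank_deals.doghouse or Figure 6: the Response struct, the method names projection and pairwise_rank, and the alphabetical tie-break are assumptions, and the deals "A", "B", and "C" are toy data.

```ruby
# One pairwise comparison: the answering worker's demographics plus the
# deal they preferred (winner) and the one they did not (loser).
Response = Struct.new(:demographics, :winner, :loser)

# Hypothetical Projection: group responses by one demographic attribute.
def projection(responses, key_name)
  responses.group_by { |response| response.demographics[key_name] }
end

# Hypothetical PairwiseRank: order deals by number of pairwise wins,
# breaking ties alphabetically so the output is deterministic.
def pairwise_rank(responses)
  wins = Hash.new(0)
  responses.each do |response|
    wins[response.winner] += 1
    wins[response.loser] += 0 # register deals that never win
  end
  wins.sort_by { |deal, count| [-count, deal] }.map(&:first)
end

responses = [
  Response.new({ "gender" => "female" }, "A", "B"),
  Response.new({ "gender" => "female" }, "A", "C"),
  Response.new({ "gender" => "female" }, "C", "B"),
  Response.new({ "gender" => "male" }, "B", "A")
]

rankings = projection(responses, "gender").transform_values { |group| pairwise_rank(group) }
p rankings["female"] # => ["A", "C", "B"]
p rankings["male"]   # => ["B", "A"]
```

Counting wins is the simplest pairwise aggregation and matches the text's description; with random pairing some deals will be compared more often than others, which a production version might correct for.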
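The gender comparison in Bootstrapping Recommendations reports Spearman correlations between rankings (0.24 for male versus female, 0.71 for a random split). For two complete rankings of the same n items without ties, Spearman's coefficient is 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is each item's difference in rank position. The sketch below implements that formula; the paper does not say how ties or partial rankings were handled, so this version simply rejects inputs that are not permutations of each other.

```ruby
# Spearman rank correlation for two complete rankings of the same items,
# assuming no ties: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
def spearman(ranking_a, ranking_b)
  n = ranking_a.length
  raise ArgumentError, "need at least two items" if n < 2
  unless ranking_a.sort == ranking_b.sort && ranking_a.uniq.length == n
    raise ArgumentError, "rankings must contain the same distinct items"
  end

  position_in_b = ranking_b.each_with_index.to_h
  sum_d_squared = ranking_a.each_with_index.map { |item, i| (i - position_in_b[item])**2 }.sum
  1.0 - 6.0 * sum_d_squared / (n * (n**2 - 1))
end

p spearman(%w[a b c d], %w[a b c d]) # => 1.0
p spearman(%w[a b c d], %w[d c b a]) # => -1.0
```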
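The targeting paragraph in Bootstrapping Recommendations proposes combining demographic segments with a noisy-or model [5], which treats each segment as an independent cause that can "turn on" the individual's liking of a deal, so P(like) = 1 - product over i of (1 - c_i). The paper does not give the exact parameterization; the sketch below assumes c_i = a_i * q_i, where q_i is the probability that segment i likes the deal and a_i is the probability that the individual agrees with segment i. Both that choice and the noisy_or name are assumptions, and the numbers in the usage line are illustrative only.

```ruby
# Noisy-or combination, assuming each segment i contributes an independent
# cause with strength a_i * q_i:
#   q_i = probability that segment i likes the deal
#   a_i = probability that this individual agrees with segment i
# P(like) = 1 - product over i of (1 - a_i * q_i).
def noisy_or(segments)
  prob_none = segments.reduce(1.0) do |acc, (q, a)|
    unless (0.0..1.0).cover?(q) && (0.0..1.0).cover?(a)
      raise ArgumentError, "probabilities must lie in [0, 1]"
    end
    acc * (1.0 - a * q)
  end
  1.0 - prob_none
end

# Two illustrative segments, e.g. "male" (q = 0.6, a = 0.5) and "18-25" (q = 0.2, a = 1.0).
p noisy_or([[0.6, 0.5], [0.2, 1.0]]) # approximately 0.44
```

With no segments the result is 0, and adding a segment can only raise the probability, which is the characteristic behavior of a noisy-or without a leak term.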