ZebraLancer: Decentralized Crowdsourcing of Human Knowledge atop Open Blockchain


arXiv:1803.01256v5 [cs.HC] 7 May 2020

IEEE TRANSACTIONS ON XXXX, VOL. XX, NO. XX, XXX 20XX

Yuan Lu, Qiang Tang and Guiling Wang

Abstract: We design and implement the first private and anonymous decentralized crowdsourcing system ZebraLancer, and overcome two fundamental challenges of decentralizing crowdsourcing, i.e., data leakage and identity breach. First, our outsource-then-prove methodology resolves the tension between the blockchain transparency and the data confidentiality to guarantee the basic utilities/fairness requirements of data crowdsourcing, thus ensuring: (i) a requester will not pay more than what data deserve, according to a policy announced when her task is published via the blockchain; (ii) each worker indeed gets a payment based on the policy, if he submits data to the blockchain; (iii) the above properties are realized not only without a central arbiter, but also without leaking the data to the open blockchain. Second, the transparency of blockchain allows one to infer private information about workers and requesters through their participation history. Simply enabling anonymity is seemingly tempting, but will allow malicious workers to submit multiple times to reap rewards. ZebraLancer also overcomes this problem by allowing anonymous requests/submissions without sacrificing the accountability. The idea behind is a subtle linkability: if a worker submits twice to a task, anyone can link the submissions, or else he stays anonymous and unlinkable across tasks. To realize this delicate linkability, we put forward a novel cryptographic concept, i.e., the common-prefix-linkable anonymous authentication. We remark that the new anonymous authentication scheme might be of independent interest. Finally, we implement our protocol for a common image annotation task and deploy it in a test net of Ethereum. The experiment results show the applicability of our protocol atop the existing real-world blockchain.

Index Terms: Blockchain, crowdsourcing, confidentiality, anonymity, accountability.

The authors are with the Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ, 07102, USA. E-mail: {yl768, qiang, gwang}@njit.edu

Manuscript received xx, Dec 2018. An abridged version of this paper appeared in the 38th IEEE ICDCS, Vienna, Austria, July 2018.

Two popular hypotheses of zebra stripes were: (i) camouflage and motion dazzle to confuse predators, and (ii) visual cues used by peers to identify. The delicate anonymity in our system can be used as an analog: it overcomes the natural tension between anonymity and accountability.

1 INTRODUCTION

Crowdsourcing empowers open collaboration over the Internet. One remarkable example is the solicitation of annotated data: the famous ImageNet benchmark [1] was created via Amazon's Mechanical Turk (MTurk) marketplace [2]. Another notable example is mobile crowdsensing [3], where one (called "requester") can request a group of individuals (called "workers") to use mobile devices to gather information fostering data-driven applications [4, 5]. Various monetary incentive mechanisms were introduced [6–12] to motivate workers to make real efforts. To facilitate these mechanisms, the state-of-the-art solution necessarily requires a trusted third-party to host crowdsourcing tasks to fulfill the fair exchange between the crowd-shared data and the rewards; otherwise, the effectiveness of incentive mechanisms can be hindered by the so-called "free-riding" (i.e. dishonest workers reap rewards without making real efforts) and "false-reporting" (i.e. dishonest requesters try to repudiate the payment).

It is well-known that the reduction of the reliance on a trusted third-party is desirable in practice, and the same goes for the case of crowdsourcing. First, numerous real-world incidents reveal that the party might silently misbehave in-house for self-interests [13]; or, some of its employees [14] or attackers [15] can compromise its functionality. Second, the party often fails to resolve disputes. For instance, requesters have a good chance to collect data without paying at MTurk, because the platform is biased toward requesters over workers [16]. Third, a centralized platform inevitably inherits all the vulnerabilities of a single point of failure. For example, Waze, a crowdsourcing map app, suffered from 3 unexpected server downs and 11 scheduled service outages during 2010-2013 [4]. Last but not least, a central platform hosting all tasks also increases the worry of massive privacy breach. A recent lesson to us is the tremendous data leakage of Uber [17].

In contrast, an open blockchain¹ is a distributed, transparent and immutable "bulletin board" organized as a chain of blocks. It is usually managed and replicated by a peer-to-peer (P2P) network collectively. Each block will include some messages committed by network peers, and be validated by the whole network according to a pre-defined consensus protocol. This ensures reliable message deliveries via the untrusted Internet. More interestingly, the messages contained in each block can be program code, the execution of which is enforced and verified by all P2P network peers; hence, a more exotic application of smart contract [18] is enabled. Essentially, the smart contract can be viewed as a "decentralized computer" that faithfully handles all computations and message deliveries related to a specified task. It becomes enticing to build a decentralized crowdsourcing platform atop.

¹ We remark that blockchain is used to refer to an open blockchain throughout this paper. An open/permissionless/public blockchain is a blockchain network that allows any party to participate, as opposed to a less ambitious way of building blockchain atop permissioned parties. ZebraLancer can inherit the P2P network and the underlying infrastructure.

Unfortunately, this new fascinating technology also brings about new privacy challenges [19], which were never that severe in the centralized setting before and could even harm the basic system utilities, as one notable feature of the blockchain is its transparency. The whole chain is replicated to the whole network to ensure consistency, thus the data submitted to the blockchain will be visible to the public.

First, the blockchain transparency causes an immediate problem violating data privacy, considering that many of the crowdsourced data may be sensitive. For example, even in the intuitively "safe" image annotation tasks, if there are some special ambiguous pictures (e.g. Thematic Apperception Test pictures [20]), the answers to them can be used to infer the personality profiles of workers. Sometimes, the data are simply valuable to the requester who paid to get them. What is worse, since the block confirmation (which corresponds to the time when the submitted answers are actually recorded in a block) normally takes some time after the data is submitted to the network, a malicious worker can simply copy the data committed by others, and submit the same data as his own to run the free-riding attack. Without data confidentiality, the incentive mechanisms could be rendered completely ineffective in decentralized settings.

Second, most crowdsourcing systems [2, 4] and incentive mechanisms [7–9] implicitly require requesters/workers to authenticate to prevent misbehaviors caused by (colluded) counterfeited identities [21]. When crowdsourcing is decentralized, this basic requirement will cause the history of submitting/requesting to be known by everyone (which was previously "protected" in a data center such as the breached one of Uber's). Thus considerable information about workers/requesters [22] will leak to the public through their participation history, which seriously impairs their privacy and even demotivates them to join. Notably, if a user frequently joins traffic monitoring tasks, then anyone can read the blockchain and figure out his location traces.

Clearly, to decentralize crowdsourcing in a meaningful way to realize basic system utilities, the above fundamental privacy challenges have to be overcome, which requires us to resolve two natural tensions: (i) the tension between the blockchain transparency and the data confidentiality, and (ii) the tension between the anonymity and the accountability. Simple solutions utilizing some standard cryptographic tools (e.g. encryption and/or group signature) to protect the data confidentiality and the anonymity do not work well: the encryption of data immediately prevents smart contracts from enforcing the rewards policy; allowing fully anonymous participation gives a dishonest worker an opportunity of multiple submissions in one crowdsourcing task, and thus he may claim more rewards than what he is supposed to get (similarly for a malicious requester). See more details about the challenges in Section 2.

Our contributions. In this paper, we construct, analyze and implement a general blockchain based protocol to enable the first private and anonymous decentralized data crowdsourcing system². Without relying on any third-party information arbiter, our protocol can still guarantee the faithful execution for a class of incentive mechanisms, once they are announced as pre-specified policies in the blockchain. More importantly, we also protect confidential data and identities from the blockchain network, while the underlying blockchain is auditing the correct execution of crowdsourcing tasks. Specifically,

1) A blockchain based protocol is proposed to realize decentralized crowdsourcing that satisfies: (i) the fair exchange between data and rewards, i.e., a worker will be paid the correct amount according to the pre-defined policy of evaluating data, if he submits to the blockchain; (ii) data confidentiality, i.e., the submitted data is confidential to anyone other than the requester; and (iii) anonymity and accountability.

Intuition behind the fairness and confidentiality is an outsource-then-prove methodology that: (i) the requester is enforced to deposit the budget of her incentive policy to a smart contract; (ii) submissions are encrypted under the requester's public key and will be collected by the blockchain; (iii) the evaluation of rewards is outsourced to the requester, who then needs to send an instruction about how to reward workers. The instruction is ensured to follow the promised incentive policy, because the requester is also required to attach a valid succinct zero-knowledge proof.

The worker anonymity of our protocol can ensure: (i) the public, including the requester and the implicit registration authority, is not able to tell whether a piece of data comes from a given worker or not; (ii) if a worker joins multiple tasks announced via the blockchain, no one can link these tasks. More importantly, we also address the threat of multiple-submission exacerbated by anonymity misuse. In particular, if a worker anonymously submits more than the number allowed in one task, our scheme allows the blockchain to tell and drop these invalid submissions. Similarly, we achieve the requesters' accountable anonymity.

2) To achieve the above goal of anonymity while preserving accountability, we propose, define and construct a new cryptographic primitive, called common-prefix-linkable anonymous authentication. Most of the time, a user can authenticate messages and attest the validity of identity without being linked. The only case in which anyone can link two authenticated messages is when they share the same prefix and are authenticated by one user.

To utilize the new primitive in our protocol, a worker has to submit to a task via an anonymous authentication. The reference of the task will be unique, and should be the common-prefix, so that the special linkability will prevent multi-submission to a task. A requester can also use it to authenticate in each task she publishes, and convince workers that she cannot maliciously submit to intentionally downgrade their rewards. We remark that such a scheme can be used to anonymously authenticate in a constant time independent of the number of tasks. Such a primitive may be of independent interest for the special flavor of its accountability-yet-anonymity.

3) To showcase the feasibility of applying our protocol, we implement the system that we call ZebraLancer for a common image annotation task on top of Ethereum, a real-world blockchain infrastructure. Intensive experiments and performance evaluations are conducted in an Ethereum test net. Since current smart contracts support only primitive operations, tailoring such protocols to be compatible with existing blockchain platforms is non-trivial.

² A couple of recent attempts on decentralized crowdsourcing [23] have been made; however, none resolved the fundamental privacy issues, which are arguably essential for basic functionalities, as in these systems malicious workers can simply free-ride by copying other submissions. See Section 3 for the detailed insufficiencies.

Fig. 1. The model of data crowdsourcing: workers and requesters obtain unique credentials bound to their identities at a registration authority (RA); authenticated requesters and authenticated workers can make fair exchange between data and rewards through a third-party platform (arbiter).
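The linkability rule described in contribution 2 can be illustrated with a toy sketch. It is not the paper's construction: it assumes a per-user secret and uses HMAC-SHA256 as a stand-in for deriving a link tag from (secret, prefix), and it omits the zero-knowledge proof that a real scheme would need to show the tag was derived from a registered credential. The names `keygen`, `auth`, `link` and `Attestation` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """A toy authenticated message: prefix, message and link tag."""
    prefix: bytes
    message: bytes
    tag: bytes


def keygen() -> bytes:
    """Hypothetical per-user secret; stands in for a credential from the RA."""
    return secrets.token_bytes(32)


def auth(secret: bytes, prefix: bytes, message: bytes) -> Attestation:
    # The tag depends only on (secret, prefix), not on the message,
    # so it repeats exactly when one user reuses one prefix.
    tag = hmac.new(secret, b"link-tag|" + prefix, hashlib.sha256).digest()
    return Attestation(prefix, message, tag)


def link(a: Attestation, b: Attestation) -> bool:
    """Two attestations link iff they share both the prefix and the tag."""
    return a.prefix == b.prefix and hmac.compare_digest(a.tag, b.tag)


alice, bob = keygen(), keygen()
task_1, task_2 = b"task:0x01", b"task:0x02"   # task references used as prefixes

first = auth(alice, task_1, b"answer A")
second = auth(alice, task_1, b"answer B")      # same user, same task
other_task = auth(alice, task_2, b"answer A")  # same user, different task
other_user = auth(bob, task_1, b"answer A")    # different user, same task

print(link(first, second), link(first, other_task), link(first, other_user))
```

Only the first pair links: a double submission to one task is detectable, while the same worker's submissions to different tasks carry unrelated tags.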

2 PROBLEM FORMULATION

In this section, we will give more precise definitions about the problem and its security requirements.

TABLE 1
Key notations related to the crowdsourcing model

Notation          Represents
$R$               A requester that can be uniquely identified by ID $R$
$W_j$             A worker who can be uniquely identified by ID $W_j$
$T$               The crowdsourcing task published by $R$ to collect $n$ answers from $n$ different workers
$A_j$             The answer that is submitted to $T$ by $W_j$
$R(A_j; \tau)$    The process that defines the value of the answer $A_j$
$R_j$             The reward for answer $A_j$ according to its value

Data crowdsourcing model. As illustrated in Fig. 1, there are four roles in the model of data crowdsourcing, i.e., requesters, workers, a platform and a registration authority. A requester, uniquely identified by $R$, can post a task to collect a certain amount of answers from the crowd. When announcing the task, the requester promises a concrete reward policy to incentivize workers to contribute (see details about the definition of reward policy below). A worker with a unique ID $W_j$ submits his answer $A_j$ and expects to receive the corresponding reward. The platform, a medium assisting the exchange between requesters and workers, is either a trusted party or emulated by a network of peers. The platform considered in this paper is jointly maintained by a collection of network peers, and in particular, we will build it atop an open blockchain network. The registration authority (RA) can play an important role of verifying and managing unique identities of workers/requesters, by binding each identity to a unique credential (e.g. a digital certificate). Note that an answer $A_j$ is chosen from a well-defined set $\mathbb{A}$.

We remark that well-established identities are a necessary demand of real-world crowdsourcing systems, for example MTurk and Waze, to prevent misbehaviors such as Sybil attacks. Moreover, many auction-based incentive mechanisms [8, 9] are built upon the non-collusive game theory that implicitly requires established identities to ensure one bid from one unique ID. We employ RA to establish identities. In practice, an RA can be instantiated by (i) the platform itself, (ii) the certificate authority who provides authentication service, or (iii) the hardware manufacturer who makes trusted devices that can faithfully sign messages [24]. Our solution should be able to inherit these established RAs in the real world.

In this paper, w.l.o.g., we assume that each unique identity is only allowed to submit one answer to a task. Also, we consider settings where the value of crowd-shared answers, and also the corresponding rewards, can be evaluated by a well-defined process such as auctions and quality-aware rewards, cf. [10–12] about quality-aware rewards and [8, 9] about auction-based incentives. These incentives share the same essence as follows.

Suppose an authenticated requester publishes a task $T$ with a budget $\tau$ to collect $n$ answers from $n$ workers. An authenticated worker interested in it will then submit his answers. The reward of an answer $A_j$ will follow a well-defined process determined by some auxiliary variables, i.e., the reward of $A_j$ can be defined as $R_j := R(A_j; a_1, \dots, a_m, \tau)$, where $R$ is a function parameterized by some auxiliary variables denoted by $a_1, \dots, a_m, \tau$. Remark that $\tau$ is the budget of the requester, and we will use $R_j := R(A_j; \tau)$ for short.

Particularly, in some simple tasks (e.g. multiple choice problems), the quality of an answer can be straightforwardly evaluated by all answers to the same task, i.e. $R_j = R(A_j; A_1, \dots, A_n, \tau)$, using majority voting or estimation maximization iterations [10–12]. More generally, [25] proposed a universal method to evaluate quality: (i) some workers are requested to answer a complex task; (ii) other different workers are then requested to grade each answer collected in the previous stage. What's more, our model captures the essence of auction-based incentive mechanisms such as [8, 9], when the parameters $a_1, \dots, a_m$ represent the bids of workers (and other necessary auxiliary inputs such as their reputation scores).

Our work considers the general definition above and will be extendable to the scope of auction-based incentives, even though the protocol design and implementations in this paper mainly focus on how to instantiate quality-aware incentives. Also note that the flat-rate incentive, in which each submitted answer gets a flat-rate payment [2], can be considered as a special case of the above abstraction as well.

Security models. Next, we specify the basic security requirements for our (decentralized) crowdsourcing system on top of the existing infrastructure of open blockchain.

Data confidentiality. Ideally, this property requires that the communication transcripts (including the blocks in the blockchain) do not leak anything to anyone (except the requester) about the input parameters $a_1, \dots, a_m$ of the incentive policy $R$, because these parameters might actually be confidential data or sealed bids. We can adapt the classical semantic security [26] style definition from cryptography for this ideal purpose: the distribution of the public communication can be simulated with only public knowledge. However, an adversarial worker can infer information after receiving his payment. This is an inherent issue of incentive mechanisms, and cannot be resolved even if there is a fully trusted third-party. So our protocol will focus on the best-possible data confidentiality, as if there is a fully trusted party. Informally speaking, our protocol is running in the "real" world, and imagine there is an "ideal" world where there exists a fully trusted party facilitating the incentives; by the real-ideal world paradigm, our data confidentiality requires that the two worlds are computationally indistinguishable.

Anonymity. Private information of workers/requesters can be explored by linking tasks they join/publish [22]. Intuitively, we might require two anonymity properties for workers: (i) unlinkability between a submission and a particular worker and (ii) unlinkability among all tasks joined by a particular worker. However, (i) indeed can be implied by (ii), because the break of the first one can obviously lead to the break of the latter one. Similarly, the anonymity of a requester can be understood as the unlinkability among all tasks published by her. The requirement of worker anonymity can be formulated via the following game. An adversary $\mathcal{A}$ corrupts a requester, the registration authority (RA), and the platform (e.g. the blockchain); suppose there are only two honest workers, $W_0$ and $W_1$. In the beginning, the adversary announces a number of $n$ tasks. For each task $T_i$, suppose there is a set of participating workers $W_{T_i}$. After seeing all the communications, for any $T_i \neq T_j$, $\mathcal{A}$ cannot tell whether $W_{T_i} \cap W_{T_j} = \emptyset$ better than guessing. We note that the anonymity should hold even if all entities, including the requester and the platform (except $W_0$ and $W_1$), are corrupted. The requester anonymity can be defined via the above game similarly, and we omit the details.

Security against a malicious requester. A malicious requester may avoid paying rewards (defined by the policy $R$), e.g. by launching the false-reporting attack. Security in this case can be formulated via the following security game: an adversary $\mathcal{A}$ corrupts the requester and executes the protocol to publish a task with a promised reward policy $R$ and a budget $\tau$. Let us define a bad event $B_1$ to be that there exists a worker $W_j$, who submits answers $A_j$ and receives a payment smaller than $R_j = R(A_j; \tau)$. We require that for every polynomial time adversary $\mathcal{A}$, the probability $\Pr[B_1]$ is negligibly small.

Security against malicious workers. A dishonest worker may try to harvest more rewards than what he deserves. Security in this case can be formulated as follows. An adversary $\mathcal{A}$ corrupts one worker,³ and participates in the protocol interacting with a requester (and the platform): (i) $\mathcal{A}$ submits some answers $\{A_1, \dots, A_n\}$, $n \geq 1$; let us define the bad event $B_2$ as that $\mathcal{A}$ receives a payment greater than $\max_{A_j \in \{A_1, \dots, A_n\}} R_j$, where $R_j := R(A_j; \tau)$, from the requester; (ii) $\mathcal{A}$ also sees the transcripts corresponding to an honest worker $W_i$'s submission (e.g. the ciphertext of $A_i$), and then submits her answer $A_j$ to the platform; let us define the bad event $B_3$ as that $\mathcal{A}$ receives a payment $R_j = R(A_i; \tau)$. We require that for all polynomial time $\mathcal{A}$, $\Pr[B_2]$ and $\Pr[B_3] - \Pr[(A \leftarrow \mathbb{A}) = A_i]$ are negligibly small.

³ We remark that we focus on resolving the new challenges introduced by blockchain, and put forth the best possible security, as if there is a fully trusted third-party serving as the crowdsourcing platform. For example, it is not quite clear how to handle the collusion of many identities, even in the centralized setting; thus such a problem is out of the scope of this paper.

We remark that the above securities against a malicious requester and malicious workers have captured the special fairness of the exchange between crowd-shared data and rewards. For example, Sybil attackers who forge identities, and front-running attackers who copy and paste others' submissions (which can be either ciphertexts or plaintexts) shall be prevented.

Technical challenges. The main advantage that the blockchain offers is the smart contract that can be automatically executed as a piece of pre-defined agreement. Let us first look at a naive decentralized solution to a crowdsourcing task: the requester codes her incentive policy into a task's smart contract $C$, and then broadcasts the contract $C$ to the blockchain network to collect $n$ answers; the incentive policy uses an incentive mechanism defined by a function $R$ and a budget $\tau$; after the contract $C$ is included in a block, it can expect workers to submit answers to it in the clear; finally, $C$ can evaluate the reward $R_i = R(A_i; A_1, \dots, A_n, \tau)$ for each answer $A_i$; more importantly, the contract can further automatically transfer $R_i$ to the blockchain address bound to answer $A_i$.

The above naive solution looks appealing to resolve the fairness issues such as free-riding and false-reporting. Notice that an implicit requirement is that answers (or bids) should be submitted in the clear, as the contract needs all those as the inputs (i.e. $a_1, \dots, a_m$) of incentive policy $R$ to audit the right amount of the reward for each answer. However, those inputs can be valuable and sensitive crowd-shared answers, and should be kept confidential. Or they could be auction bids, and should be confidential as well, because a malicious worker can learn the distribution of truthful bids of honest workers, and further break auctions by crafting optimized malicious (untruthful) bids. One may suggest that a worker encrypts his answer (or his bid) under the requester's public key, and submits the ciphertext instead. Unfortunately, this immediately causes the above naive solution to fail, as the contract (more precisely, the blockchain peer) only sees ciphertexts and thus cannot evaluate the reward. Another proposal, to hardcode the secret key in the smart contract, also fails, as the secret key will become transparent. Also remark that more advanced encryption schemes such as fully homomorphic encryption do not help,
because the blockchain peers still cannot know the plaintext of rewards without the secret key, even if they can compute the ciphertext of rewards. Therefore, the tension between the data confidentiality and the transparent execution of smart contracts brings in our first challenge:

Challenge 1. Leveraging the blockchain to enforce incentives, but not revealing the confidential inputs of incentive mechanism (e.g. answers or bids) to the public.

As briefly pointed out in the introduction, when the anonymity of workers/requesters is not protected well, their private information can be compromised, because the participation history of tasks will leak to the public through the blockchain. What's worse, the breached privacy adds negative payoffs that decrease the motivation of workers/requesters to participate. To mitigate the risk of privacy breach, one might suggest anonymous submission/request. However, if a worker is allowed to submit data without authenticating his unique identity, a malicious one may exploit this to flood a task by multiple submissions. In particular, suppose a requester specifies a task to collect 20 answers from 20 anonymous workers; a malicious worker might submit 20 colluded answers while claiming that they are from 20 different workers, as all answers are anonymously submitted. Worse still, an anonymous requester might send (colluded) answers to her own task to repudiate payments. For instance, a requester anonymously publishes a task to collect 50 answers, and she submits 30 colluded anonymous answers to downgrade the other 20 answers, such that she avoids paying but still gets 20 useful answers. These scenarios cause another tension between the participants' anonymity and their accountability, which brings in the second challenge:

Challenge 2. Ensuring participants to be anonymous while keeping their accountability in decentralized crowdsourcing.

3 RELATED WORK

We thoroughly review related works, and briefly discuss the insufficiencies of the state-of-the-art solutions.

Centralized crowdsourcing systems. MTurk [2] is the most commercially successful crowdsourcing platform. But it has a well-known vulnerability allowing false-reporters to gain short-term advantage [16]. Also, MTurk collects plaintexts of answers, which causes considerable worry of data leakage. Last, the pseudo IDs in MTurk can be trivially linked by a malicious requester. Dynamo [27] was designed as a privacy wrapper of MTurk. Its pseudo ID can only be linked by the pseudo ID issuer, but still it inherited all other weaknesses of MTurk. SPPEAR [28] considered a couple of privacy issues in data crowdsourcing, and thus introduced a couple more authorities, each of which handled a different functionality. Distributing one authority into multiple ones reduces the excessive trust, but, unfortunately, it is still not clear how to instantiate all those different authorities in practice.

Decentralized crowdsourcing. We also note there are several attempts [23, 29–31] using blockchain to decentralize crowdsourcing, but none of them considers privacy and anonymity, which are arguably fundamental for basic utility: the authors of [29] built up a crowd-shared service on top of the blockchain, but the system is not compatible with incentive mechanisms and is not privacy-preserving either; the authors of [30] leveraged the blockchain as a payment channel in their crowdsourcing framework, but it is neither secure against malicious workers and dishonest requesters, nor privacy-preserving; a couple of recent attempts [23, 31] took advantage of the public blockchain to enforce incentives, but these frameworks are neither private nor anonymous, i.e., the collected data and the unique identities (such as certificates) will leak to the whole network of the open blockchain.

Anonymous crowdsourcing. Li and Cao [32] proposed a framework to allow workers to generate their own pseudonyms based on their device IDs. But the protocol sacrificed the accountability of workers, because workers can forge pseudonyms without attesting that they are associated with real IDs, which gave a malicious worker chances to forge fake pseudo IDs and cheat for rewards. Rahaman et al. [33] proposed an anonymous-yet-accountable protocol for crowdsourcing based on group signature, and focused on how to revoke the anonymity of misbehaved workers. Misbehaved workers could be identified and further revoked by the group manager. The authors in [28] similarly relied on group signature but introduced a couple of separate authorities. Our solution can be considered as a proactive version that can prevent worker misbehavior, without relying on a group manager.

Accountable anonymous authentication. The pioneering works in anonymous e-cash [34, 35] first proposed the notion of one-time anonymous authentication. The concept was later studied in the context of one-show anonymous credential [36]. Some works [37, 38] further extended the notion of one-time use to be k-time use, and therefore enabled a more general accountability for anonymous authentications. In [39], the authors considered a special flavor of accountability to periodically allow k-time anonymous authentications.

Our new primitive provides a more fine-grained conditional linkage of anonymous authentications, which is needed for preventing multi-submission in each crowdsourcing task.

Linkable group/ring signature. Conceptually similar to the linkability that appeared in one-show credential [36], linkable group/ring signatures [40, 41] were proposed to allow a user to sign messages on behalf of his group unlinkably up to twice.

In [42], a more general concept of event-oriented linkable group signature was formulated to realize a more fine-grained trade-off between accountability and anonymity: a user can sign on behalf of his group unlinkably up to k times per event, where an event could be a common reference string (e.g., the unique address to call a smart contract deployed in the blockchain). But its main disadvantage is that the group manager can reveal the actual identities of users. Similar event-oriented linkability was discussed in the context of ring signature [43] as well. Even though that construction enjoys unbreakable anonymity, its main drawback is the impracticability of "hiding" the real identity behind a large group (mainly because the verification of signatures might require the public keys of all group members).
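The flooding scenario behind Challenge 2 (one worker filling a 20-answer task) and the per-event linkage discussed above can be sketched as contract-side bookkeeping. This is a minimal illustration under stated assumptions: each submission arrives with an opaque link tag that repeats whenever the same user authenticates twice for the same task, and the class `TaskInbox`, its fields, and the tag strings are all hypothetical. Verifying that a tag is well formed is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInbox:
    """Hypothetical contract-side state for one task (one "event")."""
    capacity: int                 # number of answers the task collects
    max_per_tag: int = 1          # k: submissions allowed per link tag
    counts: dict = field(default_factory=dict)
    accepted: list = field(default_factory=list)

    def submit(self, link_tag: str, ciphertext: str) -> bool:
        """Accept a submission unless the task is full or the tag is used up."""
        if len(self.accepted) >= self.capacity:
            return False
        if self.counts.get(link_tag, 0) >= self.max_per_tag:
            return False          # linked to an earlier submission: drop it
        self.counts[link_tag] = self.counts.get(link_tag, 0) + 1
        self.accepted.append((link_tag, ciphertext))
        return True


inbox = TaskInbox(capacity=20)

# One malicious worker tries to fill all 20 slots; every attempt carries
# the same link tag because the task reference (the prefix) is the same.
flood_results = [inbox.submit("tag-mallory", f"ct-{i}") for i in range(20)]

# Nineteen honest workers, each with a distinct tag, still get in.
honest_results = [inbox.submit(f"tag-worker-{i}", "ct") for i in range(19)]

print(sum(flood_results), sum(honest_results), len(inbox.accepted))  # 1 19 20
```

Setting `max_per_tag` to a larger k gives the k-times-per-event behavior; the default of 1 matches the assumption that each identity submits one answer per task.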

Our new primitive can be considered as a special cryptographic notion to formalize the subtle balance between event-oriented linkability and irrevocable anonymity (within a large and dynamic group). Specifically, our scheme ensures that no one can link the actual identity to any authenticated message (which is strictly stronger than [42]), and it also enables a user to "hide" behind a large group of users (i.e. all users registered at RA, which is impossible in practice via [43]).

Privacy-preserving smart contracts. Privacy-preserving smart contracts are a recent hot topic in blockchain research. Most of them are for general-purpose consideration [44, 45], and thus deploy heavy cryptographic tools including general secure multi-party computation (MPC). Hawk [46] did provide a general framework for privacy-preserving smart contracts using light zk-SNARK, but mainly for the reward receiver to prove to the contract. Our work can be considered as a very specially designed MPC protocol, and a lot of dedicated optimizations of zk-SNARK exist which can directly benefit our protocol. Last, cryptocurrencies like Zcash [47] and Ethereum [48] also leverage zk-SNARK to build a public ledger that supports anonymous transactions. We note that they consider more basic blockchain infrastructures, on top of which we may build our application for crowdsourcing.

4 PRELIMINARIES

Blockchain and smart contracts. A blockchain is a global ledger maintained by a P2P network collectively following a pre-defined consensus protocol. Each block in the chain will aggregate some transactions containing use-case specific data (e.g., monetary transfers, or program codes). In general, we can view the blockchain as an ideal public ledger [49] where one can write and read data in the clear. Moreover, it will faithfully carry out certain pre-defined functionalities. The last property captures the essence of smart contracts, that every blockchain node will run them as programs and update their local replicas according to the execution results, collectively. More specifically, the properties of the blockchain can be informally abstracted as the following ideal public ledger model [46]:

1) Reliable delivery of messages. The blockchain can be modeled as an ideal public ledger that ensures the liveness and persistence of messages committed to it [49]. In detail, a message sent to the blockchain is in the form of a validly signed transaction broadcast to the whole blockchain network, and then it will be solicited by a block and written into the blockchain.
Remark that we assume the synchronous network model [46, 50] of blockchain throughout the paper, which means a transaction will appear in any honest node's replica within a certain time after it appears in the blockchain network. Essentially, the blockchain delivers any message in the form of a valid transaction to any blockchain node within an a-priori delay.
Also we remark that a network adversary can reorder transactions that are broadcast to the network but not yet written into a block.
2) Correct computation. The blockchain can be seen as a state machine driven by messages included in each block [51]. Specifically, miners and full nodes will persistently receive newly proposed blocks, and faithfully execute "programs" defined by current states, taking messages in new blocks as inputs. Moreover, the computing results can be reliably delivered to the whole network.
3) Transparency. All internal states of the blockchain will be visible to the whole blockchain (intuitively, anyone). Therefore, all message deliveries and computations via the blockchain are in the clear.
4) Blockchain address (Pseudonym). By default, the sender of a message in the blockchain is referred to by a pseudonym, a.k.a. blockchain address. In practice, a blockchain address is usually bound to the hash of a public key; more importantly, the security of digital signatures can further ensure that one cannot send messages in the name of a blockchain address, unless she has the corresponding secret key.
Also, the program code of a smart contract deployed in the blockchain can be referred to by a unique address, such that one can call the contract to be executed by committing a message pointing to this unique address.

zk-SNARK. A zero-knowledge proof (zk-proof) allows a party (i.e. prover) to generate a cryptographic proof convincing another party (i.e. verifier) that some values are obtained by faithfully executing a pre-defined computation on some private inputs (i.e. witness) without revealing any information about the private state. The security guarantees are: (i) soundness, that no prover can convince a verifier if she did not compute the results correctly; sometimes, we require a stronger soundness that for any prover, there exists an extractor algorithm which interacts with the prover and can actually output the witness (a.k.a. proof-of-knowledge); (ii) zero-knowledge, that the proof distribution can be simulated without seeing any secret state, i.e., it leaks nothing about the witness. Both of the above hold with overwhelming probability.

The zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) further allows such a proof to be generated non-interactively. More importantly, the proof is succinct, i.e., the proof size is independent of the complexity of the statement to be proved, and is always a small constant. More precisely, zk-SNARK is a tuple of three algorithms. A setup algorithm can output the public parameters to establish a SNARK for an NP-complete language $L = \{\vec{x} \mid \exists\, \vec{w} \text{ s.t. } C(\vec{x}, \vec{w}) = 1\}$. The Prover algorithm can leverage the established SNARK to generate a constant-size proof attesting the truth of a statement $\vec{x} \in L$ with witness $\vec{w}$. The Verifier algorithm can efficiently check the proof.

5 PROTOCOL: PRIVATE AND ANONYMOUS DECENTRALIZED DATA CROWDSOURCING

In this section, we will construct a private and anonymous protocol to address the critical challenges of decentralizing crowdsourcing, without sacrificing security against "free-riders" and "false-reporters". The procedures of crowdsourcing will be decentralized atop an existing blockchain network. More specifically, we will tackle the new privacy and anonymity challenges brought by the blockchain.

As we briefly mentioned in previous sections, the system will implicitly have a separate registration service that validates each participant's unique identity before issuing a certificate. Such a setup alleviates some basic problems, so that every worker is allowed to submit no more than a fixed number $k$ of answers. For simplicity, we consider here $k = 1$.

Intuitions. Our basic strategy is to let the smart contract (which is hosted by the blockchain network) enforce the fair exchange between the submitted answers and their corresponding rewards, but without revealing any data or any identity to the blockchain. Let us walk through the high-level ideas first.

The requester first codifies a reward policy parameterized by her budget (i.e. $R(\cdot;\tau)$) into a smart contract. She broadcasts a transaction containing the contract code and the budget. Once the smart contract is included in the blockchain, it can be referred to by a unique blockchain address, and the budget should be deposited to this address (otherwise, no one would participate). After that, any worker who is interested in contributing could simply submit his answer to the blockchain, via a transaction pointing to the contract's address.

As pointed out before, we have to protect the confidentiality of the answers, in order to ensure that answers from different workers are independent. So the workers encrypt the answers under the requester's public key. Now the contract cannot see the answers, so it cannot calculate the corresponding rewards. But the requester can retrieve all the encrypted answers and decrypt them off-chain, and further learn the rewards they deserve. It is therefore necessary that the requester correctly instructs the smart contract how to proceed.

Concretely, we will leverage the practical cryptographic tool of zk-SNARK to force the requester to prove that she indeed followed the pre-specified reward policy when calculating the rewards. In detail, the requester should prove that her instruction for rewarding is computed as follows: (i) obtain all answers by decrypting all encrypted answers using a secret key corresponding to the public key contained in the smart contract; (ii) use all those answers and the announced $R(\cdot;\tau)$ to compute the quality of each answer.

A more challenging issue arises regarding anonymity-yet-accountability. As briefly pointed out before, we would like to achieve a balance between anonymity and accountability. Here we put forth a new cryptographic primitive to resolve the natural tension. A user can anonymously authenticate on messages (which are composed of a fixed-length prefix and the remaining part). But if two authenticated messages share a common prefix, anyone can tell whether they are done by the same user or not. Moreover, no one can link any two message-authentication pairs, as long as these messages have different prefixes. Having this new primitive in hand, a simple and intuitive solution to the anonymous-yet-accountable protocol is to let each worker anonymously authenticate on a message $\alpha_C \| C_i$, whenever the encrypted answer $C_i$ is submitted to a task contract $\mathcal{C}$ that can be uniquely addressed by $\alpha_C$. This implies that all submissions from the same worker to one task can be linked and then counted, but any two submissions to two different tasks will be provably unlinkable, even if they are submitted by the same user. Also, the maximum number of allowed submissions in each task can be easily tuned (by counting linked submissions).

Last, we also need to augment the smart contract by building the general algorithm of verifying zero-knowledge proofs into it. In particular, when the smart contract receives an instruction regarding rewards and its proof, the verification algorithm will be executed. All inputs of the verification algorithm are common knowledge stored in the open blockchain, e.g., the budget, the encrypted answers and the public key of encryption. If a dishonest requester reports a false instruction, her proof cannot be verified and the contract will simply drop the instruction. What's more, if the smart contract does not receive a correct instruction within a time limit, it can directly disseminate the budget to all workers evenly as punishment (which can be considered as part of the pre-specified incentive mechanism), since the budget has been deposited. In this way, the requester cannot gain any benefit by deviating from the protocol, and she will be self-enforced to respond properly and timely, so that each worker will receive the expected reward. On the other hand, a dishonest worker can never claim more rewards than he is supposed to get, as the reward is calculated by the requester herself.

Fig. 2. Subtle linkability of the common-prefix-linkable anonymous authentication scheme. All involved algorithms except Setup are shown in bold.

5.1 Common-prefix-linkable anonymous authentication

Before the formal description of ZebraLancer's protocol, let us first introduce the new primitive for achieving anonymous-yet-accountable authentication. As briefly shown in Fig. 2, our new primitive can be built atop any certification procedure, thus we include a certificate generation procedure that can be inherited from any existing one. Also, we insist on non-interactive authentication, thus all the steps (including the authentication step) are described as algorithms instead of protocols. Formally, a common-prefix-linkable anonymous authentication scheme is composed of the following algorithms:

TABLE 2
Notations related to common-prefix-linkable anonymous authentication

Notation | Represents
$(mpk, msk)$ | RA's public-secret key pair, which is generated by the algorithm Setup
$(pk_i, sk_i)$ | The public key certified by RA to bind the unique ID $i$, and its corresponding secret key
CertGen | The algorithm executed by RA to issue a certificate associated to a registered public key
CertVrfy | The algorithm that can be run by anyone to check the validity of a certificate
Auth | The algorithm to authenticate on a message using a certified public key
Verify | The algorithm to verify whether a message is authenticated by a certified public key or not
Link | The algorithm to link two authenticated messages, if and only if they share a common prefix
Prover | The zk-SNARK algorithm that generates zk-proofs attesting the statements to be proven
Verifier | The zk-SNARK algorithm that checks whether zk-proofs are generated faithfully or not

- Setup($1^\lambda$). This algorithm outputs the system's master public key $mpk$ and the system's master secret key $msk$, where $\lambda$ is the security parameter.
- CertGen($msk, pk$). This algorithm outputs a certificate $cert$ to validate the public key.
- Auth($m, sk, pk, cert, mpk$). This algorithm generates an attestation $\pi$ on a message $m$ that the sender of $m$ indeed owns a secret key corresponding to a valid certificate.
- Verify($m, mpk, \pi$). This algorithm outputs 0/1 to decide whether the attestation is valid or not for the attested message, using the system's master public key.
- Link($mpk, m_1, \pi_1, m_2, \pi_2$). This algorithm takes as inputs two valid message-attestation pairs. If $m_1, m_2$ have a common prefix of length $\lambda$, and $\pi_1, \pi_2$ are generated from the same certificate, it outputs 1; otherwise it outputs 0.

We also define the properties of a common-prefix-linkable anonymous authentication scheme. Correctness is straightforward: an honestly generated authentication can be verified. Unforgeability also follows the standard notion for authentication: if one does not own any valid certificate, she cannot authenticate any message. We mainly focus on formalizing the security notions of common-prefix-linkability and anonymity.

The first one characterizes a special accountability requirement in anonymous authentication. It requires that no efficient adversary can authenticate two messages with a common prefix without being linked, if using the same certificate. More generally, if an attacker corrupts $q$ users, she cannot authenticate $q+1$ messages sharing a common prefix without being noticed. Formally, consider the following cryptographic game between a challenger C and an attacker A:

1) Setup. The challenger C runs the Setup algorithm and obtains the master keys.
2) CertGen queries. The adversary A submits $q$ public keys with different identities and obtains $q$ different certificates: $cert_1, \ldots, cert_q$.
3) Auth. The adversary A chooses $q+1$ messages $p\|m_1, \ldots, p\|m_{q+1}$ sharing a common prefix $p$ (with $|p| = \lambda$) and authenticates to the challenger C by generating the corresponding attestations $\pi_1, \ldots, \pi_{q+1}$.

Adversary A wins if all $q+1$ authentications pass the verification, and no pair of those authentications is linked.

Definition 1 (Common-prefix-linkability). For all probabilistic polynomial-time algorithms A, $\Pr[\text{A wins in the above game}]$ is negligible in the security parameter $\lambda$.

Next is the anonymity guarantee in normal cases. We would like to ensure anonymity against any party, including the public, the registration authority, and the verifier, who can ask for multiple (and potentially correlated) authentication queries. Also, our strong anonymity requires that no one can even tell whether a user is authenticating for different messages, if these messages have different prefixes.

The basic requirement for anonymity is that no one can recognize the real identity from the authentication transcript. But our unlinkability requirement is strictly stronger, since if one can recognize identities, she can obviously link two authentications by first recovering the actual identities. To capture unlinkability (among the authentications of different-prefix messages), we can imagine the most stringent setting: even when there are only two honest users in the system, the adversary still cannot properly link any of them from a sequence of authentications. Formally, consider the following game between the challenger C and adversary A.

1) Setup. The adversary A generates the master key pair.
2) CertGen. The adversary A runs the certificate generation procedure as a registration authority with the challenger. The challenger submits two public keys $pk_0, pk_1$ and the adversary generates the corresponding certificates $cert_0, cert_1$ for them. A can always generate certificates for public keys generated by herself.
3) Auth-queries. The adversary A asks the challenger to serially use $(sk_0, pk_0, cert_0)$ and $(sk_1, pk_1, cert_1)$ to do a sequence of authentications on messages chosen by her. Also, the number $q$ of authentication queries is chosen by A. The adversary obtains $2q$ message-attestation pairs.
4) Challenge. The adversary A chooses a new message $m^*$ which does not have a common prefix with any of the messages asked in the Auth-queries, and asks the challenger to do one more authentication. C picks a random bit $b$ and authenticates on $m^*$ using $sk_b, pk_b, cert_b$. After receiving the attestation $\pi_b$, A outputs her guess $b'$.

The adversary wins if $b' = b$.

Definition 2 (Anonymity). An authentication scheme is unlinkable if, for every probabilistic polynomial-time algorithm A, $\left|\Pr[\text{A wins in the above game}] - \frac{1}{2}\right|$ is negligible.⁴

⁴ We remark that our definition of anonymity is strictly stronger than that of the event-oriented linkable group signature [42], in which the RA can revoke user anonymity under certain conditions.
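As an informal illustration of the linkability game above, the following Python sketch replays its winning condition on a toy scheme. It is only a sketch under loud assumptions: the attestation is reduced to a single tag computed as a SHA-256 hash of the prefix and the secret key, certificates and zero-knowledge proofs are omitted entirely, and the names `tag`, `auth`, `link`, `adversary_wins` and `some_assignment_wins` are hypothetical helpers introduced here, not part of the scheme's interface.

```python
import hashlib
from itertools import combinations, product

PREFIX_LEN = 4  # toy prefix length in bytes, standing in for lambda


def tag(prefix: bytes, sk: bytes) -> str:
    """Toy linkability tag (assumption): hash of the fixed-length prefix and the secret key."""
    return hashlib.sha256(prefix + sk).hexdigest()


def auth(message: bytes, sk: bytes) -> dict:
    """Toy attestation: only the prefix tag is kept; no certificate, no zk-proof."""
    return {"message": message, "tag": tag(message[:PREFIX_LEN], sk)}


def link(att1: dict, att2: dict) -> bool:
    """Two attestations are linked when their prefix tags are equal."""
    return att1["tag"] == att2["tag"]


def adversary_wins(attestations: list) -> bool:
    """Winning condition of the game: no pair of attestations is linked."""
    return not any(link(a, b) for a, b in combinations(attestations, 2))


def some_assignment_wins(keys: list, messages: list) -> bool:
    """Try every way of assigning the corrupted keys to the chosen messages."""
    return any(
        adversary_wins([auth(m, k) for m, k in zip(messages, choice)])
        for choice in product(keys, repeat=len(messages))
    )


corrupted = [b"sk-alice", b"sk-bob"]              # q = 2 corrupted keys
same_task = [b"TASK-m1", b"TASK-m2", b"TASK-m3"]  # q + 1 messages, common prefix
print(some_assignment_wins(corrupted, same_task))
```

With $q$ keys and $q+1$ messages sharing one prefix, every assignment reuses some key, so two tags coincide and the toy adversary is always linked; this is the counting argument behind the definition, not a security proof. The same helper applied to the same key with two different prefixes yields different tags, which mirrors (but does not establish) the unlinkability requirement across prefixes.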

Construction. Now we proceed to construct such a primitive. As in many anonymous authentication constructions, we will also use the zero-knowledge proof technique to achieve anonymity. In particular, we will leverage zk-SNARK to give an efficient construction. For the above concept of common-prefix-linkable anonymous authentication, we need to further support the special accountability requirement. The idea is as follows: since the condition that "breaks" unlinkability is a common prefix, the authentication will give a special treatment to the prefix. In particular, the authentication shows a tag committing to the prefix together with the user's secret key, and then presents a zero-knowledge proof that such a tag is properly formed, i.e., computed by hashing the prefix and a secret key. To ensure other basic security notions, we will also compute another tag that commits to the whole message. The user will further prove in zero-knowledge that the secret key corresponds to a certified public key.

Note that our main goal is to decentralize crowdsourcing; such a new anonymous authentication primitive could be further studied systematically in future works. Concretely, we present the detailed construction as follows:

- Setup($1^\lambda$). This algorithm establishes the public parameters $PP$ that will be needed for the zk-SNARK system. Also, the algorithm generates a key pair $(msk, mpk)$ for a digital signature scheme.⁵
- CertGen($msk, pk_i$): This algorithm runs a signing algorithm on $pk_i$,⁶ and obtains a signature $\sigma_i$. It outputs $cert_i := \sigma_i$.
- Auth($p\|m, sk_i, pk_i, cert_i, PP$): On input a message $p\|m$ having a prefix $p$, the algorithm first computes two tags (interchangeably called headers later), $t_1 = H(p, sk_i)$ and $t_2 = H(p\|m, sk_i)$, where $H$ is a secure hash function. Then, letting $\vec{w} = (sk_i, pk_i, cert_i)$ represent the private witness and $\vec{x} = (p\|m, mpk)$ be all common knowledge, the algorithm runs the zk-SNARK proving algorithm Prover($\vec{x}, \vec{w}, PP$) for the following language:
$L_T := \{t_1, t_2, \vec{x} = (p\|m, mpk) \mid \exists\, \vec{w} = (sk_i, pk_i, cert_i) \text{ s.t. } \mathsf{CertVrfy}(cert_i, pk_i, mpk) = 1 \land \mathsf{pair}(pk_i, sk_i) = 1 \land t_1 = H(p, sk_i) \land t_2 = H(p\|m, sk_i)\}$,
where the CertVrfy algorithm checks the validity of the certificate using a signature verification, and the pair algorithm verifies whether two keys are a consistent public-secret key pair. This proving algorithm yields a proof $\eta$ for the statement $\vec{x} \in L_T$ (also for the proof-of-knowledge of $\vec{w}$).
Finally, the algorithm outputs $\pi := (t_1, t_2, \eta)$.
- Verify($p\|m, \pi, mpk, PP$): This algorithm runs the verifying algorithm of zk-SNARK, Verifier, on $\vec{x}$, $\pi$ and $PP$, and outputs the decision bit $d \in \{0, 1\}$.
- Link($m_1, \pi_1, m_2, \pi_2$): On input two attestations $\pi_1 := (t_1^1, t_2^1, \eta_1)$ and $\pi_2 := (t_1^2, t_2^2, \eta_2)$, the algorithm simply checks $t_1^1 \overset{?}{=} t_1^2$. If yes, it outputs 1; otherwise, it outputs 0. We also write Link($\pi_1, \pi_2$) for short.

⁵ To be more precise, the public parameter generation could be from another algorithm. For simplicity, we put it here. In the security game for anonymity, the adversary only generates $msk, mpk$, not the public parameters.
⁶ We assume an external identification procedure that can check the actual identity bound to each public key, and that the user key pairs are generated using common algorithms, e.g., for a digital signature. We ignore the details here.

Security analysis (sketch). Here we briefly sketch the security analysis for the construction. As the scheme is of independent interest, we defer detailed analyses/reductions to an extended paper where the science behind it will be formally studied.

Regarding correctness, it is trivial, because of the completeness of the underlying SNARK.

Regarding unforgeability, we require that an uncertified attacker cannot authenticate. The only transcripts that can be seen by the adversary are headers and the zero-knowledge attestation. Headers include one generated by hashing the concatenation of $p\|m, sk$. In order to provide a header, the attacker has to know the corresponding $sk$, as it can be extracted from the random oracle queries. Thus there are only two different ways for the attacker: (i) the attacker forges the certificate, which clearly violates the signature security; (ii) the attacker forges the attestation using an invalid certificate, which clearly violates the proof-of-knowledge of the zk-SNARK.

Regarding common-prefix-linkability, it is also fairly straightforward, as the final authentication transcript contains a header computed by $H(p, sk)$, which is invariant for a common prefix $p$ under the same secret key $sk$.

Regarding anonymity/unlinkability, we require that after seeing a bunch of authentication transcripts from one user, the attacker cannot figure out whether a new authentication comes from the same user. This holds even if the attacker is the registration authority that issues all the certificates. To see this, note that the attacker will not be able to figure out the value of $sk$ from all public values, thus the headers/tags can be considered as random values. It follows that $H(p, sk)$ and a random value $r$ cannot be distinguished (similarly for $H(p\|m, sk)$). More importantly, due to the zero-knowledge property of zk-SNARK, given $r$, a simulator can simulate a valid proof $\eta^*$ by controlling the common reference string of the zk-SNARK. That said, the public transcript $t_1, t_2, \eta$ can be simulated by $r_1, r_2, \eta^*$, where $r_1, r_2$ are uniform values and $\eta^*$ is a simulated proof, all of which have nothing to do with the actual witness $sk$.

Summarizing the above intuitive analyses, we have the following theorem:

Theorem 1. Conditioned on the hash function being modeled as a random oracle and the zk-SNARK being zero-knowledge, the construction of the common-prefix-linkable anonymous authentication satisfies anonymity. Conditioned on the underlying digital signature scheme being secure and the zk-SNARK satisfying proof-of-knowledge, our construction of the common-prefix-linkable anonymous authentication is unforgeable. It is also correct and common-prefix linkable.

5.2 The protocol of ZebraLancer

Now we are ready to present a general protocol for a class of crowdsourcing tasks having proper quality-aware incentive mechanisms as defined in Section 2. As zk-SNARK requires a setup phase, we consider that a setup algorithm generated the public parameters $PP$ for this purpose, and published them as common knowledge.⁷ Our description focuses on the application atop the open blockchain, and therefore omits details of sending messages through the underlying blockchain infrastructure. For example, "one uses blockchain address $\alpha$ to send a message $m$ to the blockchain" will represent that he broadcasts a blockchain transaction containing the message $m$, the public key associated to $\alpha$, and the signature properly generated under the corresponding secret key.

⁷ This in practice can be done via a secure multiparty computation protocol [52] to eliminate potential backdoors.

TABLE 3
Key notations related to the protocol of ZebraLancer

Notation | Represents
$\mathcal{C}$ | The smart contract programmed for the crowdsourcing task to be published
$\alpha_C$ | The blockchain address of $\mathcal{C}$, which is also used as the prefix for authentication
$\alpha_R$ | The one-time blockchain address used by the requester $R$ to anonymously interact with $\mathcal{C}$
$\alpha_i$ | The one-time blockchain address used by the worker $W_i$ to anonymously interact with $\mathcal{C}$
$(esk, epk)$ | The one-time public-secret key pair used in $\mathcal{C}$ for encrypting/decrypting crowd-shared answers
$C_i$ | The encrypted answer submitted to $\mathcal{C}$ by the worker $W_i$, i.e. the ciphertext of answer $A_i$
$R$ | The instruction sent from the requester to instruct the smart contract to reward each crowd-shared answer
$\pi_{reward}$ | The zk-SNARK proof generated to attest the correctness of the instruction of paying rewards
$\pi_i$ | The attestation sent from a user $i$ to anonymously authenticate a message having a prefix $\alpha_C$
$PP$ | Public parameters for zero-knowledge proofs, including the ones for common-prefix-linkable anonymous authentication and the ones for the incentive mechanism

Fig. 3. The schematic diagram of the protocol of ZebraLancer for the quality-aware incentive mechanisms as proof-of-concept. Encrypted answers and the instruction of rewarding are pointing to the task published as a smart contract.

Remark that here we let each worker/requester generate a different blockchain address for each task (i.e. a one-task-only address) as a simple solution to avoid de-anonymization in the underlying blockchain.⁸ For concrete instantiations of the underlying infrastructures, see the implementations in Section 6. We further remark that the protocol can be extended to private and anonymous auction-based incentives trivially (see Appendix B for a concrete example), although it mainly focuses on quality-aware ones.

⁸ Our anonymous protocol mainly focuses on the application layer, such as the crowdsourcing functionality that is built on top of the blockchain infrastructure. If the underlying blockchain layer supports anonymous transactions, such as Zcash [47], the worker and the requester can re-use account addresses. We further remark that anonymity in the network layer is out of the scope of this paper; we may deploy our protocol on existing infrastructure such as Tor.

Protocol details. As shown in Fig. 3, the details of the ZebraLancer protocol can be described as follows:

• Register. Everyone registers at RA to get a certificate bound to his/her unique ID, which is done off-line only once for each participant.
A requester, having a unique ID denoted by $R$, creates a public-secret key pair $(pk_R, sk_R)$, and registers at the registration authority (RA) to obtain a certificate $cert_R$ binding $pk_R$ to $R$. Each worker, having a unique ID denoted by $W_i$, also generates his public and secret key pair $(pk_i, sk_i)$, and registers his public key at RA to obtain a certificate $cert_i$ binding $pk_i$ and $W_i$.
• TaskPublish. A requester anonymously authenticates and publishes a task contract with a promised reward policy.
When the requester $R$ has a crowdsourcing task, she generates a fresh blockchain account address $\alpha_R$, and a key pair $(epk, esk)$ (which will be used for workers to encrypt submissions) for this task only. $R$ then prepares parameters Param, including the encryption key $epk$, the number of answers to collect (denoted by $n$), the deadline, the budget $\tau$, the reward policy $R$, SNARK's public parameters $PP$, RA's public key $mpk$, and also $\pi_R = \mathsf{Auth}(\alpha_C \| \alpha_R, sk_R, pk_R, cert_R, PP)$.⁹

⁹ We remark that the requester should authenticate on her blockchain address $\alpha_R$ along with the task contract, and workers will join the task only if the task contract is indeed sent from a blockchain address that is the same as the authenticated $\alpha_R$. So a malicious requester cannot "authenticate" a task by copying other valid authentications. In addition, each worker has to authenticate on his blockchain address $\alpha_i$ along with his answer submission as well. The task contract will check that the submission is indeed sent from a blockchain address that is the same as the authenticated $\alpha_i$. Otherwise, a malicious worker can launch free-riding by copying and re-sending authenticated submissions that have been broadcast but not yet confirmed by a block.

The requester then codes a smart contract C that contains all of the above information for her task. After compiling C, she puts C’s code and a transfer of the budget into a blockchain transaction, and uses the one-task-only address α_R to send the transaction into the blockchain network. When a block containing C is appended to the blockchain, C gets an immutable blockchain address α_C to hold the budget and interact with anyone (see footnote 10). See Algorithm 1 below for a concrete example of a task contract. (The important component of verifying zk-proofs is done by calling a library libsnark.Verifier integrated into the blockchain infrastructure; implementation details are explained in Section 6.)

• AnswerCollection. The contract collects anonymously authenticated encrypted answers from workers who have not submitted before.

If a registered worker W_i is interested in contributing, he first validates the contract content (e.g., checking the parameters), then generates a one-time blockchain address α_i. He signs his answer A_i using the private key associated with the blockchain address α_i to obtain a signature σ_i, then encrypts the signed answer A_i||σ_i under the task’s public key epk to obtain the ciphertext C_i. Note that σ_i can be verified with the chain address α_i, e.g., it is generated by the same signing algorithm that is used to sign transactions.

He then uses the common-prefix-linkable anonymous authentication scheme to generate an attestation π_i = Auth(α_C||α_i||C_i, sk_i, pk_i, cert_i, PP) (see footnote 9). Then he uses his one-time address α_i to send C_i, π_i to the blockchain network (with a pointer to α_C, i.e. the unique address of the contract C).

Then, C runs Verify(α_C||α_i||C_i, π_i, mpk, PP), and also executes Link(π_i, π_∗) for each valid authentication attestation π_∗ that was received before (including the requester’s, namely π_R). In this way, C can ensure that C_i is the first submission of a registered worker. For unauthenticated or double submissions, C simply drops them (see footnote 11). The contract C keeps collecting answers until it receives n answers or the deadline (in unit of block) passes. It also records each address α_i that sends C_i.

Remark that the Link algorithm will be executed O(n^2) times, but it is a simple equality check over a pair of hashes, so the cost of running it repeatedly is nearly nothing in practice.

Along the way, the requester can keep listening to the blockchain, and decrypt each submission C_i that is sent by α_i and accepted by the contract. In this way, the requester can obtain the signed answer A_i||σ_i bound to C_i, and verify the signature σ_i using the blockchain address α_i. If A_i is not properly signed, the requester will generate a zero-knowledge proof π_fake to convince the contract that the plaintext bound to C_i is not a correctly signed message. In particular, the proof is generated by the zk-SNARK’s Prover algorithm to attest the statement

\[ (C_i, epk, \alpha_i) \in \{ C_i, epk, \alpha_i \mid \exists\, esk \ \text{s.t.}\ verifying(\alpha_i, A_i, \sigma_i) \neq 1 \ \wedge\ A_i \| \sigma_i = Dec(esk, C_i) \ \wedge\ pair(esk, epk) = 1 \}, \]

where verifying is a function that outputs 1 iff the inputs correspond to a properly signed message. Once the contract receives and validates such a proof π_fake for a given submission from α_i, it will remove C_i and α_i.

Footnote 10: We emphasize that α_C is unique per contract. In practice, α_C can be computed as H(α_R || counter), where H is a secure hash function, and counter is governed by the blockchain to be increased by exactly one for each contract created by the blockchain address α_R. It is also clear that the requester R can predict α_C before C is on-chain, so that she can compute π_R off-line and let it be a parameter of contract C.

Footnote 11: We remark that our protocol can be extended trivially to allow each worker to submit some k answers in one task by modifying the checking condition programmed in the smart contract of the crowdsourcing task.

Algorithm 1: Example using quality-aware incentive
Require: This contract’s address α_C; requester’s one-time blockchain address α_R; requester’s authenticating attestation π_R; RA’s public key mpk; budget τ; public key epk for encrypting answers; SNARK’s public parameters PP; number of requested answers n; deadline of answering in unit of block T_A; deadline of instructing reward in unit of block T_I.
 1  List keeping answers’ ciphertexts, C ← ∅;
 2  Map of anonymous attestations and authenticated one-time blockchain addresses of workers, W ← ∅;
 3  if getBalance(α_C) < τ ∨ ¬Verify(α_C||α_R, π_R, mpk, PP) then
 4      goto 24;  ⊲ Unidentified requester or no deposit.
 5  timer_A ← a timer that expires after T_A;  ⊲ Collect answers
 6  while ||C|| < n ∧ timer_A NOT expired do
 7      if α_i sends π_i, C_i then
 8          if ¬Link(π_i, π_R) ∧ ∀π_∗ ∈ W.keys() ¬Link(π_i, π_∗) ∧ Verify(α_C||α_i||C_i, π_i, mpk, PP) then
 9              W.add(π_i → α_i); C.add(C_i);
10      if α_R sends i, π_fake then
11          if libsnark.Verifier((C_i, σ_i), π_fake, PP) then
12              W.remove(π_i → α_i); C.remove(C_i);
13  timer_I ← a timer that expires after T_I;  ⊲ Wait instruction
14  while timer_I NOT expired do
15      if α_R sends R := (R_1, ..., R_n) and π_reward then
16          P ← (epk, τ, C_1, ..., C_n);
17          if libsnark.Verifier((P, R), π_reward, PP) then
18              for each (π_i → α_i) ∈ W do
19                  transfer(α_C, α_i, R_i);
20              goto 24;
21  R ← τ/||W||;  ⊲ Reward all if no correct instruction
22  for each (π_i → α_i) ∈ W do
23      transfer(α_C, α_i, R);
24  transfer(α_C, α_R, getBalance(α_C));  ⊲ Refund the remainder
25  function getBalance(addr)
26      return the balance of addr in the ledger;
27  function transfer(src, dst, v)
28      if getBalance(src) < v then
29          return false;
30      src’s balance −= v; dst’s balance += v; return true;
31  ⊲ The correctness and availability of this task contract is governed by the blockchain network; the contract program is driven by a “discrete” clock that increments with the validation of each newly proposed block
32  ⊲ libsnark.Verifier is a library embedded in the runtime environment of the smart contract, such as the EVM
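The bookkeeping of the task contract in Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the authors' contract code: the `Ledger` and `TaskContract` classes and the `link` helper are hypothetical names; attestations are modeled as plain string tags so that Link reduces to an equality check; the Verify and libsnark.Verifier calls are replaced by a boolean flag supplied by the caller; and the π_fake removal step (lines 10-12) is omitted. The fallback payout follows line 21 of Algorithm 1 (τ/||W||), here with integer division.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy ledger mapping an address to an integer balance (assumption)."""
    balances: dict = field(default_factory=dict)

    def get_balance(self, addr):
        return self.balances.get(addr, 0)

    def transfer(self, src, dst, v):
        # Mirrors lines 27-30: refuse the transfer if src cannot cover v.
        if self.get_balance(src) < v:
            return False
        self.balances[src] = self.get_balance(src) - v
        self.balances[dst] = self.get_balance(dst) + v
        return True


def link(tag_a, tag_b):
    """Hypothetical stand-in for Link: an equality check on two tags."""
    return tag_a == tag_b


@dataclass
class TaskContract:
    ledger: Ledger
    addr: str            # alpha_C
    requester: str       # alpha_R
    requester_tag: str   # stands in for pi_R
    tau: int             # budget
    n: int               # number of requested answers
    workers: dict = field(default_factory=dict)      # W: tag -> one-time address
    ciphertexts: list = field(default_factory=list)  # C: accepted ciphertexts

    def submit(self, tag, worker_addr, ciphertext, verified=True):
        """Lines 6-9: accept only the first valid submission per identity."""
        if len(self.ciphertexts) >= self.n or not verified:
            return False
        if link(tag, self.requester_tag):
            return False  # the requester may not answer her own task
        if any(link(tag, seen) for seen in self.workers):
            return False  # double submission by the same registered worker
        self.workers[tag] = worker_addr
        self.ciphertexts.append(ciphertext)
        return True

    def settle(self, rewards=None):
        """Lines 15-24: pay per instruction, else split evenly; refund the rest.

        `rewards` maps tag -> amount and is assumed to be already verified.
        """
        if rewards is None:
            share = self.tau // len(self.workers) if self.workers else 0
            rewards = {tag: share for tag in self.workers}
        for tag, amount in rewards.items():
            self.ledger.transfer(self.addr, self.workers[tag], amount)
        remainder = self.ledger.get_balance(self.addr)
        self.ledger.transfer(self.addr, self.requester, remainder)
```

Calling `settle()` without an instruction models the timeout branch, while passing a `rewards` mapping models a reward instruction whose proof has already been accepted; in both cases whatever is left at the contract address goes back to the requester.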

The proof π_fake and its verification are necessary to prevent network adversaries who can delay others’ submissions, and then copy and paste others’ ciphertexts to reap rewards.

• Reward. The requester computes, and proves, how to reward the properly authenticated anonymous answers.

The requester R keeps listening to the blockchain, and once C collects n submissions, she retrieves and decrypts all of them to obtain the corresponding answers A_1, ..., A_n (if there are not enough submissions when the deadline passes, the requester simply sets the remaining answers to ⊥, which is accounted for by the incentive mechanism R).

Next, the requester computes the reward for each answer, R_i = R(A_i; A_1, ..., A_n, τ), as specified by the policy codified in C. More importantly, she generates a zero-knowledge proof π_reward, with the secret key esk as witness, to attest the validity of the instruction. In particular, the proof is for the following NP-language:

\[ L = \{ \mathbf{R}, P \mid \exists\, esk \ \text{s.t.}\ \wedge_{j=1}^{n} A_j \| \sigma_j = Dec(esk, C_j) \ \wedge_{j=1}^{n} R_j = \mathcal{R}(A_j; A_1, \ldots, A_n, \tau) \ \wedge\ pair(esk, epk) = 1 \}, \]

where P denotes Param together with the ciphertexts C_1, ..., C_n, while R := (R_1, ..., R_n) is the instruction about how to reward each answer. After computing R and π_reward, the requester puts them into a blockchain transaction, and still uses her one-task-only blockchain address α_R to send the transaction to C (by using a pointer to α_C). This finishes the outsource-then-prove methodology.

Once a newly proposed block contains the reward instruction R and its attestation π_reward, the contract C first checks that they are indeed sent from α_R (by verifying the digital signature of the underlying blockchain transaction). Then it leverages the SNARK’s Verifier algorithm to verify the proof π_reward regarding the correctness of R. If the verification passes, it transfers each amount R_i to each account α_i, and refunds the remaining balance to α_R. Otherwise, it pauses. If it receives no valid instruction after a predefined time (in unit of block), the contract simply transfers τ/n to each α_i as part of the policy R.

5.3 Analysis of the protocol

Correctness and efficiency. It is clear that the requester will obtain the data and the workers will receive the right amount of payment if they all follow the protocol, under the conditions that (i) the blockchain can be modeled as an ideal public ledger, (ii) the underlying zk-SNARK is complete, (iii) the public key encryption is correct, and (iv) the common-prefix-linkable anonymous authentication satisfies correctness. Regarding efficiency, we note that the on-chain computation (and storage, the two being major obstacles for applying blockchain in general) is actually very light, as the contract essentially only carries out a verification step. Thanks to zk-SNARK, the verification can be efficiently executed by checking only a few pairing equalities; moreover, the special library can be dedicatedly optimized in various ways [53].

Security analysis (sketch). We briefly discuss security here and defer the details to an extended version. The underlying primitives, including our common-prefix-linkable anonymous authentication scheme, are well abstracted, which enables us to argue security in a modular way.

Regarding the data confidentiality of answers, all related public transcripts are simply the ciphertexts C_1, ..., C_n and the zk-SNARK proof π. The ciphertexts are easily simulatable according to the semantic security of the public key encryption, and the proof π can also be simulated without seeing the secret witness because of the zero-knowledge property.

Regarding the anonymity, an adversary has two ways to break it: (i) link a worker/requester through his blockchain addresses; (ii) link answers/tasks of a worker/requester through his authenticating attestations. The first case is trivial, simply because every worker/requester interacts with each task through a randomly generated one-task-only blockchain address (and the corresponding public key). The second case is more involved, but the anonymity of workers and requesters can be derived from the anonymity of the common-prefix-linkable anonymous authentication scheme.

Regarding the security against a malicious requester, a malicious requester has three chances to gain an advantage: (i) deny the policy announced in the TaskPublish phase; (ii) cheat in the Reward phase; (iii) submit answers to intentionally downgrade others in the AnswerCollection phase. The first threat is prevented because the smart contract is public, and the requester cannot deny it once it is posted in the immutable blockchain. The second threat is prohibited by the soundness of the underlying zk-SNARK, since any incorrect instruction passing the verification in the smart contract directly violates the proof-of-knowledge (i.e. the strong soundness). The last threat is handled by the unforgeability and common-prefix-linkability of our common-prefix-linkable anonymous authentication scheme.

Security against malicious workers is straightforward. The only ways that malicious workers can cheat are: (i) submitting more than one answer in the AnswerCollection phase; (ii) copying and pasting others’ answers to earn rewards; (iii) sending the contract a fake instruction in the name of the requester in the Reward phase; (iv) altering the policy specified in the contract. The first threat is handled by the common-prefix-linkability and unforgeability of the common-prefix-linkable anonymous authentication. The second threat can be attempted by predicting the plaintexts of answers or by copying the ciphertexts, but both are prevented: predicting plaintexts is prevented by the semantic security of the public key encryption; copying ciphertexts is prevented by the security of the digital signature and the zk-SNARK. The third threat is handled by the security of digital signatures. The last issue is trivial, because the blockchain security ensures that the announced policy is immutable.

Theorem 2. The data confidentiality of our protocol holds if the underlying public key encryption is semantically secure and the zk-SNARK in use is zero-knowledge.

The anonymity of our protocol for both workers and requesters is satisfied if the underlying common-prefix-linkable anonymous authentication satisfies the anonymity defined in Definition 2, and the zk-SNARK is zero-knowledge.

If the blockchain infrastructure we rely on can be modeled as an ideal public ledger, the underlying common-prefix-linkable anonymous authentication satisfies unforgeability and common-prefix-linkability, the zk-SNARK satisfies proof-of-knowledge, and the digital signature in use is secure, then our protocol satisfies security against a malicious requester and security against malicious workers.

6 ZEBRALANCER: IMPLEMENTATION AND EXPERIMENTAL EVALUATION

We implement the protocol of ZebraLancer atop Ethereum, and use it to instantiate a series of typical image annotation tasks [11]. Furthermore, we conduct experiments with these tasks in an Ethereum test net to evaluate the applicability.

Fig. 4. The system-level view of ZebraLancer. Our DApp layer can be built on top of an existing blockchain, e.g. the Ethereum Byzantium release [48].

System in a nutshell. As shown in Fig. 4, the decentralized application (DApp) of our system is composed of an on-chain part and an off-chain part. The on-chain part consists of crowdsourcing task contracts and an interface contract of the registration authority (RA). The RA’s contract simply posts the system’s master public key as common knowledge stored in the blockchain. The off-chain part consists of requester clients and worker clients. These clients can be blockchain clients wrapped with the functionalities required by our system. Specifically, a requester client should codify a specific task with a given incentive mechanism and announce it as a smart contract. Note that we, as the designers of the DApp, can provide contract templates to requesters for easier instantiation of incentive mechanisms, cf. [54]. The clients further need an integrated zk-SNARK prover to produce the anonymous authenticating attestations; moreover, a requester client should also leverage the SNARK prover to generate proofs attesting the correct execution of incentive policies.

We also compare ZebraLancer to some existing crowdsourcing systems in Table 4. Our system offers stronger security guarantees than the others, as it considers the strictest notions of fairness of exchange, user anonymity and data confidentiality, under the condition of minimum trust. For example, our design realizes the fair exchange without leaking data to a third-party information arbiter. ZebraLancer also guarantees the strongest user anonymity, which cannot be broken by any third party (even the registration authority), while the anonymity of other systems can be broken by a third-party authority (or a few colluding third parties). Remark that the identities established via a registration service are required by all systems in Table 4.

TABLE 4
Comparison between our system ZebraLancer and some other crowdsourcing platforms

Columns, in order: Ours; MTurk [2]; Dynamo [27]; SPPEAR [28]; CrowdBC [23].
Preventing false-reporting: √ √ × × [one mark not recoverable]
Preventing free-riding: √ √ [three marks not recoverable]
Data confidentiality: √ × × × ×
User anonymity: √ × × [two marks not recoverable]

Note: √ denotes a functionality realized without relying on any central trust except the established identities; a second symbol [not recoverable] denotes a (partially) realized function that relies on a central trust (other than the registration authority); × denotes an unrealized feature. Note that data confidentiality is marked as × if any third party other than the requesters can access the submitted data. For rows with fewer than five surviving marks, the column positions of the marks are not certain.

Implementation challenges. The main challenge of deploying smart contracts in general is that they can only support very light on-chain operations for both computing and storing (see footnote 12). Our protocol has taken this into consideration. In particular, our on-chain computation only consists of SNARK verifications, while the heavy computation of SNARK proofs is all done off the blockchain. Even so, building an efficient privacy-preserving DApp compatible with an existing blockchain platform such as Ethereum is not straightforward. For instance, in order to allow smart contracts to call a zk-SNARK verification library, a contract of this library should be put into a block, but this library is a general-purpose tool that can be too complex to be executed in the smart contract runtime environment, e.g. the Ethereum Virtual Machine (EVM). Alternatively, we modify the runtime environment of smart contracts, so that an optimized zk-SNARK verification library [53] is embedded in it as a primitive operation. Our modified Ethereum client is written in Java 1.8 with the Spring framework, and is available at github.com/maxilbert/ethereumj.

Footnote 12: We remark that the communication overhead is not a serious worry, because: (i) a blockchain network such as Ethereum does not require fully meshed connections, i.e. requesters and workers only need to connect to a constant number of Ethereum peers; (ii) if necessary, requesters and workers can even run on top of so-called light-weight nodes, which allows them to receive and send only messages related to crowdsourcing tasks; (iii) even if there were a trusted arbiter facilitating incentive mechanisms, the only saving in communication would be an instruction about how to reward answers (and its attestation).

We remark that the Ethereum project recently integrated some new cryptographic primitives into the EVM to enable SNARK verification as well [48], which ensures that our DApp can essentially inherit all Ethereum users to maintain the blockchain infrastructure and govern the faithful execution of the smart contracts in our DApp.

Establishments of zk-SNARKs (off-line). As the feasibility of ZebraLancer highly depends on the tininess of SNARK

proofs and the efficiency of SNARK verifications, it becomes critical to establish the necessary zk-SNARKs off-line. As formally discussed before, the authentication scheme and nearly all incentive mechanisms can be stated as well-defined deterministic constraint relationships. We first translate these mathematical statements into their corresponding boolean circuit satisfiability representations. Furthermore, we establish a zk-SNARK for each boolean circuit, such that all required public parameters are generated. All the above steps are done off-line, as they are executed only once when the system is launched. Note that the potential backdoors in these zk-SNARK public parameters could be further eliminated via an off-line protocol based on secure multi-party computation [52]. However, such an off-line setup is beyond the scope of showing the feasibility of our system.

An image annotation crowdsourcing task. To showcase the usability of our system, we implement a concrete crowdsourcing task of image annotation similar to the ones in [11]. The task is to solicit labels for an image, which can later be used to train a learning machine. The task requests n answers from n workers, and can be considered as a multi-choice problem. Majority voting is used to estimate the “truth”. An answer is seen as “correct” if it equals the “truth”. The reward amount of a worker is τ/n if he answers correctly; otherwise, he receives nothing. In our terminology, the reward R_i := R(A_i; A_1, ..., A_n, τ) = τ/n if A_i equals the majority; otherwise, R_i = 0. Following [11], we implement and deploy 5 contracts in the test net to collect 3 answers, 5 answers, 7 answers, 9 answers and 11 answers from anonymous-yet-accountable workers, respectively.

The smart contracts are written in Solidity, a high-level scripting language translatable to smart contracts of Ethereum. We also modify the Solidity compiler, such that a programmer can write a contract involving zk-SNARK verifications at a high level. We instantiate the encryption to be RSA-OAEP-2048, the DApp-layer hash function to be SHA-256, and the DApp-layer digital signature to be the RSA signature. Moreover, for zk-SNARK, we choose the construction of libsnark from [53]. We deploy a test network consisting of four PCs: three PCs are equipped with an Intel Xeon E3-1220V2 CPU (PC-As), and the other one is equipped with an Intel i7-4790 CPU (PC-B); all PCs have 16 GB main memory and have Ubuntu 14.04 LTS installed. In the test net, a PC-A and a PC-B play the role of miners, and the other two PC-As only validate blocks (i.e. full nodes that do not mine). One full node plays the role of the requester, and anonymously publishes crowdsourcing tasks to the blockchain; the other full node mimics workers, and sends each anonymously authenticated answer from a different blockchain address. Miners are only responsible for maintaining the test net and are not involved in tasks.

Performance evaluation. As the main bottleneck is the on-chain computation of the smart contract, we first measure the time cost and the spatial cost of miners, regarding the executions of the zk-SNARK verifications used in the above annotation tasks. These zk-SNARKs are established for common-prefix-linkable anonymous authentications and incentive mechanisms, respectively. The results of time cost are listed in Table 5. It is clear that zk-SNARK verifications in our system can be efficiently executed with respect to verification time. Moreover, our experiment results also reveal that the spatial cost of zk-SNARK verifications is constant and tiny on both types of PCs (exactly 17MB main memory). Also, the required on-chain storage for the task contracts is at the acceptable magnitude of kilobytes. Therefore, the on-chain performance of the system is clearly practical, considering both time and space.

TABLE 5
Execution time of in-contract zk-SNARK verifications.

                         Operands Length
Verification for         Proof    Key       Inputs    Time@PC-A    Time@PC-B
Our authentication       729B     1.2KB     1.5KB     10.9ms       6.2ms
Majority (3-Worker)      729B     16.0KB    3.4KB     15.5ms       9.1ms
Majority (5-Worker)      730B     21.6KB    4.7KB     16.3ms       9.8ms
Majority (7-Worker)      731B     27.3KB    6.0KB     17.0ms       10.3ms
Majority (9-Worker)      729B     32.9KB    7.3KB     17.5ms       12.1ms
Majority (11-Worker)     730B     38.6KB    8.6KB     17.9ms       13.1ms

Note: the sizes of proofs, keys and inputs are their bytes after being serialized in unicode.

At the off-chain side of the requester, we consider a reformulation of the statements attesting correct payments to enhance the off-chain efficiency, as the inherently heavy NP-reductions of generating zk-SNARK proofs could be prohibitively expensive for the requester without careful optimizations. Briefly speaking, we take

\[ L = \{ \mathbf{R}, P \mid \exists\, esk \ \text{s.t.}\ \wedge_{j=1}^{n} A_j \| \sigma_j = Dec(esk, C_j) \ \wedge_{j=1}^{n} R_j = \mathcal{R}(A_j; A_1, \ldots, A_n, \tau) \ \wedge\ pair(esk, epk) = 1 \}, \]

and translate it into

\[ L' = \{ \mathbf{R}, P \mid \exists\, esk \ \text{s.t.}\ \wedge_{j=1}^{n} C_j = Enc(epk, A_j \| \sigma_j) \ \wedge_{j=1}^{n} R_j = \mathcal{R}(A_j; A_1, \ldots, A_n, \tau) \ \wedge\ pair(esk, epk) = 1 \}, \]

because epk, as the exponent of modular exponentiation operations, is significantly smaller than esk in RSA, and therefore the latter language corresponds to a more compact arithmetic circuit and brings more efficient proving. The proving cost at the requester end therefore becomes acceptable; for example, to prove the majority of 11 workers’ binary voting, the requester spends 56 GB memory+swap and less than 2 hours, compared with hundreds of gigabytes and about one day without this optimization [55].

[Figure 5: box plot. Vertical axis: execution time of generating attestations for anonymous authentications (seconds). Horizontal axis: PC-A (3.1GHz E3-1220V2) and PC-B (3.6GHz i7-4790).]

Fig. 5. The time of generating common-prefix-linkable anonymous authentications in two PCs. The box plot is derived from 12 different experiments.

We also measure the off-chain cost of anonymity, if one uses the common-prefix-linkable anonymous authentication. We measure the running time of generating the authenticating attestations at the PCs. As shown in Fig. 5, our experiment results show that about 78 seconds are spent on generating an anonymous attestation using PC-A (3.1 GHz CPU). On PC-B (3.6 GHz CPU), the running time is shortened to about 62 seconds. These times are not ideal, but are acceptable for anonymity-sensitive workers.

We remark that our protocol can be trivially extended to support a non-anonymous mode, in case one gives up the anonymity privilege: s/he can generate a public-private key pair (for digital signatures), and then register the public key at the RA to receive a certificate bound to the public key; to authenticate, s/he can simply show the certified public key and the certificate, along with a message properly signed under the corresponding secret key, which costs nearly nothing computationally.

7 EXTENSION FOR AUCTION-BASED INCENTIVES

The pseudocode in Algorithm 2 showcases how to extend the ZebraLancer protocol to instantiate a private and anonymous crowdsourcing task using an auction-based incentive mechanism to pre-select workers. The main difference is the use of an auction to pre-select workers before they submit data (lines 6-12). As shown in the algorithm, the workers can sign their bids (using the private keys associated with their blockchain addresses), and then encrypt these signed bids under the requester’s public key. The ciphertexts can be sent to the task contract (lines 7-9). Then, the requester is allowed to convince the contract that the plaintext bound to a submission is not properly signed by the corresponding worker (lines 10-12). Moreover, the requester is then incentivized to leverage the outsource-then-prove method to prove the result of the auction (i.e. the selected workers) to the task contract (lines 13-18). After the workers are selected, they can submit answers to the contract (lines 20-25).

Algorithm 2: Example of auction-based incentive
Require: This contract’s address α_C; requester’s one-time blockchain address α_R; requester’s authenticating attestation π_R; RA’s public key mpk; budget τ; public key epk for encrypting bids and answers; SNARK’s public parameters PP; deadline of bidding in unit of block T_B; deadline of instructing auction result in unit of block T_I; deadline of answering in unit of block T_A.
 1  List keeping the ciphertexts of bids, B ← ∅;
 2  Map of anonymous attestations and authenticated one-time blockchain addresses of workers, W ← ∅;
 3  if getBalance(α_C) < τ ∨ ¬Verify(α_C||α_R, π_R, mpk, PP) then
 4      goto 26;  ⊲ Unidentified requester or no deposit.
 5  timer_B ← a timer that expires in T_B;  ⊲ Collect secret bids.
 6  while timer_B NOT expired do
 7      if α_i sends π_i, and his encrypted bid B_i then
 8          if ¬Link(π_i, π_R) ∧ ∀π_∗ ∈ W.keys() ¬Link(π_i, π_∗) ∧ Verify(α_C||α_i, π_i, mpk, PP) then
 9              W.add(W_i := (π_i → α_i)); B.add(B_i);
10      if α_R sends i, π_fake then
11          if libsnark.Verifier((B_i, σ_i), π_fake, PP) then
12              W.remove(π_i → α_i); B.remove(B_i);
13  timer_I ← a timer that expires after T_I;  ⊲ Wait instruction
14  while timer_I NOT expired do
15      if α_R sends selected workers S := (W′_1, ..., W′_n) and π_auction then
16          P ← (epk, τ, B_1, ..., B_||B||, W_1, ..., W_||W||);
17          if libsnark.Verifier((P, S), π_auction, PP) then
18              goto 20;
19  S = W;  ⊲ All workers are “selected” if no instruction.
20  timer_A ← a timer that expires after T_A;  ⊲ Collect answers
21  while ||S|| ≠ 0 ∧ timer_A NOT expired do
22      if α_i sends encrypted answer C_i then
23          if (∗ → α_i) ∈ S then
24              transfer(α_C, α_i, B_i);
25              S.remove((∗ → α_i));
26  transfer(α_C, α_R, getBalance(α_C));  ⊲ Refund the remainder

The main motivation of the above “private” bidding process is to prevent a malicious worker from learning the bidding strategies of honest workers by exploring all historic bids stored in the open blockchain, which is an implicit requirement of most auction-based incentive mechanisms (mainly for truthfulness). For example, when the requester is using an incentive mechanism (e.g. a variant of the reverse Vickrey auction [8]) to make workers give their truthful bids, if a malicious worker can estimate the distribution of all other bids well, then he can find an optimized bid, which is usually untruthful, to break the requester’s auction mechanism. Note that the public parameter of the zk-SNARK for verifying the result of the auction (i.e. the pre-selected workers) is established off-line and published as common knowledge, such that it can be referred to in the contract by PP.

Remark that the private auction atop blockchain is an interesting orthogonal problem [56] per se. Nevertheless, ZebraLancer is general, and can be adapted for a very broad variety of incentive mechanisms including auctions, although we did not intentionally focus on private auctions.

It is also clear that our common-prefix-linkable anonymous authentication scheme can be straightforwardly used to make the auction-based extension anonymous-yet-accountable, as multiple bidding by a worker is prevented by its subtle common-prefix linkability. To further allow a worker to submit k bids in a single task, we can instantiate a counter in the smart contract to track the number of bids from each worker, thanks to the common-prefix-linkability property.

8 CONCLUSION & OPEN PROBLEMS

ZebraLancer can facilitate a large variety of incentive mechanisms to realize the fair exchange between the crowd-shared data and their corresponding rewards, without the involvement of any third-party arbiter. Moreover, it shows the practicability of resolving two natural tensions in the use-case of decentralized crowdsourcing atop an open blockchain: one between the data confidentiality and the blockchain transparency, and the other between the participants’ anonymity and their accountability.

Along the way, we put forth a new anonymous authentication scheme. Besides the strong anonymity that cannot be revoked by any authority, it also supports a delicate linkability only for messages that share the common prefix and are authenticated by the same user. A concrete construction of the scheme is proposed, and it is shown to be compatible with real-world blockchain infrastructure. The delicate linkability of the scheme is subtly different from the state of the art of anonymous-yet-accountable authentication schemes [34–39, 42], and we envision that the scheme might be of independent interest. We also develop a general outsource-then-prove technique to use smart contracts in a privacy-preserving way. This technique can further extend the scope of applications atop some existing privacy-preserving blockchain infrastructures such as [46–48].

Open problems. Since this work is the first attempt at decentralizing data crowdsourcing atop a real-world blockchain in a privacy-preserving way, the area remains largely unexplored. Here we name a few open questions, and we defer their solutions to our future work. First, there are many incentive mechanisms using reputation systems; can we further extend our implementations to support those incentives? Second, as the current smart contract technology is at an infant stage and can only allow very tiny on-chain storage, can we further optimize our implementations for particular crowdsourcing tasks to assist larger-scale tasks, e.g. to collect annotations for millions of images (i.e. the scale of the ImageNet dataset)? Third, our anonymous protocol currently either relies on the underlying blockchain to support anonymous transactions, or requires workers/requesters to use one-time blockchain accounts to submit data and receive rewards. Can we design a (DApp-layer) protocol to solve these drawbacks? Last but not least, our protocol relies on a trusted registration authority (RA) to establish identities. Although such a trusted RA could be a reasonable assumption (in view of real-world experiences), it is more tempting to develop an alternative methodology to remove the third-party RA without sacrificing security. For example, can we adapt the successful invention of proof-of-work to build up a crowdsourcing framework from literally “zero” trust without any established identity?

REFERENCES

[1] J. Deng, W. Dong, R. Socher et al., “Imagenet: A large-scale hierarchical image database,” in Proc. IEEE CVPR 2009, pp. 248–255.
[2] Mechanical Turk. [Online]. Available: https://www.mturk.com/mturk/
[3] R. Ganti, F. Ye, and H. Lei, “Mobile crowdsensing: current state and future challenges,” IEEE Commun. Mag., vol. 49, no. 11, pp. 32–39, 2011.
[4] Waze. [Online]. Available: https://status.waze.com
[5] S. Devarakonda, P. Sevusu, H. Liu, R. Liu, L. Iftode, and B. Nath, “Real-time air quality monitoring through mobile sensing in metropolitan areas,” in Proc. ACM UrbComp 2013, pp. 15:1–15:8.
[6] Y. Wen, J. Shi, Q. Zhang et al., “Quality-driven auction-based incentive mechanism for mobile crowd sensing,” IEEE Trans. Veh. Technol., vol. 64, no. 9, pp. 4203–4214, 2015.
[7] Y. Zhang and M. Van der Schaar, “Reputation-based incentive protocols in crowdsourcing applications,” in Proc. IEEE INFOCOM 2012, pp. 2140–2148.
[8] D. Zhao, X.-Y. Li, and H. Ma, “How to crowdsource tasks truthfully without sacrificing utility: Online incentive mechanisms with budget constraint,” in Proc. IEEE INFOCOM 2014, pp. 1213–1221.
[9] D. Yang, G. Xue, X. Fang, and J. Tang, “Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing,” in Proc. ACM MobiCom 2012, pp. 173–184.
[10] D. Peng, F. Wu, and G. Chen, “Pay as how well you do: A quality based incentive mechanism for crowdsensing,” in Proc. ACM MobiHoc 2015, pp. 177–186.
[11] N. Shah and D. Zhou, “Double or Nothing: multiplicative Incentive Mechanisms for Crowdsourcing,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 5725–5776, 2016.
[12] H. Jin, L. Su, D. Chen, K. Nahrstedt, and J. Xu, “Quality of information aware incentive mechanisms for mobile crowd sensing systems,” in Proc. ACM MobiHoc 2015, pp. 167–176.
[13] P. Muncaster. China’s Internet wunderkind in the dock over alleged fraud. [Online]. Available: http://www.theregister.co.uk/2012/07/03/qihoo fraud traffic comscore
[14] H. Wei. Alipay apologizes for leak of personal info. [Online]. Available: http://www.chinadaily.com.cn/china/2014-01/07/content 17219203.htm
[15] H. Kelly. Apple account hack raises concern about cloud storage. [Online]. Available: http://www.cnn.com/2012/08/06/tech/mobile/icloud-security-hack/
[16] B. McInnis, D. Cosley, C. Nam, and G. Leshed, “Taking a HIT: Designing around rejection, mistrust, risk, and workers’ experiences in Amazon Mechanical Turk,” in Proc. ACM CHI 2016, pp. 2271–2282.
[17] K. B. Mike Isaac and S. Frenkel. Uber hid 2016 breach, paying hackers to delete stolen data. [Online]. Available: https://www.nytimes.com/2017/11/21/technology/uber-hack.html
[18] N. Szabo, “Formalizing and securing relationships on public networks,” First Monday, vol. 2, no. 9, 1997.
[19] A. Kate. Blockchain privacy: Challenges, solutions, and unresolved issues. [Online]. Available: http://www.isical.ac.in/∼rcbose/blockchain2017/lecture/Kate Slides.pdf
[20] H. A. Murray, “Thematic apperception test.” 1943.
[21] J. R. Douceur, “The sybil attack,” in International workshop on peer-to-peer systems. Springer, 2002, pp. 251–260.
[22] K. Yang, K. Zhang, J. Ren, and X. Shen, “Security and privacy in mobile crowdsourcing networks: challenges and opportunities,” IEEE Commun. Mag., vol. 53, no. 8, pp. 75–81, 2015.
[23] M. Li, J. Weng, A. Yang, and W. Lu, “CrowdBC: A Blockchain-based Decentralized Framework for Crowdsourcing,” 2017. [Online]. Available: http://eprint.iacr.org/2017/444.pdf
[24] S. Saroiu and A. Wolman, “I am a sensor, and I approve this message,” in Proc. ACM HotMobile 2010, pp. 37–42.
[25] Y. Baba and H. Kashima, “Statistical quality estimation for general crowdsourcing tasks,” in Proc. ACM SIGKDD 2013, pp. 554–562.
[26] S. Goldwasser and S. Micali, “Probabilistic encryption & how to play mental poker keeping secret all partial information,” in Proc. ACM STOC 1982, pp. 365–377.
[27] N. Salehi, L. C. Irani, M. S. Bernstein et al., “We are dynamo: Overcoming stalling and friction in collective action for crowd workers,” in Proc. ACM CHI 2015, pp. 1621–1630.
[28] S. Gisdakis, T. Giannetsos, and P. Papadimitratos, “Security, privacy, and incentive provision for mobile crowd sensing systems,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 839–853, 2016.
[29] F. Buccafurri, G. Lax, S. Nicolazzo, and A. Nocera, “Tweetchain: An alternative to blockchain for crowd-based applications,” in International Conference on Web Engineering 2017. Springer, pp. 386–393.
[30] C. Tanas, S. Delgado-Segura, and J. Herrera-Joancomart, “An integrated reward and reputation mechanism for MCS preserving users privacy,” in International Workshop on Data Privacy Management and Security Assurance 2015, pp. 83–99.
[31] A. Muehlemann, “Sentiment protocol: A decentralized protocol leveraging crowd sourced wisdom,” 2017. [Online]. Available: https://eprint.iacr.org/2017/1133.pdf
[32] Q. Li and G. Cao, “Providing efficient privacy-aware incentives for mobile sensing,” in Proc. IEEE ICDCS 2014, pp. 208–217.
[33] S. Rahaman, L. Cheng, D. D. Yao, H. Li, and J.-M. J. Park, “Provably secure anonymous-yet-accountable crowdsensing with scalable sublinear revocation,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 384–403, 2017.
[34] D. Chaum, “Blind signatures for untraceable payments,” in Proceedings on Advances in cryptology, 1983, pp. 199–203.

[35] D. Chaum, A. Fiat, and M. Naor, “Untraceable electronic cash,” in Proceedings on Advances in cryptology. Springer, 1990, pp. 319–327.
[36] J. Camenisch and A. Lysyanskaya, “An efficient system for non-transferable anonymous credentials with optional anonymity revocation,” in Advances in Cryptology–EUROCRYPT. Springer, 2011, pp. 93–118.
[37] S. Xu and M. Yung, “K-anonymous secret handshakes with reusable credentials,” in Proc. ACM CCS 2004, pp. 158–167.
[38] I. Teranishi, J. Furukawa, and K. Sako, “K-times anonymous authentication,” in Asiacrypt, 2004, vol. 3329, pp. 308–322.
[39] J. Camenisch, S. Hohenberger, M. Kohlweiss, A. Lysyanskaya, and M. Meyerovich, “How to win the clonewars: efficient periodic n-times anonymous authentication,” in Proc. ACM CCS 2006, pp. 201–210.
[40] T. Nakanishi, T. Fujiwara, and H. Watanabe, “A linkable group signature and its application to secret voting,” Trans. of Information Processing Society of Japan, vol. 40, no. 7, pp. 3085–3096, 1999.
[41] J. K. Liu, V. K. Wei, and D. S. Wong, “Linkable spontaneous anonymous group signature for ad hoc groups,” in Australasian Conference on Information Security and Privacy 2004, pp. 325–335.
[42] M. H. Au, W. Susilo, and S.-M. Yiu, “Event-oriented k-times revocable-iff-linked group signatures,” in Australasian Conference on Information Security and Privacy 2006, pp. 223–234.
[43] P. P. Tsang, V. K. Wei, T. K. Chan, M. H. Au, J. K. Liu, and D. S. Wong, “Separable linkable threshold ring signatures,” in International Conference on Cryptology in India, 2004, pp. 384–398.
[44] A. Kiayias, H.-S. Zhou, and V. Zikas, “Fair and robust multi-party computation using a global transaction ledger,” in Proc. Eurocrypt 2016, pp. 705–734.
[45] G. Zyskind, O. Nathan, and A. Pentland, “Enigma: Decentralized computation platform with guaranteed privacy,” 2015. [Online]. Available: https://arxiv.org/abs/1506.03471
[46] A. Kosba, A. Miller, E. Shi et al., “Hawk: The blockchain model of cryptography and privacy-preserving smart contracts,” in Proc. IEEE S&P 2016, pp. 839–858.
[47] D. Hopwood, S. Bowe, T. Hornby, and N. Wilcox, “Zcash Protocol Specification.” [Online]. Available: https://github.com/zcash/zips/blob/master/protocol/protocol.pdf
[48] Ethereum Team. Byzantium HF Announcement. [Online]. Available: https://blog.ethereum.org/2017/10/12/byzantium-hf-announcement/
[49] J. A. Garay, A. Kiayias, and N. Leonardos, “The bitcoin backbone protocol: Analysis and applications,” in Proc. EUROCRYPT 2015, pp. 281–310.
[50] R. Pass, L. Seeman, and A. Shelat, “Analysis of the blockchain protocol in asynchronous networks,” in Proc. EUROCRYPT 2017, pp. 643–673.
[51] V. Buterin, “A next-generation smart contract and decentralized application platform,” 2014. [Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper
[52] E. Ben-Sasson, A. Chiesa, M. Green, E. Tromer, and M. Virza, “Secure sampling of public parameters for succinct zero knowledge proofs,” in Proc. IEEE S&P 2015, pp. 287–304.
[53] E. Ben-Sasson, A. Chiesa, D. Genkin et al., “SNARKs for C: Verifying program executions succinctly and in zero knowledge,” in Advances in Cryptology–CRYPTO. Springer, 2013, pp. 90–108.
[54] C. Frantz and M. Nowostawski, “From institutions to code: Towards automated generation of smart contracts,” in Proc. IEEE FAS*W 2016, pp. 210–215.
[55] Y. Lu, Q. Tang, and G. Wang, “On Enabling Machine Learning Tasks atop Public Blockchains: a Crowdsourcing Approach,” in Proc. IEEE ICDM Workshops (BlockSEA) 2018.
[56] H. Galal and A. Youssef, “Verifiable sealed-bid auction on the ethereum blockchain,” in International Conference on Financial Cryptography and Data Security, Trusted Smart Contracts Workshop. Springer, 2018.