
Worker Recommendation for Crowdsourced Q&A Services: A Triple-Factor Aware Approach

Zheng Liu
Department of Computer Science and Engineering, HKUST
[email protected]

Lei Chen
Department of Computer Science and Engineering, HKUST
[email protected]

ABSTRACT

Worker Recommendation (WR) is one of the most important functions for crowdsourced Q&A services. Specifically, given a set of tasks to be solved, WR recommends each task to a certain group of workers, who are expected to give timely, high-quality answers. To address the WR problem, recent studies have introduced a number of recommendation approaches, which take advantage of workers' expertise or preferences towards different types of tasks. However, without a thorough consideration of workers' characters, such approaches lead to either inadequate task fulfillment or inferior answer quality.

In this work, we propose the Triple-factor Aware Worker Recommendation framework, which collectively considers workers' expertise, preferences and activeness to maximize the overall production of high-quality answers. We construct the Latent Hierarchical Factorization Model, which is able to infer the tasks' underlying categories and the workers' latent characters from historical data; and we propose a novel parameter inference method, which only requires the processing of positive instances, giving rise to significantly higher time efficiency and better inference quality. What's more, a sampling-based recommendation algorithm is developed, such that a near-optimal worker recommendation can be generated for a presented batch of tasks with considerably reduced time consumption. Comprehensive experiments have been carried out using both real and synthetic datasets, whose results verify the effectiveness and efficiency of our proposed methods.

PVLDB Reference Format:
Zheng Liu, Lei Chen. Worker Recommendation for Crowdsourced Q&A Services: A Triple-Factor Aware Approach. PVLDB, 11(3): 380-392, 2017. DOI: 10.14778/3157794.3157805

1. INTRODUCTION

Recently, a large number of crowdsourced Q&A applications have come into being, such as Stack Exchange, Yahoo Answers, and Quora. As a result, it becomes increasingly convenient to make use of the crowd's intelligence. With the quick growth of the crowd's participation, one of the central issues in managing crowdsourced Q&A platforms is worker recommendation: finding appropriate workers for the published tasks and actively asking for their solutions. Taking Quora as an example, when people are going to publish their tasks, the platform provides some candidate workers who are expected to give satisfactory feedback, and allows people to request answers from these workers.

The worker recommendation problem has drawn wide attention and several primary works have been proposed, such as [27, 33, 16, 35, 29]. Fundamentally, most of the existing methods treat worker recommendation as the "expert finding" problem, where two basic operations are involved. Firstly, tasks are categorized according to some specific criteria. Secondly, workers are ranked by their expertise (or authority) on each category of tasks based on their historical answers. In this way, given a new task, its corresponding category can be recognized, and the workers with the top expertise on this category will be recommended for it.

In spite of the existing progress, the worker recommendation problem cannot be effectively addressed with workers' expertise alone. To facilitate the timely acquisition of high-quality answers, worker recommendation should jointly consider the following three factors: expertise, preference and activeness. The detailed reasons to incorporate these three factors are given as follows. First of all, the crowd has diverse skills in different domains. To make sure the acquired answers are reliable, the recommended workers must be equipped with enough expertise in the related areas of the given tasks. Secondly, the crowd's preferences towards different types of tasks are discrepant. For example, some people may prefer mathematical problems to those on classical literature; if tasks of both categories are available, they will probably choose the mathematical ones with higher priority. To guarantee workers' acceptance, the recommendation should consider the ones who have satisfactory preferences towards the type of the presented tasks. Lastly, the crowd's activeness differs significantly from worker to worker: enthusiastic workers would probably solve a large number of tasks within a short period of time, while passive ones might make very little production. For the sake of timely feedback, the recommended workers are desirable to be active in generating new answers. (For a better demonstration of the above three factors, a real-world data exploration will be presented in Section 3.)

In this work, we come up with the Triple-factor Aware Worker Recommendation framework (referred to as TriRec), where all the aforementioned factors are jointly considered for worker recommendation. In TriRec, we construct the Latent Hierarchical Factorization Model (referred to as LHFM), which categorizes the tasks based on their contents and characterizes the workers with their expertise, preferences and activeness. With LHFM, we can directly identify the category of a newly presented task, and figure out the workers who will not only complete the given task with large probability, but will also be capable of making high-quality answers. In addition to LHFM, we develop an optimized worker recommendation strategy for the scenarios where new tasks are processed in batches. With such a strategy, the overall recommendation effect can be maximized for a whole batch of presented tasks.

However, the realization of TriRec is non-trivial, where the following technical challenges have to be conquered. (1) The answering activities (i.e., which tasks have been answered by each of the workers) are jointly affected by the workers' preferences and activeness. If a task is not answered by a worker, it could result from either the worker's non-interest in the given task, or the limitation of her activeness. As such, neither factor, preference nor activeness, is directly reflected by the observation data. Meanwhile, the majority of workers have only answered a small fraction of tasks in history (e.g., a common worker in Stack Overflow may produce around a few hundred answers, while there are hundreds of millions of tasks ever issued). Therefore, explicit reasoning about the non-answered tasks will be not only possibly erroneous, but also severely time-consuming. (2) The optimal worker recommendation, formed as a weighted bipartite assignment problem, cannot be solved by the conventional approaches in a temporally scalable manner (e.g., the Hungarian algorithm or Successive Shortest Paths, whose time cost could be as expensive as O(|W|^2 |T|^2)). Whereas, in real applications, large-scale cases might be frequently confronted. (For example, there are over 6 million registered workers in Stack Overflow, which makes it almost impractical to optimally process merely a batch of a few hundred tasks.)

To address the above challenges, we come up with the following technical mechanisms. (1) LHFM probabilistically models the answering activities as unified distributions of workers' preferences and activeness. With the maximization of the corresponding posteriori, workers' preferences and activeness can be judiciously learned from the historical observations. Meanwhile, LHFM's probabilistic formulations avoid explicit reasoning about the negative instances, and the parameter inference can be conducted with purely positive instances. (An instance is associated with a pair of task and worker; it is positive if the task is answered by the worker; otherwise, it is negative.) In this way, potential errors can be avoided and the inference process will be greatly accelerated. (2) The optimal worker recommendation is produced with our sampling-based algorithm. In our proposed approach, the workers' suitableness towards each type of tasks is indexed offline, based on which the worker recommendation can be efficiently produced for the newly presented tasks through sampling the pre-constructed table. In particular, a near-optimal result (whose expectation is above 1/4 of the optimal solution) can be obtained with O(|T|) time complexity.

The major contributions of this work are summarized as the following points.
• We propose TriRec to maximize the production of high-quality answers in crowdsourced Q&A services, where workers' expertise, preferences, and activeness are collectively considered. To the best of our knowledge, this is the first work incorporating all these factors.
• We come up with the Latent Hierarchical Factorization Model (LHFM), which encodes tasks with their semantic signatures, and characterizes workers with their expertise, preferences and activeness. With LHFM, a worker's suitableness towards a given task can be directly acquired.
• Our proposed model avoids explicit reasoning about the negative instances, which eliminates potential errors and enables the parameters to be inferred with high efficiency.
• The optimal worker recommendation is produced with the sampling-based algorithm, which not only considerably reduces the time consumption, but also generates an approximation result whose expectation is lower-bounded by 1/4 of the optimal solution.
• We conduct extensive experiments with both real and synthetic datasets, whose results verify our improvement over the state-of-the-art approaches.

The rest of our paper is organized as follows. We review the related work in Section 2. In Section 3, we explore the data from crowdsourced Q&A services and demonstrate the workers' characters of expertise, preference and activeness. In Section 4, we introduce the key definitions of this work. The LHFM and the Optimal Worker Recommendation are discussed in Sections 5 and 6, respectively; the experimental study is reported in Section 7. Finally, we draw the conclusion and discuss the future work in Section 8.
2. RELATED WORK

The related works are reviewed from the following two aspects: Worker Recommendation in Crowdsourced Q&A Services, and Crowdsourced Task Allocation.

Worker Recommendation. Worker recommendation is an important function for Q&A services. A large proportion of the existing works fall into the domain of Expert Finding (EF for short). Given people's historical answering records, EF estimates the workers' underlying capabilities of solving each type of tasks correctly. Representative works include (but are not limited to) [27, 35, 1, 29]. In particular, [27] proposes a model to determine the tasks' latent topics and to estimate the workers' expertise over different topics through link analysis; [35] takes the correlation among tasks' categories into consideration, and workers' expertise is collaboratively estimated with the answer records in all the relevant categories; [1] proposes a semi-supervised coupled mutual reinforcement framework for the simultaneous calculation of workers' expertise and answers' quality; and [29] formulates the expert finding problem as a graph-regularized matrix factorization process, with workers' social relationships taken into account. Some of these techniques are also employed in our estimation of worker expertise, such as the latent representations of workers and tasks, and the factorization formulation. What's more, the techniques for the exploration of auxiliary data (e.g., social relationships) can also be incorporated into our framework to further improve the inference quality. Despite some works on workers' preference estimation, e.g., [26, 15, 30], which employ incremental collaborative filtering, latent semantic indexing and content-based matrix factorization as the inference strategies, they are technically similar to the existing studies on expert finding. Because they leave out the activeness factor, they are unable to infer the workers' preferences with high quality, which is theoretically analyzed and experimentally tested in the following sections. Most of the existing works simply focus on one aspect of workers' characters; rather than comprehensively considering their expertise, preferences and activeness, they are inferior in maximizing the overall production of high-quality answers.

Crowdsourced Task Allocation. Task allocation is an intensively studied issue in crowdsourcing research. For example, [10, 4] propose offline task assignment strategies in the spatial scenario; [5] goes beyond the basic settings (with only spatial and temporal constraints) by considering the situation where workers are equipped with diversified skills; besides, [21] develops optimal online strategies to deal with the cases where workers and tasks are presented in a streaming fashion. In spite of the fundamentally different application scenarios, these works also involve the optimal solution of weighted bipartite assignment problems. However, all the methods proposed in these works incur a running time of O(|W|^λ1 |T|^λ2) (λ1, λ2 ≥ 1), which is significantly above the O(|T|) time consumption in our work, and not scalable enough to handle real crowdsourced Q&A services where a huge number of workers are present.

In addition, a number of task allocation works have been proposed for matching tasks to workers with the "right skills", such as [32, 31, 11]. Particularly, [32] estimates the workers' reliabilities on every specific task, and selects the top-k tasks for each sequentially arrived worker to achieve the largest answer quality improvement; [31] utilizes a taxonomy of skills to match the tasks with workers of the necessary skills; instead of using a predefined taxonomy, [11] leverages a knowledge base to analyze the tasks' skill domains, based on which tasks are assigned to workers with appropriate skills so as to gain the maximum ambiguity reduction. Our work is substantially different from the above ones in the following aspects. Firstly, because of the application to the crowdsourced Q&A scenario, our optimization goal is the overall acquisition of timely and qualified answers (measured in terms of EFP), rather than simply the answers' quality. Secondly, workers' expertise has to be captured in a totally different way because of the difference in application scenarios. Thirdly, the recommendation for crowdsourced Q&A services requires effective estimation and utilization of workers' activeness and preferences; however, neither factor is crucial for task assignment scenarios (e.g., task allocation in AMT), and thus they are neglected by the aforementioned methods.

3. DATA EXPLORATION

In this section, we explore the data collected from Stack Overflow¹ to analyze three types of workers' characters: preferences, expertise and activeness. The explored dataset is composed of historical tasks and the scores of workers' answers towards them. The content of each task is represented by a bag of keywords (e.g., {c++, polymorphism, virtual function, dynamic linkage}), and each worker's score on a specific task is normalized by the corresponding task's maximum score (so that it is confined within [0, 1]).

¹ https://stackoverflow.com

We aggregate each worker's (e.g., worker w's) associated records into two values:
• count@keyword, which equals the count of w's answered tasks containing a certain keyword; and
• score@keyword, which equals the average score of w's answered tasks with the corresponding keyword.

We demonstrate the aggregated results from four aspects: 1) workers' counts on different keywords (a reflection of preference), 2) workers' average scores on different keywords (a reflection of expertise), 3) the relationship between workers' average scores and counts on the common keywords (the correlation of preference and expertise), and 4) the distribution of workers' annual counts of answers (a reflection of activeness).
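The two aggregates are straightforward to derive from the raw answer records. The following sketch (in Python, with an illustrative record layout that is an assumption for demonstration, not the paper's actual data schema) shows one possible implementation.

```python
from collections import defaultdict

def aggregate_worker_keywords(records):
    """Compute count@keyword and score@keyword per worker.

    `records` is an iterable of (worker, keywords, score) tuples, where
    `score` is already normalized by the task's maximum score.
    These field names are illustrative, not from the paper's dataset.
    """
    counts = defaultdict(int)        # (worker, keyword) -> count@keyword
    score_sums = defaultdict(float)  # (worker, keyword) -> sum of scores

    for worker, keywords, score in records:
        for kw in keywords:
            counts[(worker, kw)] += 1
            score_sums[(worker, kw)] += score

    # score@keyword is the average score over the worker's answers
    # to tasks containing that keyword.
    avg_scores = {key: score_sums[key] / counts[key] for key in counts}
    return counts, avg_scores

# Example: a worker answering two c++ tasks with scores 0.8 and 0.4
# yields count@c++ = 2 and score@c++ = 0.6 for that worker.
```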

Counts on keywords. We randomly select 20 workers who produced over 100 answers within the year of 2016, and demonstrate their preferences in terms of count@keyword in Figure 1 (a) and (b). (20 popular keywords are randomly chosen for the demo of the heat map.) In the demonstrated results, only a few blocks in the heat map are deeply colored, and most of the points' counts in the dot map are below 10, which reflects that each worker only makes frequent answers for tasks associated with a small set of keywords. Besides, the workers' differences in the heat map indicate their greatly diversified preferences over different keywords.

Figure 1: (a), (b): Heat Map and Dots of count@keyword; (c), (d): Heat Map and Dots of score@keyword.

Average scores on keywords. We further demonstrate the workers' expertise with score@keyword in Figure 1 (c) and (d). Similar to the observation in (a) and (b), each worker is only good at tasks associated with a small set of keywords, whose answers towards such tasks are rated with comparatively high scores; in addition, workers' expertise is significantly diversified across the keywords.

Counts and average scores on common keywords. In Figure 2 (a), each dot represents the tuple of a worker's (count@keyword, score@keyword) for a common keyword. The averages and standard variances of score@keyword for different count@keyword are aggregated in Figure 2 (b). According to the presented results, the variation of answers' counts does not clearly affect the value distribution of score@keyword, which indicates that workers' expertise and preferences are not necessarily correlated. As a result, both factors need to be taken into account so as to find the workers who will not only probably accept a given task, but will also be capable of making qualified answers.

Figure 2: Average score and counts on common keywords: (a) Dots of (count@keyword, score@keyword) on common keywords; (b) average scores and standard variances of different counts.

Annual counts of answers. We randomly sample 3000 workers (from the top 60,000 workers whose annual numbers of answers are the greatest), whose annual counts of answers in 2016 are aggregated in Figure 3. It is observed that workers' activeness is greatly diversified: most workers made less than 100 answers, while some active ones may even produce over 400 answers.

Figure 3: Distribution of annual counts of answers.

Summary. The exploration results reveal that workers have greatly diversified preferences, expertise and activeness, which significantly affect their choices of tasks, the quality of their answers, and their frequencies of making new answers. To guarantee the effectiveness of recommendation, a worker is desirable to be equipped with satisfactory preference and expertise for the given tasks, and to be active in generating new answers. Thus, all these factors need to be jointly considered and precisely captured within the recommendation strategy.

4. DEFINITION AND INFRASTRUCTURE

Task. Each task is represented by its content, in the form of a bag of words. A task's content reflects its underlying category; e.g., a task with the words {polymorphism, virtual function, dynamic linkage} indicates its affiliation with object-oriented programming. Each task t ∈ T will be recommended with c_t different workers, who will be invited to give their answers. (c_t is referred to as the task's capacity.)

Worker. Each worker w is associated with the historical data of her answered tasks T_w and the scores of her answers {s_t^w}_{T_w}. For the unification of scale, each score s_t^w is normalized by the maximum score ŝ_t of the corresponding task t (i.e., s_t^w / ŝ_t), so that it can be confined within the range [0, 1]. From the historical data, the workers' expertise, preferences and activeness are inferred, which will provide guidance for the future worker recommendation. Notice that we only take the workers who have some answering records in history into consideration. To deal with newly registered workers, we may resort to active exploration methods (e.g., [9]) to initiatively acquire the users' data; such an operation can be seamlessly joined with our framework. In this paper, each worker can only be recommended to a finite number of tasks, and such a maximum number is referred to as the worker's capacity (denoted as c_w).

Worker Recommendation. As discussed in the Introduction, for a newly presented task, the active workers with high expertise and preferences are desirable for recommendation. To quantitatively measure a worker's suitableness in terms of all these aspects, we come up with the Expected Final Payoff (EFP), whose definition is stated as follows.

Definition 1. (EFP) Suppose task t is recommended to worker w, who will accept and answer task t within a valid time period with probability p_t^w, and the produced answer will be of quality s_t^w (i.e., the normalized score). The expected final payoff of making such a recommendation (denoted as EFP(t, w)) is p_t^w · s_t^w.

Apparently, a timely answer can be acquired with a large probability iff. the worker is active and has a strong preference towards the presented task; and the answer's quality will be good iff. the worker is equipped with high expertise. As such, a large value of EFP indicates the worker's satisfaction of all the desired properties, whereby reflecting her suitableness of being recommended.

Latent Hierarchical Factorization Model. The Latent Hierarchical Factorization Model (LHFM) represents the tasks in the latent semantic space and characterizes the workers with their latent factors. In particular, each task is mapped into the latent semantic space based on its content, which provides a natural indicator of its underlying category; and each worker is characterized by her two latent vectors, representing her expertise and preference towards each type of task, and a latent scalar, reflecting her activeness level. The tasks' and workers' latent features are tightly coupled in LHFM, whose collaborative explanation towards the observation data (i.e., the answers' appearances and scores) facilitates more accurate probabilistic reasoning (as discussed in [24, 23]). What's more, the explicit incorporation of workers' preferences and activeness not only provides each individual answer's appearance likelihood, but also judiciously explains a worker's count of answers within a certain period.

When tasks are processed in a stream and the FCFS principle (first-come, first-served) is adopted, the naive top-K strategy can be directly applied for the worker recommendation, whereby the most appropriate workers will be recommended to the currently submitted task.

Meanwhile, we still have to find out the worker recommendation strategy for the case where tasks are presented and processed in a batch manner. In particular, it is necessary to figure out how to maximize the overall EFP for a whole batch of presented tasks. Such a problem is referred to as Optimal Worker Recommendation (OWR). Given the tasks' and workers' capacity constraints, together with the EFP between each pair of task and worker, OWR is formed as the following maximum weighted bipartite assignment problem.

Definition 2. (OWR) Given the presented batch of tasks, the available workers, the capacity constraints and the EFP of each task-worker pair, the optimal worker recommendation assigns each task a set of workers, which produces the maximum total EFP and preserves the capacity feasibility.

System Infrastructure. We demonstrate the system infrastructure in Figure 4. In particular, the workers' historical answering records are fully utilized by LHFM, where latent topics are generated to encode the tasks' semantics, and latent characters are estimated to represent the workers in terms of expertise, preference and activeness. Given a set of new tasks, the system calculates the workers' suitableness (i.e., EFP) towards each presented task; by solving the consequent OWR problem, each task gets its recommended set of workers, which leads to the maximum overall EFP.

Figure 4: System Infrastructure.

We summarize the frequently used notations in Table 1 to facilitate comprehension.

Table 1: General Notations
  T : the whole set of tasks for the current stage
  W : the whole set of workers for the current stage
  W_t : the workers in W who have answered task t
  T_w : the tasks in T which have been answered by worker w
  c_t : task t's capacity
  c_w : worker w's capacity
  β_i : the i-th topic
  θ_t : task t's latent semantic vector
  z_t : task t's underlying topic
  e_w : worker w's latent expertise vector
  ρ_w : worker w's latent preference vector
  a_w : worker w's activeness scalar
  Ψ_w : worker w's count of answers
  I_t^w : worker w's acceptance of task t
  s_t^w : worker w's score on task t
  EFP(t, w) : the EFP of recommending task t to worker w

5. LATENT HIERARCHICAL FACTORIZATION MODEL

LHFM consists of two functional components: 1) the Semantic Encoding module, which encodes a task into its latent semantic vector based on its content; and 2) the Worker Characterization module, which characterizes a worker with her latent vectors of expertise and preference, and the latent scalar of her activeness. With the model's parameters inferred from historical data, the expected final payoff can be calculated for future worker recommendation.

5.1 Probabilistic Formulation

Semantic Encoding. Following the conventional topic modeling approaches, we adopt K topics (denoted as {β_i}_1^K) to represent the underlying categories behind the tasks' contents². Each of the topics is a specific distribution over the vocabulary, indicating every word's appearance probability. The content of each task is composed of a small set of duplicable words. Because of such a "short-text" nature, we assume that the words of a common task are generated from one identical topic (e.g., task t is associated with topic β_{z_t}), which is a common practice to deal with short texts [28]. The index of each task's associated topic (i.e., z_t) is a hidden variable, whose distribution is specified by its latent semantic vector θ_t (i.e., the i-th component θ_t[i] stands for the probability of z_t = i).

Given the topics {β_i}_1^K and the latent semantic vector θ_t, the generation probability for task t's content is presented as:

  P({o}_t | {β_i}_1^K, θ_t) = Π_{o ∈ {o}_t} Σ_{z_t=1}^{K} P(o | β_{z_t}) P(β_{z_t} | θ_t),   (1)

where o is a word in task t's content. Obviously, θ_t indicates task t's underlying categorical information; therefore, it can be regarded as t's latent semantic signature.

Worker Characterization. Each worker w is characterized by her latent factors of activeness a_w, preference ρ_w and expertise e_w; these factors, together with the corresponding tasks' latent semantics, jointly explain the answers' appearances and scores with the following probabilistic formulations.

• Answers' Appearances. The workers' answering records are partitioned into different time slots (e.g., every month) and sequentially processed. In each time slot, two pieces of information are available: 1) the number of answers produced by each worker (the count of answers); 2) the types of tasks answered by each worker (the task selection). Both pieces of information are probabilistically reasoned as follows.

We model worker w's answer appearances with two sequential steps: Browsing and Selection. Firstly, the worker browses a total of b_w tasks subject to her activeness. Following the conventional modeling of counting data [3], we adopt the Poisson process to capture b_w's value, parameterized by w's activeness a_w, i.e., Poisson(a_w):

  P(b_w | a_w) = (a_w)^{b_w} e^{-a_w} / b_w! .   (2)

With such a formulation, a_w can be interpreted as w's expected browsing frequency (e.g., the monthly number of w's browsed tasks on average). Secondly, from the browsed tasks, w makes a selection according to her preference (represented by ρ_w) towards the type of a presented task (indicated by θ_t). In particular, ρ_w is a K-dimensional weight vector, reflecting w's odds of choosing each type of tasks; therefore, t's probability of being selected is captured by the Bernoulli distribution Ber(ρ_w θ_t), i.e.,

  P(I_t^w | ρ_w, θ_t) = ρ_w θ_t,      if I_t^w = 1;
                        1 − ρ_w θ_t,  if I_t^w = 0,   (3)

where the binary variable I_t^w equals 1 if the browsed task t is selected by w, or 0 otherwise. For the above equation, a larger product ρ_w θ_t indicates that w will select t with a higher probability. Finally, being the joint result of Browsing and Selection, the count of answers follows the factorized Poisson distribution presented in Theorem 1.

² Selecting the optimal topic number is an open issue, despite HDP [20] or Perplexity [2]. However, for our problem, where parameter inference is conducted in a supervised way, the optimal K can be selected as the minimum value leading to the best prediction performance [12, 19].
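To make the Browsing-and-Selection story concrete, the following sketch simulates one time slot of a worker's answering activity under Eq. 2 and 3. It is a simulation of the generative process only, not the inference procedure; numpy and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_time_slot(a_w, rho_w, thetas):
    """Sample one worker's answering activity for a single time slot.

    a_w    : activeness scalar (expected number of browsed tasks).
    rho_w  : K-dim preference vector rho_w.
    thetas : dict task_id -> K-dim latent semantic vector theta_t.
    """
    task_ids = list(thetas)
    b_w = rng.poisson(a_w)                     # Browsing: b_w ~ Poisson(a_w)
    browsed = rng.choice(task_ids, size=b_w)   # tasks the worker looks at
    answered = []
    for t in browsed:
        p_select = float(np.dot(rho_w, thetas[t]))  # Ber(rho_w . theta_t)
        if rng.random() < min(1.0, p_select):       # clip to a valid prob.
            answered.append(t)                      # I_t^w = 1
    return answered
```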

Theorem 1. Given the worker's latent activeness a_w, latent preference ρ_w, and all the tasks' latent semantics {θ_t}_T, the count of answers Ψ_w (i.e., Ψ_w = |T_w|) follows the Poisson distribution Poisson(Σ_T a_w ρ_w θ_t / |T|):

  P(Ψ_w | Θ) = (1 / Ψ_w!) · (Σ_T a_w ρ_w θ_t / |T|)^{Ψ_w} · e^{−Σ_T a_w ρ_w θ_t / |T|},   (4)

where Θ is the whole set of parameters a_w, ρ_w, and {θ_t}_T.

Proof. Presented in Section 9.

• Answers' Scores. The answers' scores jointly result from the workers' expertise and the tasks' categories. Specifically, worker w's expertise e_w is a K-dimensional vector, which indicates the comparative quality of w's answers towards each type of task. As a result, w's answer quality towards a presented task t is reflected by the product of e_w and θ_t, i.e., e_w θ_t. Being an indicator of the answer's quality, the generation of the score s_t^w is modeled with the factorized Gaussian distribution N(e_w θ_t, σ_e^{−1}), i.e.,

  P(s_t^w | e_w, θ_t) = (1 / sqrt(2π σ_e^{−1})) · e^{−σ_e (s_t^w − e_w θ_t)^2 / 2},   (5)

where σ_e serves as the regularization parameter for the fitting loss. Apparently, a large product e_w θ_t will probably result in a high answer score.

Generative Process. The graphical model of LHFM is illustrated in Figure 5, where the left part demonstrates the semantic encoding, and the right part shows the worker characterization. The hollow circles represent the hidden variables, and the solid ones stand for the observations. The parameters in the top row, γ, α, σ_1, σ_2, σ_3 and σ_e, are the model's priors. Given our proposed probabilistic formulations, the generative process for the observation data is presented as follows.

• Semantic Encoding.
  Draw each topic in {β_j}^K: β_j ∼ Dir(γ).
  Draw each task's latent semantic vector: θ_t ∼ Dir(α).
  Draw each task's associated topic w.r.t. its latent semantic vector: z_t ∼ Mult(θ_t).
  Draw each word o in task t's content {o}_t w.r.t. its associated topic: o ∼ Mult(β_{z_t}).
• Worker Characterization. For each worker w:
  Draw the activeness scalar: a_w ∼ N(0, σ_1^{−1}).
  Draw the preference vector: ρ_w ∼ N(0, σ_2^{−1} I).
  Draw the expertise vector: e_w ∼ N(0, σ_3^{−1} I).
  For each time slot:
    Draw the count of answers: Ψ_w ∼ Poisson(Σ_T a_w ρ_w θ_t / |T|).
    For each task t in T_w:
      Draw the task selection: I_t^w ∼ Ber(ρ_w θ_t).
      Draw the answer score: s_t^w ∼ N(e_w θ_t, σ_e^{−1}).

Figure 5: Graphical Model of LHFM.

Remarks. In the above generative process, the worker characterization only involves the positive instances (tasks that have been answered) associated with each worker, while the explicit reasoning for negative instances (tasks that have not been answered) is eliminated. Thus, the unnecessary time consumption and possible errors caused by explaining the negative instances can be avoided in the parameter inference.

5.2 Parameter Inference

The log-likelihood of the generative process is derived as:

  L = log P(O, I, Ψ, S, Z, θ, β, ρ, a, e | α, γ, σ_1, σ_2, σ_3, σ_e)
    = log P(O | Z, β) P(Z | θ) P(θ | α) P(β | γ) + log P(Ψ | θ, a, ρ) + log P(I | θ, ρ) + log P(S | θ, e, σ_e) + log P(Z, ρ, a, e | σ_1, σ_2, σ_3)   (6)
    = L_O + L_Ψ + L_I + L_S + L_π.

The bold symbols represent the sets of homogeneous parameters, e.g., O: {{o}_t}_T. L_O, L_Ψ, L_I, and L_S stand for the latent factors' log-likelihoods w.r.t. 1) the tasks' contents, 2) the counts of answers, 3) the tasks' selections and 4) the answers' scores, respectively; and L_π is the priors' log-likelihood for the workers' characterized factors.

Given the observations and the model's priors, the values of the latent parameters (Z, θ, β, a, ρ, e) are learned to maximize the likelihood function in Eq. 6. A Monte Carlo EM style algorithm [25] is developed for the parameter inference. In particular, the semantic-related parameters (i.e., Z, β, θ) are inferred through Gibbs sampling (referred to as Semantic Sampling); while the worker characterization's factors (i.e., θ, ρ, a, e) are estimated with gradient descent (named the positive-instance Gradient because of the involvement of purely positive instances). The semantic sampling and the positive-instance gradient are iteratively carried out until the convergence of the likelihood function.

Semantic Sampling. In this step, we sample the associated topic of each task through Gibbs sampling. Given the topics and latent semantics, the full conditional probability of assigning task t the k-th topic follows:

  P(z_t = k | θ_t, β) = Π_{o ∈ {o}_t} θ_{t,k} β_{k,o} / Σ_{l=1}^{K} Π_{o ∈ {o}_t} θ_{t,l} β_{l,o},   (7)

where θ_{t,k} and β_{k,o} are the k-th and o-th components of the corresponding vectors. With the topic sampled for task t (i.e., z_t), the topics and the latent semantic vectors are updated according to the empirical distributions:

  β_{j,o_i} = (Σ_T I_{z_t=j} · |{o | o = o_i}_t| + γ) / (Σ_T I_{z_t=j} · |{o}_t| + γ|O|),
  θ_{t,k} = (Σ_{{o}_t} β_{k,o} + α) / (Σ_{l=1}^{K} Σ_{{o}_t} β_{l,o} + αK),   (8)

where O denotes the vocabulary, and the binary indicator I_{z_t=j} equals 1 if z_t = j, or 0 otherwise. Following the conventional topic modeling processing, the above topic sampling and update operations will be repetitively carried out for a predefined number of iterations [6]. Then a stabilized estimation of all the relevant variables (Z, β, θ) can be obtained.
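For illustration, one pass of the topic sampling in Eq. 7 can be implemented in log space to avoid numerical underflow on longer contents; the sketch below assumes `beta` is stored as a K × |O| matrix (an implementation choice, not prescribed by the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_topic(words, theta_t, beta):
    """Draw z_t from the full conditional of Eq. 7.

    words   : list of word indices in task t's content {o}_t.
    theta_t : K-dim latent semantic vector of task t.
    beta    : K x |O| matrix; beta[k, o] = P(word o | topic k).
    """
    # log of  prod_o theta_{t,k} * beta_{k,o}  for every topic k:
    # each word contributes one factor theta_{t,k}, hence the len(words) power.
    log_probs = (np.log(theta_t + 1e-12) * len(words)
                 + np.log(beta[:, words] + 1e-12).sum(axis=1))
    probs = np.exp(log_probs - log_probs.max())   # normalize safely
    probs /= probs.sum()
    return rng.choice(len(theta_t), p=probs)
```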

Positive-instance Gradient. With the maximization of L_O in the semantic sampling, we iteratively adapt the model's parameters to maximize the summation of L_Ψ, L_I, L_S and L_π (denoted as L). We use mini-batch gradient descent to update the parameters' values. In particular, L's partial gradients w.r.t. worker w's relevant factors are derived as:

  ∂L/∂a_w = ∂L_Ψ/∂a_w + ∂L_π/∂a_w,
  ∂L/∂ρ_w = ∂L_Ψ/∂ρ_w + ∂L_I/∂ρ_w + ∂L_π/∂ρ_w,   (9)
  ∂L/∂e_w = ∂L_S/∂e_w + ∂L_π/∂e_w;

while L's partial gradient w.r.t. task t's latent semantic vector is derived as follows:

  ∂L/∂θ_t = ∂L_Ψ/∂θ_t + ∂L_I/∂θ_t + ∂L_S/∂θ_t,   (10)

where W_t represents the workers who have answered task t. The above gradients' computation only relies on the information about positive instances, i.e., the number of answers Ψ_w, the answered tasks T_w, and the answers' scores {s_t^w}_{T_w}. As a result, it merely requires O(Σ_W |T_w|) (i.e., the scale of the positive instances) derivative calculations for all the workers and tasks (as presented in Eq. 9, 10).

The parameters are updated according to the derived gradients. We adopt an iterative manner to coordinate the updating process, where a, ρ, and e are firstly adapted with fixed θ. For each of the workers, the following operations are carried out for her characterized factors based on Eq. 9:

  a_w ← a_w + η_1 (∂L_Ψ/∂a_w + ∂L_π/∂a_w),
  ρ_w ← ρ_w + η_1 (∂L_Ψ/∂ρ_w + ∂L_I/∂ρ_w + ∂L_π/∂ρ_w),   (11)
  e_w ← e_w + η_1 (∂L_S/∂e_w + ∂L_π/∂e_w).

The update for each worker is recursively carried out until the convergence to a local maximum. Based on the updated a, ρ and e, each task's latent semantic vector is further refined:

  θ_t ← θ_t + η_2 (∂L_Ψ/∂θ_t + ∂L_I/∂θ_t + ∂L_S/∂θ_t).   (12)

Similarly, the adaptation is repetitively performed until the convergence to another local maximum.

Monte Carlo EM. The parameter inference is summarized in Alg. 1.

Algorithm 1: Parameter Inference
  input : O, Ψ, I, S, K, α, γ, σ_1, σ_2, σ_3, σ_e.
  output: β, Z, θ, ρ, a, e.
  1  begin
  2    repeat
  3      Semantic Sampling:
  4      • Update β, Z, θ according to Eq. 7, 8;
  5      Positive-instance Gradient:
  6      • for each worker w do
  7          for positive instances T_w in each time slot do
  8            Update a_w, ρ_w, e_w according to Eq. 11;
  9      • for each task do
 10          Update θ_t according to Eq. 12;
 11    until a predefined number of iterations;
 12    return β, Z, θ, ρ, a, e;

As discussed above, the algorithm is performed iteratively: the tasks' semantic vectors are inferred with Gibbs sampling (lines 3-4), based on which the workers' characters are updated through gradient descent (lines 6-8); the latent semantic vectors are further refined according to the updated workers' characters (lines 9-10). While updating worker w's characters, the batch of her positive instances T_w is sequentially presented according to the time slots. The above operations will be repetitively conducted for a predefined number of iterations to generate the final estimation of all the model's parameters.

In Alg. 1, the workers' preferences are judiciously adjusted together with their activeness, without explicit reasoning about the negative instances. As a result, not only is the time cost greatly reduced, but more reliable inference can be produced as well. A toy example is presented as follows to briefly illustrate the mechanism.

Example 1. Given the batch of tasks within a specific time slot, e.g., T = {t1, ..., t10}, suppose only t1 is answered by worker w1; and according to w1's inferred activeness, she usually generates approximately one answer within each time slot. In this situation, w1's non-answering of {t2, ..., t10} probably results from the limitation of w1's activeness, rather than her preference. Therefore, w1's preference ρ_{w1} will be increased for the relevant domains of t1, without much decrement for the related domains of {t2, ..., t10}.

Meanwhile, there is another worker w2, who also answers t1 in the same time slot. However, w2's inferred preference over {t2, ..., t10} is small, and w2 would usually produce roughly three answers in each time slot. In this situation, w2's non-answering of {t2, ..., t10} is likely to be caused by w2's limited preference. As a result, w2's preference ρ_{w2} will be decreased for the relevant domains of {t2, ..., t10}, without much decrement of her activeness.

The time cost of each iteration in Alg. 1 is divided into two parts. Firstly, the time complexity of the semantic sampling is O(K Σ_T |{o}_t|), which is comparable to general topic analyzing approaches³, such as [6]. Secondly, the positive-instance gradient only requires a time cost of O(K Σ_W |T_w|), thanks to the pure involvement of positive instances. The conventional approaches (e.g., [23, 13]) would process both positive and negative instances, which leads to a time consumption of O(K|W||T|). Given that |T_w| ≪ |T|, the inference of the workers' characters is significantly accelerated.

³ The semantic sampling could be safely omitted if there have been topics constructed from a comprehensive corpus [23].
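As one concrete instance of the positive-instance gradient, consider only the Poisson term L_Ψ = Ψ_w log λ − λ + const, with rate λ = a_w ρ_w θ̄ (θ̄ being the average of the tasks' semantic vectors, by Theorem 1). Its partial derivatives follow from the chain rule; the sketch below implements this single term, with the prior and selection terms omitted for brevity (the factorization into these terms is from Eq. 9, the concrete code is an illustration).

```python
import numpy as np

def poisson_term_grads(psi_w, a_w, rho_w, theta_bar):
    """Gradients of L_Psi = psi_w * log(lam) - lam (+ const) w.r.t. a_w and
    rho_w, where lam = a_w * (rho_w . theta_bar) and theta_bar is the mean
    of the tasks' semantic vectors (Theorem 1). lam is assumed positive.
    The prior terms L_pi of Eq. 9 are omitted here."""
    dot = float(np.dot(rho_w, theta_bar))
    lam = a_w * dot
    coeff = psi_w / lam - 1.0          # d L_Psi / d lam
    d_a = coeff * dot                  # chain rule: d lam / d a_w = rho.theta
    d_rho = coeff * a_w * theta_bar    # d lam / d rho_w = a_w * theta_bar
    return d_a, d_rho

# One ascent step of Eq. 11 (only the L_Psi contribution shown):
#   a_w   += eta1 * d_a
#   rho_w += eta1 * d_rho
```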

5.3 Expected Final Payoff Estimation

With LHFM's parameters inferred by Alg. 1, the EFP can be calculated for a given pair of task and worker. First of all, as a direct deduction from Theorem 1, the probability of task t's being answered by worker w within an arbitrary time period is stated by the following theorem.

Theorem 2. For an arbitrary time period τ, task t's being answered by worker w follows the Bernoulli distribution Ber(1 − exp(−a_w ρ_w θ_t |τ| / (|T| |τ_0|))), i.e.,

  P(A_t^w | a_w, ρ_w, {θ_t}_T) = 1 − e^{−a_w ρ_w θ_t |τ| / (|T| |τ_0|)},  if A_t^w = 1;
                                 e^{−a_w ρ_w θ_t |τ| / (|T| |τ_0|)},      if A_t^w = 0,   (13)

where |τ_0| is the length of one unit time slot, and A_t^w equals 1 if w answers t, or 0 otherwise.

Proof. Presented in Section 9.

Given task t's content, the latent semantic vector θ_t can be firstly determined with Eq. 8. Suppose t needs to be solved within the time period τ; the probability of t's being answered by w (denoted as P{A_t^w(τ) = 1}) is presented as:

  P{A_t^w(τ) = 1} = 1 − e^{−a_w ρ_w θ_t |τ| / (|T| |τ_0|)},   (14)

according to the statement of Theorem 2. With the expected answer quality calculated with Eq. 5, i.e., E{s_t^w} = e_w θ_t, we can derive the expected final payoff between t and w with the following equation:

  EFP(t, w) = P{A_t^w(τ) = 1} · E{s_t^w}.   (15)

6. OPTIMAL WORKER RECOMMENDATION

Formed as a maximum weighted bipartite assignment problem, the optimal worker recommendation can be solved with traditional methods, such as the Hungarian algorithm or Successive Shortest Paths [14]. Whereas, the time complexity of these methods could be as high as O(|T|^2 |W|^2), which is temporally impractical for large-scale problems. Even if we resort to the simple greedy algorithm for accelerated computation and put up with its approximation result, it will still require a running time of O(|T|^2 |W|).

In this paper, with the workers indexed by their characterized factors, we design a sampling-based algorithm for the recommendation. Briefly, the workers are indexed by a |K| × |W| matrix, namely the Utility Table, which records the workers' recommendation utilities towards each type of tasks. When a batch of tasks is presented, we consecutively sample their underlying categories (based on their latent semantic vectors) and candidate workers (based on the Utility Table). From the sampled candidates, we recursively generate the recommendation result through sampling the candidates' utilities, until all the tasks are recommended with the required numbers of workers. In this way, we can acquire a solution with an approximation ratio greater than 1/4 using merely O(|T|) running time.

6.1 Worker Indexing

We create the |K| × |W| utility table (denoted as U) to index the workers with their recommendation utilities towards each type of tasks. In particular, we define the set of standard tasks {t̄_1, ..., t̄_K}, where t̄_k's semantic vector is one-hot at the k-th dimension (i.e., θ_{t̄_k}[i] = 1 iff. i = k; otherwise, θ_{t̄_k}[i] = 0). The (k, w)-th entry of the utility table records the EFP of task t̄_k and worker w, i.e.,

  U(k, w) = EFP(t̄_k, w).   (16)

6.2 Sampling-based Recommendation

Given a batch of tasks T_rec, the worker recommendation is generated with the following steps.

Type Sampling. Given the content of each task t ∈ T_rec, we extract its latent semantic vector θ_t based on LHFM's semantic encoding module (whose computation follows Eq. 8). With the extraction of the latent semantic vectors, we determine each task's underlying type z_t through sampling the multinomial distribution specified by θ_t, i.e., z_t ∼ Mult(θ_t).

Candidates Generation. Based on the sampled types of all the tasks, {z_t}_{T_rec}, we allocate each task a candidate worker through sampling the utility table. Particularly, each worker will be chosen as task t's candidate, denoted as κ_t, with the multinomial distribution specified by the vector φ_{z_t}:

  φ_{z_t, w} = U(z_t, w) / Σ_{w' ∈ W} U(z_t, w').   (17)

Recommendation Sampling. The worker recommendation is produced by sequentially selecting the candidates with a looped sampling process. Specifically, given all the candidates (denoted as K), task t will be sampled to acquire its candidate κ_t according to the multinomial distribution ϕ, where

  ϕ_t = U(z_t, κ_t) / Σ_{t' ∈ T_rec} U(z_{t'}, κ_{t'}).   (18)

Once task t is sampled, the candidate κ_t will be added to its recommendation list, and another worker (other than those who have already been recommended to t) will be selected as the new candidate for task t.

Due to the capacity constraints, task t will be removed from T_rec once it is recommended with the required number (i.e., c_t) of workers; at the same time, worker w will be taken out of the current candidates and exempted from being a candidate any more, if she has been recommended to the maximum number (i.e., c_w) of tasks. Such a sampling process will be repetitively carried out until all the tasks are recommended with the required numbers of workers.

Algorithm 2: Sampling-based Recommendation
  input : T_rec, U
  output: R
  1  begin
  2    initialize R ← {R_t ← ∅}_{T_rec};
  3    for each task t ∈ T_rec do
  4      extract the latent semantic vector θ_t with Eq. 8;
  5      sample the type z_t with Mult(θ_t);
  6      sample the candidate κ_t with Mult(φ_{z_t});
  7    T'_rec ← T_rec;
  8    repeat
  9      sample task t for recommendation with Mult(ϕ);
 10      add t's candidate: R_t ← R_t + κ_t;
 11      remove t from T'_rec if count_t = c_t;
 12      suspend κ_t if count_{κ_t} = c_{κ_t};
 13      re-sample candidates and update K;
 14    until T'_rec = ∅;
 15    return R;

The worker recommendation process is summarized in Alg. 2. The tasks' recommendation lists (R) are initialized to be empty sets (line 2); and in the first for-loop, the tasks' types and candidates are sequentially sampled (lines 5, 6). In the second loop, the tasks are repetitively selected through sampling Mult(ϕ), with their candidates added to R (line 10). The sampled task t and its candidate worker κ_t will be removed from further recommendation (lines 11, 12) if their capacity constraints are reached (i.e., count_t = c_t or count_{κ_t} = c_{κ_t}). Besides, new candidates will be re-sampled for task t (if it is not removed) and for the tasks associated with κ_t (if κ_t is suspended); and K will be updated accordingly (line 13).
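Since the k-th standard task's semantic vector is one-hot, Eq. 14-16 reduce to closed forms in the workers' factors, and the whole utility table can be built with vectorized operations. The sketch below illustrates this; the matrix layouts and the `time_ratio` argument are assumptions for illustration.

```python
import numpy as np

def build_utility_table(A, RHO, E, n_tasks, time_ratio=1.0):
    """Utility table U (K x |W|) of Eq. 16, U[k, w] = EFP(standard task k, w).

    A          : length-|W| array of activeness scalars a_w.
    RHO        : |W| x K preference matrix (rows are rho_w).
    E          : |W| x K expertise matrix (rows are e_w).
    n_tasks    : |T|, the number of tasks of the current stage.
    time_ratio : |tau| / |tau_0|, the recommendation window in unit slots.

    For the k-th standard task, theta is one-hot at k, so
    rho_w . theta = RHO[w, k] and e_w . theta = E[w, k].
    """
    # Eq. 14: acceptance probability within the window, per (worker, type)
    p_answer = 1.0 - np.exp(-(A[:, None] * RHO) * time_ratio / n_tasks)
    # Eq. 15: EFP = acceptance probability * expected score
    return (p_answer * E).T   # transpose to K x |W|
```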

The following toy example is presented to illustrate the recommendation process.

Example 2. Suppose there are two tasks for recommendation, t1 and t2, each of which requires one worker. With their extracted semantic vectors, θ_{t1} and θ_{t2}, their categories are sampled as z_{t1} and z_{t2}; then, the tasks' candidates are sampled according to the z_{t1}-th and z_{t2}-th rows of U. Suppose w1 (whose capacity is 1) is selected to be both κ_{t1} and κ_{t2}. Then, in the 1st round of recommendation sampling, t1 is selected; thus, w1 is recommended to t1; besides, w1 and t1 are exempted from further sampling. Suppose w2 is selected as t2's new candidate; in the 2nd round of recommendation sampling, since only one task is left, t2 is recommended with w2. Finally, the recommendation is made as: t1 with w1, and t2 with w2.

Accelerated Sampling with the Alias Method. To reduce the sampling cost, Vose's Alias method [22] is adopted. In particular, given an arbitrary multinomial distribution to be sampled (e.g., Mult(ϕ)), two tables, Alias and Probabilities, are created as auxiliary structures to enable O(1) sampling. Given an m-dimensional distribution, the structures' creation requires O(m) time and space costs. In our sampling-based algorithm, the Utility Table is static; therefore, its auxiliary structures are constructed offline, with O(|W|) time and space costs (given that K is a constant value). Meanwhile, the other auxiliary structures (for the type and recommendation sampling) need to be specifically constructed for each batch of presented tasks, which leads to O(|T_rec|) time and space costs at running time. Although the Alias and Probabilities tables of Mult(ϕ) might be adjusted after K is updated (line 13, Alg. 2), it merely takes O(1) operations, since only a constant number of entries in ϕ need to be adapted.
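For reference, a compact sketch of the standard Alias construction and O(1) drawing is given below; this is the textbook form of Vose's method [22], not code from the paper.

```python
import random

def build_alias(probs):
    """Vose's Alias method: O(m) preprocessing of a discrete distribution."""
    m = len(probs)
    scaled = [p * m for p in probs]
    prob, alias = [0.0] * m, [0] * m
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]       # donate mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                # leftovers are numerically ~1
        prob[i] = 1.0
    return prob, alias

def draw(prob, alias):
    """O(1) sampling: pick a column uniformly, then flip its biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```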

Proof. Presented in Section 9. A high value of Recall@K means a large proportion of the recommended workers will give their answers to the pre- sented task, thus reflecting better inference quality on work- 7. EXPERIMENTAL STUDY ers’ preferences and activenesses. Besides, the efficiency of The experimental study is conducted to evaluate the fol- positive-only inference is evaluated in terms of time cost. lowing aspects of our proposed methods: 1) the effect of In particular, the overall running time of inferring the work- worker recommendation with triple-factor awareness, 2) the ers’ characters is recorded for all the comparison methods. effect of parameter inference with only positive instances Sampling-based Recommendation. As a solution for (referred as Positive-only Inference), 3) the effect of worker making the maximum weighted bipartite assignment, the recommendation using sampling-based algorithm. sampling-based algorithm is compared with: 1) the Mixed Integer Programming (with Coin OR’s solve5), which gener- 7.1 Experiment Settings ates the optimal solution, and 2) the greedy method, which Triple-factor Aware Worker Recommendation. The sequentially generates the approximation result with the de- triple-factor aware worker recommendation is compared with scending order of the recommendation utilities (i.e., EFP). those single-factor approaches, which are conventionally used We adopt the optimal worker recommendation’s objective, by the existing works (e.g., [15, 17]). In particular, the namely Accumulative EFP (AE for short), to evaluate baseline methods include Preference-Purely, Expertise- 4 Purely and Activeness-Purely (Pre, Exp and Act for A recommendation system is considered to be effective if it’s able to accurately predict the users historical actions in the testing data, e.g., short), where workers’ preferences, expertises and active- movies ratings for movie recommendation, or accumulated answer nesses are the solely factors considered in recommendation score in our work. 5 (i.e., in Pre, workers with the highest preferences will be https://projects.coin-or.org/Cbc

388 efrhrvrfidtruhtefloigobservations. can following explanation the an through Such verified character. further workers’ emphasize them be of simply solve type to methods one able baseline also on the but probable all time, only while, on straight- not correctly; tasks quite are given is who the workers fulfill superiority to the to its finds rise about Tri gives reason as (with Tri The forward, group where 15). testing to each (a), 3 in 6 Acc Figure highest in the reflected is 6. dation of Figure in scores presented average are and total (Score) (#Ans) answers (Acc), answers acquired scores experi- acquired answer The of accumulative number task. on Act, each results and for mental Exp pref- recommended Pre, EFP, are Tri, top-K respectively) for the activenesses (i.e., expertises, utilities erences, top-K the with workers Evaluation Recommendation Triple-factor 7.2 Ber(0 0.05: distribution of Unif(0 Uniform parameterized 1: tribution the and with 0 workers, simulated of and is parameterized tasks EFP of the datasets synthetic where recommenda- work- two further sampling-based create 5547 to we of and Besides, tion, performance tasks included. the are 67080 evaluate 2016) of within (answers total ers a workers. where of number evaluation, required the for tasks the all recommen- allocating worker of of terms efficiency in the dation evaluate we addition, In result particular, In recommendation given algorithm. sampling-based of effectiveness scores. the average on result (c) answers; acquired of 6: Figure

h ueirt ftil-atraaewre recommen- worker aware triple-factor of superiority The where evaluation, the for tasks 1000 choose randomly We for Overflow Stack from crawled data world real use We Score #Ans Acc xeietI a euto c;()rsl nnumber on result (b) Acc; on result (a) I: Experiment E= AE iecost time X t ∈ T rec X Top Top Top .. h iecnupinof consumption time the i.e., , { (c) (b) (a) R - w - - K K K t ∈ , } R ) n h enul dis- Bernoulli the and 1), T t rec EFP( Ei optdas: computed is AE , . 5,respectively. 05), ,w t, ) K . aidfrom varied (21) 389 . oiieol neec Evaluation Inference Positive-only 7.3 that implies to finding unlikely answers. a are quality Such who high workers, produce (c)). low-expertise than 6 the higher Figure excludes considerably Tri (in and work- Act expertises), and the top Pre recommends the (which are with Exp scores average of ers Tri’s those more, to What’s comparable tasks. dif- of towards types preferences workers’ ferent due of lower lower neglect leaves significantly slightly complete are their it in #Ans to Act’s resulting preferences), and thus Exp top activeness, while #Ans; tasks of the factor have recommending the who (by out workers answers those of to max- acquisition to tries the also Pre imize Although answers. acquired of number bars topics. red 200 dark and and 100 blue 50, of red, settings scales; represent workers’ variant with ization 7: Figure “ rameter where Filtering, Binary-class and Collaborative Strategy Positive-only between comparisons cost 2: Table 00t 000 I aeo osbedpiain n pair any duplication, possible of case (In from ranges size 20,000. whose to dataset, 2000 boot- original through the sampled from are work- strapping variant workers Particularly, with cost scales. time ers’ of terms Pos). in scalability of temporal that Bcf than lower (as much quality still is inference Recall@K on 0.02’s prevents improvement which pseudo further erroneous, the be Bcf’s 2) to as likely processed; considerably are be instances grow to negative will need instances consumption training time more the 1) effects: of decrement the However, parameter with Bcf of value smaller when Recall@K of highest decrement moderate with extent some inference efficiency. best running the to high leads preserving Pos Bcf of while than quality, adoption the greater incurs say, to (only and is That cost values It time Recall@K in- low positive-only 2. largest comparatively the Table the for in generates (standing Pos presented ference) that time are observed be the results compar- can whose record the all and methods, for tasks ison characterization chosen worker of randomly consumption 1000 of total oprsnWt Bcf. With Comparison largest the to leads Tri (b), 6 Figure in demonstrated As Scalability. to improved be can quality inference Bcf’s addition, In Methods Methods δ δ δ δ δ - - Pos Pos - - - 0.02 0.02 0.05 0.1 0.1 0.2 0.5 δ

7.3 Positive-only Inference Evaluation

We further evaluate Positive-only Inference's quality. We measure Recall@K for a total of 1000 randomly chosen tasks and record the time consumption of worker characterization for all the comparison methods, whose results are presented in Table 2. It can be observed that Pos (standing for the positive-only inference) generates the largest Recall@K values, while its time cost is comparatively low (only greater than that of Bcf with δ-0.5). That is to say, the adoption of Pos leads to the best inference quality, while preserving high running efficiency.

Table 2: Experiment II-1: Recall@K (K = 10, 50, 100) and time cost comparisons between the Positive-only Strategy (Pos) and Binary-class Collaborative Filtering (Bcf), where "δ-x" represents Bcf with parameter δ equal to x.

Methods   Recall@10   Recall@50   Recall@100   Time (s)
Pos       0.1090      0.3386      0.4906        49.48
δ-0.02    0.0847      0.2705      0.4183       359.90
δ-0.05    0.0814      0.2649      0.4080       141.60
δ-0.1     0.0806      0.2597      0.3968        84.15
δ-0.2     0.0695      0.2457      0.3683        53.16
δ-0.5     0.0685      0.2391      0.3421        36.46

Comparison With Bcf. Bcf's inference quality can be improved to some extent with the decrement of parameter δ, reaching its highest Recall@K when δ equals 0.02. This is because a smaller value of δ will incorporate more training samples (as δ-x takes 1/x pseudo negative instances). However, the decrement of δ will inevitably result in two side effects: 1) the training time consumption grows considerably, as more pseudo negative instances need to be processed; 2) Bcf's pseudo negative instances are likely to be erroneous, which prevents further improvement on Recall@K (as δ-0.02's inference quality is still much lower than that of Pos).

Scalability. We further evaluate Positive-only Inference's temporal scalability in terms of time cost with variant workers' scales. Particularly, workers are sampled through bootstrapping from the original dataset, whose size ranges from 2000 to 20,000. (In case of possible duplication, any pair of duplicated workers will be treated as different entities, i.e., different workers with the same answering record.) We select 3 candidate latent topic dimensions and present the results in Figure 7.

Figure 7: Experiment II-2. Time cost for Worker Characterization with variant workers' scales; red, blue and dark red bars represent the settings of 50, 100 and 200 topics.

The overall time cost increases linearly with the growth of workers' scale, given a fixed topic dimension; besides, the overall time cost is roughly proportional to the topic dimension. Both observations are consistent with our theoretical analysis in Section 5.3. As such, the time cost won't be too large given a reasonable scope of workers' scale. In addition, two alternative methods can be adopted for further acceleration in case large-scale worker sets are presented. Firstly, as each worker's latent characters are updated independently (as presented in Alg. 1), worker characterization can be conducted in parallel, where each worker's updating process is executed by a unique thread (in a similar manner to [34]). When adequate computing devices are provided, such an execution could cut down the time cost by orders of magnitude. Secondly, because workers' characters will probably change only steadily, incremental learning approaches (e.g., [7]) can be applied, which omit the cost of inferring workers' characters all over again.
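As a side note on the metric behind Table 2, here is a minimal sketch of Recall@K as it is commonly computed for worker recommendation, assuming each test task carries a ground-truth set of answering workers and a model-produced worker ranking; all names and the averaging scheme are assumptions for illustration.

```python
def recall_at_k(ranked_workers, true_answerers, k):
    """Fraction of a task's true answerers recovered within the top-k ranking."""
    if not true_answerers:
        return 0.0
    hits = len(set(ranked_workers[:k]) & set(true_answerers))
    return hits / len(true_answerers)

def mean_recall_at_k(test_tasks, k):
    """test_tasks: iterable of (ranked_workers, true_answerers) pairs."""
    scores = [recall_at_k(ranked, truth, k) for ranked, truth in test_tasks]
    return sum(scores) / len(scores)

# Toy example with two test tasks and K = 2.
test_tasks = [(["w1", "w2", "w3"], {"w2", "w4"}),   # recall 1/2
              (["w5", "w1", "w2"], {"w5"})]         # recall 1/1
print(mean_recall_at_k(test_tasks, k=2))            # (0.5 + 1.0) / 2 = 0.75
```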


Figure 8: Experiment III-1. (a), (b): AE (Real) on variant number of tasks and workers; (c), (d): AE (Uniform) on variant number of tasks and workers; (e), (f): AE (Bernoulli) on variant number of tasks and workers.

Figure 9: Experiment III-2. (a), (b): different methods' time costs on variant number of tasks and workers; (c), (d): Sample's time costs on variant number of tasks and workers.

7.4 Sampling-based Recommendation Evaluation

The experiments are carried out with: (1) variant scales of tasks (200 to 2000) and a fixed number of workers (2000); (2) variant scales of workers (200 to 2000) and a fixed number of tasks (200). Both real and synthetic (Uniform and Bernoulli) datasets are employed. The resulting AE is reported in Figure 8 and the time cost is shown in Figure 9, where the following interesting points can be observed.

Firstly, the approximations' performances are clearly above their theoretical lower bounds (1/2 for Greedy and 1/4 for Sample). A necessary condition for the worst cases of Greedy and Sample is that all the tasks share a similar group of favorable workers, where tasks recommended in the initial stage will probably affect the subsequent ones. However, such a condition is almost impossible to hold in either real or synthetic settings: in real settings, tasks' recommendation utilities (i.e., EFP) are greatly diversified over the workers because of their differences in semantic vectors; while in synthetic settings, tasks' EFP over workers are also significantly discrepant owing to their independent generation.

Secondly, Sample's performance is close to Greedy and the optimal result (denoted as OPT) on the real (Figure 8 (a), (b)) and Bernoulli-synthesized (Figure 8 (e), (f)) data. For both real and Bernoulli-synthesized datasets, a task's total recommendation utility (over the whole set of workers) is densely concentrated on a small group of "favorable workers". (Such a "concentration" property is common in practice, as it is probable that only a tiny group of workers are highly appropriate to be recommended for a specific task.) As a result, such workers can be selected with significantly higher probabilities, which reduces the performance gap between Sample and Greedy. However, such a property does not hold on the Uniform-synthesized data, as a task's generated EFP follows the uniform distribution, thus enjoying greater mutual similarities. As a result, Sample's performance is comparable to Greedy and OPT under the real and Bernoulli-synthesized settings, but not under the Uniform-synthesized setting.

Thirdly, the performance ratio between Sample and Greedy becomes smaller with the increase of tasks' scale (Figure 8 (a), (c), (e)). In Alg. 2, Sample randomly selects tasks and their candidate workers for recommendation with probabilities proportional to the corresponding EFP; meanwhile, Greedy can be regarded as the deterministic variation of Sample, as it sequentially makes recommendations in the descending order of EFP. When more tasks are presented, Sample's recommendation sampling operation (line 9, Alg. 2) becomes more randomized; thus, it resembles Greedy less and its resulting AE diverges from that of Greedy. Despite such a deficiency, Sample's performance is still comparable to Greedy and OPT on real data when the tasks' scale reaches 2000, owing to the "concentration" property of real world data discussed in the second point.
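To illustrate the behavior discussed above, the sketch below repeatedly draws (task, worker) pairs with probability proportional to EFP(t, w)^(2+ε), in the spirit of the recommendation sampling in line 9 of Alg. 2. The capacity handling and every name here are illustrative assumptions rather than the authors' exact algorithm, and np.random.choice stands in for the O(1) Alias draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_recommendations(efp, worker_capacity=1, epsilon=0.5):
    """Repeatedly draw one (task, worker) pair with probability proportional
    to EFP(t, w)**(2 + epsilon), until every task is served or no feasible
    pair remains; efp is a (|T|, |W|) array of non-negative EFP values."""
    weights = efp.astype(float) ** (2.0 + epsilon)
    remaining = np.full(efp.shape[1], worker_capacity)
    assignment = {}
    for _ in range(efp.shape[0]):            # O(|T|) sampling rounds
        total = weights.sum()
        if total == 0:                       # no feasible pair remains
            break
        flat = rng.choice(weights.size, p=weights.ravel() / total)
        t, w = np.unravel_index(flat, weights.shape)
        assignment[int(t)] = int(w)
        weights[t, :] = 0.0                  # each task is recommended once
        remaining[w] -= 1
        if remaining[w] == 0:
            weights[:, w] = 0.0              # worker w's capacity is exhausted
    return assignment

# Toy example: 3 tasks, 2 workers, each worker serving at most 2 tasks.
efp = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(sample_recommendations(efp, worker_capacity=2))
```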

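The O(1)-per-draw cost cited in the time-cost analysis below comes from Vose's alias method [22]; what follows is a compact, self-contained sketch of that textbook construction, not code from the paper.

```python
import random

def build_alias(probs):
    """O(n) preprocessing of a probability vector into (prob, alias) tables."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    prob, alias = [0.0] * n, [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]         # donate mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                  # leftovers are numerically ~1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """O(1) sampling: pick a uniform column, then flip its biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([0.1, 0.2, 0.3, 0.4])
samples = [alias_draw(prob, alias) for _ in range(100_000)]
print([round(samples.count(i) / len(samples), 3) for i in range(4)])
# expected output close to [0.1, 0.2, 0.3, 0.4]
```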
At the same time, the time cost of Sample is considerably lower than the baselines' (shown in Figure 9 (a), (b)), as Sample only requires O(|T|) rounds of sampling operations, each of which can be completed in O(1) with the Alias method. In addition, Sample's time cost is linear in the tasks' scale (Figure 9 (c)) and independent of the workers' scale (Figure 9 (d)), which is consistent with our theoretical analysis. Such a property is desirable for real applications where the recommendation could involve a tremendous number of workers (e.g., a total of 6 million registered users in Stack Overflow).

7.5 Summary of Experiments

The major findings of the experimental study can be summarized with the following points.

• The triple-factor aware worker recommendation provides the given tasks with workers who will not only offer timely answers with large probabilities, but will also be capable of solving the tasks with high quality, therefore leading to the highest total EFP among all the comparison methods.

• The adoption of positive-only inference results in the best inference quality, while preserving high running efficiency.

• The sampling-based method is able to make near optimal recommendations under real world settings, with a greatly smaller time cost. Besides, the sampling-based method shows satisfactory temporal scalability, as its running time is linear in the tasks' scale and independent of the workers' scale.

8. CONCLUSION AND FUTURE WORK

In this work, we propose the Triple-factor Aware Worker Recommendation framework, which comprehensively considers the workers' expertises, preferences and activenesses to maximize the production of high quality answers. In this framework, the LHFM is constructed to accurately estimate the tasks' underlying categories and infer the workers' latent characters. Thanks to the adoption of positive-only inference, the model's parameters are learned with higher efficiency and better quality. Besides, the sampling-based algorithm is proposed to generate the near optimal worker recommendation in a greatly accelerated and temporally scalable way. Extensive experimental studies are carried out on both real and synthetic datasets, whose results verify the effectiveness and efficiency of our proposed methods. In future work, we will consider TriRec's adaptation to better fit users' individualized requests, which will make the system more interactive and user-friendly; for example, how to guarantee the acquisition of qualified answers within a certain specified time window.

Acknowledgment. The work is supported in part by the Hong Kong RGC Project 16202215, Science and Technology Planning Project of Guangdong Province, China, No. 2015B010110006, NSFC Grant Nos. 61729201 and 61232018, Microsoft Research Asia Collaborative Grant and NSFC Guang Dong Grant No. U1301253.

9. APPENDIX

Proof of Theorem 1. According to the discussed probabilistic formulation, we are dealing with the following sequential process: 1) a total of $b_w$ candidates are browsed, where $b_w \sim \mathrm{Poisson}(a_w)$; 2) the selected candidates will be finally confirmed through the Bernoulli selection $\mathrm{Ber}(\rho_w\theta_t)$. Since we have a total of $|\mathcal{T}|$ tasks, the probability of an arbitrary candidate being confirmed equals $\sum_{t \in \mathcal{T}} \rho_w\theta_t / |\mathcal{T}|$.

Now we transform such a process into the following equivalent form. According to the definition of the Poisson distribution, the candidates' generation can be regarded as "M independent trials with success rate $a_w/M$ ($M \to \infty$)", where the probability of $b_w$ successful trials is:

$$P(b_w \mid a_w) = \binom{M}{b_w}\Big(1 - \frac{a_w}{M}\Big)^{M - b_w}\Big(\frac{a_w}{M}\Big)^{b_w}. \qquad (22)$$

To get confirmed in the end, a candidate trial also needs to pass the Bernoulli selection $\mathrm{Ber}(\sum_{t \in \mathcal{T}} \rho_w\theta_t / |\mathcal{T}|)$. Because the candidate generation and the final confirmation are independent, a trial's success rate in such a sequential process is $\frac{a_w}{M} \cdot \frac{\sum_{t \in \mathcal{T}} \rho_w\theta_t}{|\mathcal{T}|}$. In other words, the number of successful trials for the whole sequential process follows:

$$P(b_w \mid a_w, \rho_w, \theta_t, \mathcal{T}) = \binom{M}{b_w}\Big(1 - \frac{U}{M}\Big)^{M - b_w}\Big(\frac{U}{M}\Big)^{b_w}, \qquad (23)$$

where $U = a_w \sum_{t \in \mathcal{T}} \rho_w\theta_t / |\mathcal{T}|$. Therefore, the number of answers (i.e., $\Psi_w$) follows $\mathrm{Poisson}(a_w \sum_{t \in \mathcal{T}} \rho_w\theta_t / |\mathcal{T}|)$.

Proof of Theorem 2. The proof is made with the following two steps. Firstly, the incident of "task t's being answered by worker w (i.e., $A^w_t = 1$) during one unit time slot $\tau_0$" is equivalent to the occurrence of the following sequential events: 1) w browses at least one task during one unit time slot $\tau_0$; 2) t is within w's browsed tasks; and 3) t is accepted by w. As such, "$A^w_t = 1$" is a specific instance of the sequential process discussed in the last proof, where each trial's success rate is $a_w\rho_w\theta_t / (M|\mathcal{T}|)$ and at least one trial succeeds in the end. Thus, the probability of "$A^w_t = 1$" equals $P(b \geq 1 \mid a_w, \rho_w, \theta_t, \mathcal{T})$, which means:

$$P(A^w_t = 1 \mid a_w, \rho_w, \theta_t, \mathcal{T}) = 1 - e^{-a_w\rho_w\theta_t / |\mathcal{T}|}. \qquad (24)$$

Secondly, given an arbitrary time period $\tau$, it can be verified that the expected number of browsed tasks within such a time period will be $a_w|\tau| / |\tau_0|$. Replacing $a_w$ in Eq. 24 with such a value, we have:

$$P(A^w_t = 1 \mid a_w, \rho_w, \theta_t, \mathcal{T}) = 1 - e^{-a_w\rho_w\theta_t|\tau| / (|\mathcal{T}||\tau_0|)}. \qquad (25)$$

Therefore, Theorem 2 is justified.

Proof of Theorem 3. We sketch the proof with the following three parts. Firstly, given that the optimal worker recommendation follows the form of the weighted bipartite assignment problem (with positive integral capacity constraints), the greedy solution, which sequentially picks edges in the descending order of weights, leads to an approximation result no less than 1/2 of the optimal one. Secondly, Alg. 2 makes recommendations with three sequential randomization steps, i.e., type sampling, candidate generation and recommendation sampling; for an arbitrary pair of task and worker (e.g., t and w), it is straightforward that the probability of w's being recommended to t, denoted as $P_{t,w}$, follows $P_{t,w} \propto \mathrm{EFP}(t, w)^{2+\epsilon}$, where $\epsilon$ is a positive number. Thirdly, in each repetition of the recommendation sampling (line 9, Alg. 2), the expected EFP of the newly added recommendation follows:

$$\sum_{t \in \mathcal{T}_{rec}} \mathrm{EFP}(t, \kappa_t) \cdot P_{t,\kappa_t} = \sum_{t \in \mathcal{T}_{rec}} \frac{\mathrm{EFP}(t, \kappa_t) \cdot \mathrm{EFP}(t, \kappa_t)^{2+\epsilon}}{\sum_{t' \in \mathcal{T}_{rec}} \mathrm{EFP}(t', \kappa_{t'})^{2+\epsilon}} \geq \sum_{t \in \mathcal{T}_{rec}} \frac{\mathrm{EFP}(t, \kappa_t) \cdot \mathrm{EFP}(t, \kappa_t)}{\sum_{t' \in \mathcal{T}_{rec}} \mathrm{EFP}(t', \kappa_{t'})} \geq \frac{\max_{t' \in \mathcal{T}_{rec}} \mathrm{EFP}(t', \kappa_{t'})}{2}. \qquad (26)$$

That is, in each repetition, the sampled recommendation's EFP is no less than 1/2 of the one selected by the greedy algorithm. Thus, the approximation ratio is above 1/4.
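As a quick numerical sanity check on Theorem 1, the short simulation below reproduces the browse-then-confirm process under illustrative parameter values and compares the empirical mean (and variance) of the answer count with the claimed Poisson rate; every number here is an assumption chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

a_w, rho_w = 6.0, 0.7                   # assumed activeness and preference
theta = rng.uniform(0.1, 0.9, size=20)  # assumed acceptance factors, |T| = 20
num_runs = 200_000

answers = np.empty(num_runs)
for i in range(num_runs):
    b = rng.poisson(a_w)                           # browsed candidates
    browsed = rng.integers(0, theta.size, size=b)  # uniform task choices
    confirmed = rng.random(b) < rho_w * theta[browsed]
    answers[i] = confirmed.sum()

claimed_rate = a_w * (rho_w * theta).sum() / theta.size
print(answers.mean(), answers.var(), claimed_rate)
# For a Poisson variable, both the mean and the variance should be close
# to the claimed rate a_w * sum_t(rho_w * theta_t) / |T|.
```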

10. REFERENCES
[1] J. Bian, Y. Liu, D. Zhou, E. Agichtein, and H. Zha. Learning to recognize reliable users and content in social media with coupled mutual reinforcement. In WWW 2009, pages 51–60.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
[3] A. C. Cameron and P. K. Trivedi. Regression analysis of count data, volume 53. Cambridge University Press, 2013.
[4] Z. Chen, R. Fu, Z. Zhao, Z. Liu, L. Xia, L. Chen, P. Cheng, C. C. Cao, Y. Tong, and C. J. Zhang. gMission: A general spatial crowdsourcing platform. PVLDB, 7(13):1629–1632, 2014.
[5] P. Cheng, X. Lian, Z. Chen, R. Fu, L. Chen, J. Han, and J. Zhao. Reliable diversity-based spatial crowdsourcing by moving workers. PVLDB, 8(10):1022–1033, 2015.
[6] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1):5228–5235, 2004.
[7] E. Hazan, A. Rakhlin, and P. L. Bartlett. Adaptive online gradient descent. In NIPS 2007, pages 65–72.
[8] J. L. Herlocker, J. A. Konstan, L. Terveen, and J. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22:5–53, 2004.
[9] N. Houlsby, J. M. Hernández-Lobato, and Z. Ghahramani. Cold-start active learning with robust ordinal matrix factorization. In ICML 2014, pages 766–774.
[10] L. Kazemi and C. Shahabi. Geocrowd: enabling query answering with spatial crowdsourcing. In GIS 2012, pages 189–198.
[11] P. Mavridis, D. Gross-Amblard, and Z. Miklós. Using hierarchical skills for optimized task assignment in knowledge-intensive crowdsourcing. In WWW 2016, pages 843–853.
[12] J. D. McAuliffe and D. M. Blei. Supervised topic models. In NIPS 2008, pages 121–128.
[13] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. In ICDM 2008, pages 502–511.
[14] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity. Courier Corporation, 1982.
[15] M. Qu, G. Qiu, X. He, C. Zhang, H. Wu, J. Bu, and C. Chen. Probabilistic question recommendation for question answering communities. In WWW 2009, pages 1229–1230.
[16] F. Riahi, Z. Zolaktaf, M. Shafiei, and E. Milios. Finding expert users in community question answering. In WWW 2012, pages 791–798.
[17] J. San Pedro and A. Karatzoglou. Question recommendation for collaborative question answering systems with RankSLDA. In RecSys 2014, pages 193–200.
[18] G. Shani and A. Gunawardana. Evaluating recommendation systems. In Recommender Systems Handbook, pages 257–297, 2011.
[19] M. Taddy. On estimation and selection for topic models. In International Conference on Artificial Intelligence and Statistics, pages 1184–1193, 2012.
[20] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Sharing clusters among related groups: Hierarchical dirichlet processes. In NIPS 2005, pages 1385–1392.
[21] Y. Tong, J. She, B. Ding, L. Wang, and L. Chen. Online mobile micro-task allocation in spatial crowdsourcing. In ICDE 2016, pages 49–60.
[22] M. D. Vose. A linear algorithm for generating random numbers with a given distribution. IEEE Transactions on Software Engineering, 17(9):972–975, 1991.
[23] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD 2011, pages 448–456.
[24] H. Wang, N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. In KDD 2015, pages 1235–1244.
[25] G. C. Wei and M. A. Tanner. A monte carlo implementation of the em algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704, 1990.
[26] H. Wu, Y. Wang, and X. Cheng. Incremental probabilistic latent semantic analysis for automatic question recommendation. In RecSys 2008, pages 99–106.
[27] L. Yang, M. Qiu, S. Gottipati, F. Zhu, J. Jiang, H. Sun, and Z. Chen. CQArank: jointly model topics and expertise in community question answering. In CIKM 2013, pages 99–108.
[28] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing twitter and traditional media using topic models. In ECIR 2011, pages 338–349.
[29] Z. Zhao, L. Zhang, X. He, and W. Ng. Expert finding for question answering via graph regularized matrix completion. TKDE, 27(4):993–1004, 2015.
[30] L. Zheng and L. Chen. Mutual benefit aware task assignment in a bipartite labor market. In ICDE 2016, pages 73–84.
[31] Y. Zheng, G. Li, and R. Cheng. DOCS: a domain-aware crowdsourcing system using knowledge bases. PVLDB, 10(4):361–372, 2016.
[32] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng. QASCA: A quality-aware task assignment system for crowdsourcing applications. In SIGMOD 2015, pages 1031–1046.
[33] G. Zhou, S. Lai, K. Liu, and J. Zhao. Topic-sensitive probabilistic model for expert finding in question answer communities. In CIKM 2012, pages 1662–1666.
[34] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the netflix prize. In AAIM 2008, LNCS 5034, pages 337–348.
[35] H. Zhu, E. Chen, H. Xiong, H. Cao, and J. Tian. Ranking user authority with relevant knowledge categories for expert finding. World Wide Web, 17(5):1081–1107, 2014.
