Information Services with Social Components

Wolf-Tilo Balke Institute for Information Systems Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Semantic Metadata

• Most information services still rely on metadata – But how can we know at content creation time what metadata will be needed? – Can missing metadata be created efficiently just in time, even for unexpected queries?

– Joint work with Joachim Selke & Christoph Lofi

2 Information Services

• Complete missing information – For targeted information provisioning – For better indexing & searching – For personalization

• Where to get this information? – Extract it from the Social Web?

3 Information Services

• Example: recommender systems

4 Information Services

• Example: ESP Game & Image Labeler – Idea: “Games with a purpose” • Image Labeling: Guess your partner’s tags, and both score. • No payment necessary

5

• Hot and emerging paradigm – Vaguely defined concept: “Concepts for fostering human collaboration to solve complex problems.” – Aims at tapping the “Wisdom of the Crowd” • “Under certain conditions, large crowds of people are able to make highly effective decisions”

6 Crowdsourcing

• Examples: – Building complex artefacts • Knowledge: Wikipedia.org • Software: Linux, Apache – Content creation • YouTube, Flickr – User opinions • IMDb, Netflix – Networking • LinkedIn – etc.

7 Crowdsourcing

• Four challenges need to be overcome – How to recruit and retain users? – What contributions can users make? – How to combine the contributions to solve the target problem? – How to evaluate users and their contributions?

8 Crowdsourcing

• Most platforms rely on volunteers – Intrinsically motivated • Users believe in the mission of the platform • Users directly profit from the platform

• Problem: – Mission cannot easily be changed; only specialized tasks are solvable on each platform – Communities have to be carefully fostered, but are hard to control

9 Generic Crowdsourcing

• Generic Task-Based Crowdsourcing – General-purpose platforms can facilitate virtually any task for anybody • Workers are attracted and retained by paying money

10 Generic Crowdsourcing

– Clients can initiate a large crowdsourcing task • Define the user interface • Define how the task is broken down into individual work packages: HITs (Human Intelligence Tasks) • Define the overall workload • Define how individual results are aggregated • Define the payment per HIT
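
A client-side task definition can be pictured roughly as follows. This is a minimal, platform-independent sketch in Python; the HitSpec fields and the splitting helper are illustrative assumptions, not the API of any concrete crowdsourcing platform. The example numbers anticipate the movie experiment later in the talk.

```python
from dataclasses import dataclass

@dataclass
class HitSpec:
    """Illustrative description of one crowdsourcing job made of many HITs."""
    title: str                # short task description shown to workers
    instructions: str         # the user interface / form defined by the client
    items_per_hit: int        # how the workload is broken down into work packages
    assignments_per_hit: int  # redundancy later used to aggregate individual results
    payment_usd: float        # payment per HIT

def split_into_hits(items, spec):
    """Break the overall workload into HITs of spec.items_per_hit items each."""
    return [items[i:i + spec.items_per_hit]
            for i in range(0, len(items), spec.items_per_hit)]

# Example: 1,000 movies, 10 per HIT, 10 workers each, 0.02 USD per HIT
spec = HitSpec(title="Is this movie a comedy?",
               instructions="Select 'comedy', 'no comedy', or 'I don't know this movie'.",
               items_per_hit=10, assignments_per_hit=10, payment_usd=0.02)
hits = split_into_hits(list(range(1000)), spec)
total = len(hits) * spec.assignments_per_hit * spec.payment_usd
print(len(hits), "HITs, total cost:", round(total, 2), "USD")  # 100 HITs, 20.0 USD
```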

11 Generic Crowdsourcing

– Workers solve the task • Short description of the task • Transparent payment per HIT • Solve the task using the user interface provided by the client • Can provide feedback on the task and its initiator

12 Generic Crowdsourcing

• Popular example from art: Aaron Koblin – http://www.thesheepmarket.com/ – Exhibited at Laboral Centro de Arte, Gijón, Spain; Japan Media Arts Festival, Tokyo, Japan; Apex Gallery, New York, USA; ElectroFringe, Newcastle, Australia; Media Art Friesland, The Netherlands

13 Generic Crowdsourcing

• You get what you pay for… – 10,000 sheep for 200 USD, i.e. 0.02 USD per sheep

14 Generic Crowdsourcing

15 Generic Crowdsourcing

• Popular examples from art reloaded – How about more detailed instructions? – www.tenthousandcents.com/

16 Generic Crowdsourcing

17 Real World Applications

• Crowd-Enabled Databases – Core idea: Build a database engine which can dynamically crowdsource certain operations (see the sketch after this list) • Complete missing data during query time – Incomplete tuples (CNULL values) – Elicit completely new tuples

• Use human intelligence operators – Entity resolution – Similarity rankings – etc.
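
As a purely hypothetical sketch of the core idea (the names and structure below are illustrative, not the actual CrowdDB or crowd-enabled DB implementation), query processing can fill CNULL values just in time by routing them to the crowd:

```python
# Hypothetical sketch: CNULL marks attribute values that must be obtained from
# the crowd; the projection completes them at query time and materializes the
# answers for reuse. All names here are illustrative, not a real engine's API.
CNULL = object()  # sentinel: "value unknown, ask the crowd"

def ask_crowd(entity, attribute):
    """Placeholder: post a HIT, collect worker answers, return the aggregated result."""
    return f"<crowd answer: {attribute} of {entity}>"

def project(rows, attribute):
    """Project an attribute, filling CNULL values via crowdsourcing at query time."""
    for row in rows:
        if row.get(attribute, CNULL) is CNULL:
            row[attribute] = ask_crowd(row["title"], attribute)  # complete missing data
        yield row["title"], row[attribute]

movies = [{"title": "Vertigo", "genre": "thriller"},
          {"title": "An obscure movie", "genre": CNULL}]
print(list(project(movies, "genre")))  # triggers exactly one (simulated) HIT
```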

18 Crowd-Enabled DB

19 Classification of CS Tasks

• The ease-of-use and reliability of crowdsourcing tasks vary with the respective use case • In general, three variables have to be controlled – Answer/Solution Quality, impacted by… • Worker diligence • Worker maliciousness • Worker quality and skills – Execution Time • Job attractiveness (payment vs. time) • Worker pool size – Costs • Number of HITs • Costs per HIT (affected by time and skill needed) • Quality control overhead

20 Classification of CS Tasks

• Two general discriminating properties impacting these variables can be identified – Ambiguity of a task’s solutions • For a given solution, can we indisputably decide whether it is correct or wrong? – Factual tasks (best case) • Can we at least reach a community consensus? – i.e. the answer is considered correct by most people – Consensual tasks (not-so-good case) • Is there no correct answer? Are answers completely subjective? – Opinionated tasks (luckily, an uninteresting case for most computer science tasks)

21 Classification of CS Tasks

– Required level of worker expertise / skill • Can anybody solve the tasks? – General worker pool can be used • Are special skills / background knowledge required? – Worker pool must be filtered – Expert users must be found

22 Classification of CS Tasks

[Classification matrix, slide figure: level of answer ambiguity / agreement (factual → consensual → opinionated) on one axis, question answerable by (any user vs. some users only) on the other]
• Opinionated – any user: “What is the nicest color?”, “What political party will you vote for?” – some users only: “What is the best operating system?”
• Consensual – Quadrant II (any user): ambiguous classification, e.g. “Does the person on this photo look happy?”, “Is this YouTube video funny?” – Quadrant IV (some users only): “Is ‘Vertigo’ a violent movie?”, “Is the VW Amarok a car suited for families?”
• Factual – Quadrant I (any user): finding information on the Web (“When was Albert Einstein born?”), manual OCR, simple cognitive classification (“Is there a person on this photo?”) – Quadrant III (some users only): finding specific information (“What is the complexity class of deciding Horn logic?”)

23 Experiment

• Basic Settings: – Amazon Mechanical Turk – Judge 1000 random movies • Consider only movies which have consensual genre classifications in IMDb, Rotten Tomatoes, and Netflix – Only 10,562 movies overall – Use these movies as “truth” – Majority vote of 10 workers each – No Gold questions – $0.02 per HIT with 10 movies
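
Aggregation by majority vote can be sketched as follows (plain Python; the answer format and the consensus threshold of more than half of the ten votes are assumptions, since the slides do not state exactly how consensus was defined):

```python
from collections import Counter

def majority_vote(answers, threshold=0.5):
    """Return the majority label if it wins more than threshold of the votes, else None."""
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes / len(answers) > threshold else None

# Ten worker answers for one movie
answers = ["comedy"] * 6 + ["no comedy"] * 3 + ["I don't know this movie"]
print(majority_vote(answers))  # -> 'comedy' (6 of 10 votes); None would mean no consensus
```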

24 Experiment

• Result (stop after $20; 10,000 answers) – 105 minutes (1:45 hours) – 89% reached a consensus • 59% of these movies are classified correctly • What went wrong? – Malicious workers! • 62% selected “comedy” (first choice in form) – 30% of all movies in test set are indeed comedies • 24% selected “no comedy” – 70% of all movies in test set are no comedies • 14% selected “I don’t know this movie”

25 Example

• Observation: the test set contains some very obscure movies – Quick survey among students: they knew only 10%–20% of them – But: Many workers claimed to know all movies • Judged 56% of all movies as comedies, 44% as no comedy • These workers originate from just two distinct countries – All other workers: • Knew only 26% of all movies • 32% comedy • 68% no comedy • Realistic values!

26 Example

• Adjusted Settings: – Similar to experiment above, but exclude all workers from the two offending countries • Hopefully, only trustworthy workers remain • Result (stop after $20; 10,000 look-ups) – 116 minutes (1:56 hours) – 63% of all movies reached consensus • Of those, 79% are classified correctly • Result still disappointing – Obscure movies do not reach consensus – Consensus still not reliable
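
The country-based exclusion can be pictured as a simple filter over the collected assignments (a hedged sketch only: the two country codes are placeholders, since the slide does not name the countries, and in practice the exclusion could equally be enforced up front via worker qualifications):

```python
EXCLUDED_COUNTRIES = {"XX", "YY"}  # placeholders for the two offending countries

def filter_assignments(assignments):
    """Drop all answers from workers located in the excluded countries before aggregation."""
    return [a for a in assignments if a["worker_country"] not in EXCLUDED_COUNTRIES]

assignments = [{"worker_country": "XX", "movie": 42, "answer": "comedy"},
               {"worker_country": "DE", "movie": 42, "answer": "no comedy"}]
print(filter_assignments(assignments))  # only the second answer survives
```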

27 Hybrid Approaches

• How to perform better? – Employ hybrid techniques combining crowdsourcing, information extraction, machine learning, and the Social Web! • Tackle the following challenges – Performance • Drastically speed up crowdsourcing times – Costs • Require just a few crowdsourcing HITs to obtain a large number of judgements – Data Quality • Circumvent the impact of malicious workers • Reliably obtain judgements even for obscure and rare items

28 Hybrid Approaches

• Reconsider crowd-enabled databases – Large table with movies • e.g. like IMDb, ~2 Million movies

– Task • Introduce a new column with a rating for humour (0–10) – Traditional approach • Create a crowdsourcing task asking users for judgements • Consensual result requiring background knowledge – Extremely challenging (and expensive) task!

29 Hybrid Approaches

• Can we do better?

• Let’s take all the social web information into account! – Massive amount of data in acceptable quality – Already successfully used for generating recommendations

30 The Social Web

• The Social Web as a Data Source has become common-place – Collect information before buying products (reviews) – Recommend news articles, movies, books,…

• Almost all of this data is aggregated into ratings – Easy to do, rich in information, and rather ubiquitous – Valuable to extract: collaborative filtering, etc.

31 Perceptual Spaces

• Idea: – Each user has personal likes/dislikes, preferences, etc. that explain his/her rating behaviour – The ratings of each individual will be rather consistent regarding these likes/dislikes… a systematic bias – How can ratings be decomposed into these individual biases?

• Let users and items be d-dimensional points – Coordinates of a user represent his/her personality (bias) – Coordinates of an item represent its profile regarding personality traits

32 Perceptual Spaces

• Building the perceptual space – Possible from ratings, review texts, tags,…

• Factor Models – Developed to estimate the value of non-observed ratings for the purpose of recommending new, unrated items – Ratings are seen as a function of user vectors and item vectors – Prominent factor models: SVD, Euclidean embedding,…
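
A minimal latent-factor sketch in the spirit of such SVD-style models (plain numpy with stochastic gradient descent; the toy data, dimensionality, and hyperparameters are assumptions, not the setup used in the talk):

```python
import numpy as np

def factorize(ratings, n_users, n_items, d=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Learn d-dimensional user/item vectors so that p_u . q_i approximates r_ui."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d))  # user coordinates (personal taste/bias)
    Q = 0.1 * rng.standard_normal((n_items, d))  # item coordinates (perceptual profile)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                # rating seen as a function of both vectors
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Toy example: (user, item, rating) triples
ratings = [(0, 0, 9.0), (0, 1, 3.0), (1, 0, 8.5), (1, 2, 7.0), (2, 1, 2.5), (2, 2, 6.5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(np.round(P @ Q.T, 2))  # reconstructed rating estimates for all user/item pairs
```

The item rows of Q are what the slides call the perceptual space: items whose coordinates are close are perceived similarly by the rating community.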

33 Perceptual Spaces

34 Perceptual Spaces

• Intuition – There are good and bad movies • Average rating per movie • Movie bias compared to the average – e.g. “The Good, the Bad, and the Ugly” is a good movie: 9.1 vs. 6.3 on average (bias +2.8) – There are good-natured and bad-natured users • i.e. providing generally more positive or more negative ratings • Average per-user rating – e.g. user Bob always rates movies worse than the average user (negative bias)

35 Perceptual Spaces

• Modeling – A user without any preference regarding a movie’s properties should rate the movie as • Average rating of all movies + movie bias + user bias – If a rating diverges from this estimate, the user is expressing preferences • There are some properties he/she specifically likes, leading to better ratings – e.g. liking Science Fiction and Giant Monsters
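
Written out in standard bias notation (the notation is mine, not taken from the slides):

```latex
\hat{r}_{ui} \;=\; \mu + b_u + b_i
\qquad\text{and, with preferences,}\qquad
\hat{r}_{ui} \;=\; \mu + b_u + b_i + p_u^{\top} q_i
```

Here \mu is the average rating over all movies, b_i the movie bias, b_u the user bias, and p_u, q_i the d-dimensional user and item coordinates; whatever part of a rating deviates from \mu + b_u + b_i is attributed to the user’s preferences via the term p_u^T q_i.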

36 How to use a Perceptual Space?

• Extract the correct distances regarding the topic of interest from the perceptual space… – However, the data is hidden in the space! – What dimensions should contribute to the distances?

• Main idea: train a classifier via crowdsourcing – Provide training set via the crowd: positive and negative examples for humorous movies, good books,… • Non-linear SVM for classification • Non-linear regression for values
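
A minimal sketch of this classification step with scikit-learn (an RBF-kernel SVM over the item coordinates; the random stand-in data, set sizes, and labels are assumptions and would be replaced by the real perceptual space and the crowdsourced training examples):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
Q = rng.standard_normal((2_000, 10))  # stand-in for item coordinates in the perceptual space

# A few hundred crowdsourced judgements (e.g. "funny" = 1 / "not funny" = 0) as training set
train_idx = rng.choice(len(Q), size=300, replace=False)
train_labels = rng.integers(0, 2, size=300)  # stand-in for aggregated worker votes

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # non-linear SVM, as named on the slide
clf.fit(Q[train_idx], train_labels)

# The trained classifier labels all remaining items without any further HITs
predicted = clf.predict(Q)
print(predicted[:10])
```

For graded values such as a 0–10 humour rating, a non-linear regressor (e.g. sklearn.svm.SVR) would take the place of the classifier, as the slide’s second bullet suggests.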

37 How to use a Perceptual Space?

[Architecture figure: tags, reviews, ratings, and links are extracted from the Social Web into the perceptual space; queries are answered by a crowd-enabled DB service that issues crowdsourcing HITs where needed and returns the results]

38 Experimental Results

• Yelp.com – Large, but only mildly motivated community – Restaurant ratings for San Francisco • 3.8k restaurants, 128k users, 626k ratings – No additional tuning; evaluated against “truth” provided by human expert editors

39 Summary

• Discussion of crowdsourcing for real world applications – Information systems, recommendations, etc. • Quality of crowdsourcing tasks needs to be addressed – Correctness, time, and costs – What type of task, possible quality assurance,… • Training classifiers over perceptual spaces can solve the problem to some degree

40 References

• Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing Systems on the World-Wide Web. Communications of the ACM (CACM), Vol. 54, 2011.

• Franklin, M., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: CrowdDB: Answering Queries with Crowdsourcing. ACM SIGMOD Int. Conf. on Management of Data, Athens, Greece, 2011.

• Selke, J., Lofi, C., Balke, W.-T.: Pushing the Boundaries of Crowd-Enabled Databases with Query-Driven Schema Expansion. 38th Int. Conf. on Very Large Data Bases (VLDB), in PVLDB 5(2), Istanbul, Turkey, 2012.

• Demartini, G., Difallah, D. E., Cudré-Mauroux, P.: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. 21st Int. Conf. on World Wide Web (WWW), Lyon, France, 2012.

41 Thanks for Your Attention

Questions?

[email protected]