Compiling Attention Datasets

DEGREE PROJECT, IN COMPUTER SCIENCE , FIRST LEVEL STOCKHOLM, SWEDEN 2015 Compiling attention datasets DEVELOPING A METHOD FOR ANNOTATING FACE DATASETS WITH HUMAN PERFORMANCE ATTENTION LABELS USING CROWDSOURCING DANIEL ROMULD AND MARKUS RUHMÉN KTH ROYAL INSTITUTE OF TECHNOLOGY CSC SCHOOL KTH Computer Science and Communication Compiling attention datasets Developing a method for annotating face datasets with human performance attention labels using crowdsourcing Daniel Romuld & Markus Ruhmén Degree Project in Computer Science, DD143X Supervisor: Richard Glassey Examiner: Örjan Ekeberg CSC, KTH June 2, 2015 Sammanfattning Denna uppsats behandlar problemet med att upptäcka mänsklig uppmärk- samhet, vilket är ett problem inom datorseende. För att göra framsteg mot att lösa problemet utvecklades en metod för att skapa uppmärksamhets- märkningar till dataset av ansiktsbilder. Märkningarna utgör ett mått av den uppfattade uppmärksamhetsnivån hos personerna i bilderna. Arbetet i denna uppsats motiveras av avsaknaden av dataset med uppmärksamhets- märkningar och den potentiella användbarheten av den framtagna metoden. Metoden konstruerades med fokus på att maximera tillförlitligheten och användbarheten av insamlad data och det resulterande datasetet. Som ett första steg i metodutvecklingen genererades bilder på folkmassor genom att använda datasetet Labeled Faces in the Wild. Evaluering av uppmärk- samhetsnivån hos personerna i bilderna, som individer i en folkmassa, blev då möjligt. Denna egenskap utvärderades av arbetare på crowdsourcing- plattformen CrowdFlower. Svaren analyserades och kombinerades för att be- räkna ett uppmärksamhetsmått med mänsklig prestanda för varje individ i bilderna. Resultatanalysen visade att svaren från arbetarna på CrowdFlower var tillförlitliga med hög intern konsistens. Den framtagna metoden ansågs vara ett giltigt tillvägagångssätt för att skapa uppmärksamhetsmärkningar. Möjliga förbättringar identifierades i flera delar av metoden och redovisas som del av uppsatsens huvudresultat. Abstract This essay expands on the problem of human attention detection in computer vision. This is achieved by providing a method for annotating existing face datasets with attention labels through the use of human intelligence. The work described in this essay is justified by a lack of human performance attention datasets and the potential uses of the developed method. Several images of crowds were generated using the Labeled Faces in the Wild dataset of images depicting faces. Thus enabling evaluation of the level of attention of the depicted subjects as part of a crowd. The data collection methodology was carefully designed to maximise reliability and usability of the resulting dataset. The crowd images were evaluated by workers on the crowdsourcing platform CrowdFlower, which yielded human performance attention labels. Analysis of the results showed that the submissions from workers on the crowdsourcing platform displayed a high level of consistency and reliability. Hence, the developed method, although not fully optimised, was deemed to be a valid process for creating a representation of human attention in a dataset. Contents 1 Introduction 1 1.1 Problem statement . .1 1.2 Method structure . .2 1.3 Projected outcome and contribution . .3 1.4 Overview . .3 1.5 Terminology . .4 2 Background 4 2.1 Face datasets . .5 2.1.1 Uses for datasets . .5 2.1.2 Compilation and design of datasets . .7 2.2 Data gathering . .7 2.2.1 Conventional methods . .8 2.2.1.1 College samples . .8 2.2.1.2 Web surveys . .8 2.2.2 Crowdsourcing . .9 2.2.2.1 CrowdFlower . .9 2.2.2.2 Amazon’s Mechanical Turk . 10 2.2.2.3 Ethical concerns of using crowdsourcing . 11 2.3 Summary . 11 3 Method 11 3.1 Generating images . 11 3.1.1 Why LFW . 12 3.1.2 Method of choosing members . 13 3.1.3 Background removal . 13 3.1.4 Actual image generation . 14 3.1.5 Design details . 15 3.1.6 Generated files . 16 3.1.7 Software used . 17 3.2 Collecting data . 17 3.2.1 Beta test . 18 3.2.1.1 Question . 19 3.2.1.2 Setting . 19 3.2.2 CrowdFlower study . 19 3.2.2.1 Why CrowdFlower . 20 3.2.2.2 Question and instructions . 20 3.2.2.3 Example . 21 3.2.2.4 Image . 21 3.2.2.5 Input . 21 3.2.2.6 Wage . 21 4 Beta test results, analysis and conclusions 22 4.1 Results . 22 4.2 Analysis and conclusions . 22 5 CrowdFlower study results and analysis 23 5.1 Primary results . 23 5.2 Secondary results . 23 5.3 Analysis . 23 6 Discussion 25 6.1 Image generation . 25 6.2 CrowdFlower setting . 26 6.3 Sampling method . 26 6.4 Further development . 27 7 Summary and conclusions 27 A Crowd details 32 A.1 Details and naming . 32 A.2 Post-study changes . 32 B CrowdFlower submissions 33 B.1 Format . 33 B.2 Post-study changes . 33 C The LFW+at dataset 34 D CrowdFlower task example 39 List of Figures 1 Structure of the essay’s method . .2 2 Structure of the proposed annotation method . .2 3 Examples of images before and after background removal . 13 4 Examples of images before and after head size normalisation . 14 5 Example of a crowd partitioned into zones . 15 6 Example of small section of a crowd permutation image . 17 7 Spreadsheet used during beta test . 18 8 Second LFW image of Anibal Ibarra . 24 List of Tables 1 Some modern datasets . .6 2 Beta test completion times . 22 3 Small sample of beta test results . 22 4 Cronbach’s α of CrowdFlower submissions . 24 5 Submissions to Anibal_Ibarra_0002 . 24 1 Introduction This essay describes the development of a method for annotating face datasets with attention labels. The annotation process consists of assigning a value to each element in a face dataset, which is a collection of images depicting faces. The values are a measure of the perceived level of attention of the depicted subject. Therefore, they are referred to in this essay as human performance attention labels. The method is demonstrated and developed by compiling attention labelling for the Labeled Faces in the Wild (LFW) [1] face dataset. The labels were gathered through crowdsourcing, which is a method of outsourcing the annotation task to large crowds of people [2]. The annotation method’s reliability was evaluated in each step of the process to identify pitfalls and factors that could reduce the method’s reliability. The goal of developing an annotation method is to aid in solving the attention detection problem. The attention detection problem is the problem, in the field of computer vision, of visually detecting whether a person is pay- ing attention to some given object, place or person. Compiling a labelled dataset is a crucial part in solving the attention detection problem since advances in the problem domain cannot be made without data. 1.1 Problem statement A problem in the field of computer vision is to reliably find a suitable and suf- ficient dataset that will fit the needs of the researcher. Correctly compiling a new dataset is a time consuming and costly endeavor that requires careful planning and consideration. The most commonly used face datasets are of- ten labelled with identity labels. Some include additional labels or attributes such as the angle between camera and face orientation [3, 4], distance [5, 6] and displayed emotion [7, 8], but none include human performance attention labels. This essay proposes a method for annotating existing facial datasets by adding attention labelling through the use of crowdsourcing. The goal is to answer the following questions: • Does the proposed annotation method yield statistically satisfactory results in term of consistency and reliability? • Which pitfalls and possible improvements can be identified by using the proposed annotation method? These questions were answered by formulating a method based on background research and evaluating both the submissions from the crowdsourcing platform and the yielded attention labels. 1 1.2 Method structure The annotation method differs from the overall method used in this essay. The essay’s method consists of two stages as illustrated in Figure 1. The first stage consists of using the proposed annotation method to compile a dataset with attention labels. The second stage involves identifying pitfalls and possible improvements from the knowledge gained from the first stage. Figure 1: Structure of the essay’s method The proposed annotation method consists of three stages as illustrated in Figure 2. The first stage in the proposed annotation method is to generate crowd images. Background removal and head size normalization is per- formed on pseudo-randomly chosen images in the chosen face dataset. The resulting images are inserted into a virtual lecture hall. In the second stage, human intelligence is used by gathering data on the chosen crowdsourcing platform. The workers’ opinion of the crowd attendees’ level of attention is requested. In the third stage, the workers’ answers are analysed to evalu- ate reliability and remove frivolous entries. As a result of performing the described stages, a new dataset with attention annotation is created. Figure 2: Structure of the proposed annotation method 2 1.3 Projected outcome and contribution The developed annotation method and the compiled dataset, named Labeled Faces in the Wild+attention (LFW+at), will facilitate and enable further research on attention in behavioural sciences such as psychology and peda- gogy. By annotating a dataset with labels such as attention, a new platform on which to build and perform new experiments is created. The compiled dataset would benefit in interdisciplinary research by using it in machine learning algorithms and iteratively approach an accurate definition of attention. The developed method and the identified strengths and weaknesses will enable further research and exploration of the attention detection problem and related problems. 1.4 Overview The essay mostly follows a conventional structure. Section 2 explores the relevant state of the art and theoretical background.

Compiling Attention Datasets

Labeling Parts of Speech Using Untrained Annotators on Mechanical Turk THESIS Presented in Partial Fulfillment of the Requiremen

Return of the Crowds: Mechanical Turk and Neoliberal States of Exception 79 Ayhan Aytes Vi Contents

I Found Work on an Amazon Website. I Made 97 Cents an Hour. - the New York Times Crossword Times Insider Newsletters the Learning Network

Privacy Experiences on Amazon Mechanical Turk

Crowd Economy and Digital Precarity

Crowdsourcing Language Resources for Dutch Using PYBOSSA: Case Studies on Blends, Neologisms and Language Variation

Mechanical Turk

Guide to Chess.Pdf

Give a Penny for Their Thoughts

Using Amazon's Mechanical Turk & Machine Learning to Identify & Model Owners of Solar Panels

Using Crowdsourced Online Experiments to Study Context-Dependency of Behavior

What Motivates Effort? Evidence and Expert Forecasts∗