Crowdsourcing Language Resources for Dutch using PYBOSSA: Case Studies on Blends, Neologisms and Language Variation

Peter Dekker, Tanneke Schoonheim
Instituut voor de Nederlandse Taal (Dutch Language Institute)
{peter.dekker,tanneke.schoonheim}@ivdnt.org

Abstract

In this paper, we evaluate PYBOSSA, an open-source crowdsourcing framework, by performing case studies on blends, neologisms and language variation. We describe the procedural aspects of crowdsourcing, such as working with a crowdsourcing platform and reaching the desired audience. Furthermore, we analyze the results, and show that crowdsourcing can shed new light on how language is used by speakers.

Keywords: crowdsourcing, lexicography, neologisms, language variation

1. Introduction

Crowdsourcing (or: citizen science) has been shown to be a quick and cost-efficient way to have tasks that normally have to be performed by a small number of experts carried out by a large number of lay people (Holley, 2010; Causer et al., 2018). In this paper, we use the PYBOSSA (PB) framework (homepage: http://pybossa.com; DOI: https://doi.org/10.5281/zenodo.1485460) for crowdsourcing language resources for the Dutch language. We will describe our experiences with this framework in accomplishing the goals of language documentation and the generation of language learning material. In addition to sharing our experiences, we will report on linguistic findings based on the experiments we performed on blends, neologisms and language variation.

For the Dutch language, crowdsourcing has been valuable in the past. We distinguish two types of approaches. On one hand, there are fixed tasks, where more or less one answer is correct. As fixed tasks, crowdsourcing has been applied to the transcription of letters from 17th and 18th-century Dutch sailors (Van der Wal et al., 2012) and historical Dutch Bible translations (Beelen and Van der Sijs, 2014).

On the other hand, there are open tasks, referred to in the empirical sciences as elicitation tasks, where different answers by different users are welcomed, in order to capture variation. Examples of open tasks are Palabras (Burgos et al., 2015; Sanders et al., 2016), where lay native Dutch speakers were asked to transcribe vowels produced by L2 learners, and Emigrant Dutch (http://www.meertens.knaw.nl/vertrokken-nederlands/), which tries to capture the language use of emigrant Dutch speakers. Of course, mixed forms between open and fixed tasks are possible.

The Dutch Language Institute strives to document the language as it is used, by compiling language resources (e.g. dictionaries) based on corpora from different sources, such as newspapers and websites. Fixed-task crowdsourcing, such as transcription and correction, can help in this process. However, we see even greater possibilities for open-task crowdsourcing, asking speakers how they use and perceive the language, which we will explore in this paper.

Open-task crowdsourcing has been applied to lexicography for other languages, such as Slovene, where crowdsourcing was integrated in the thesaurus and collocation dictionary applications (Holdt et al., 2018; Kosem et al., 2018). On top of this goal of language documentation, we would like to use crowdsourcing to make language material available for language learners.

2. Method

As the basis for our experiments, we hosted an instance of PYBOSSA at our institute. We named our crowdsourcing platform Taalradar ('language radar'): this signifies both the 'radar' (overview) we would like to gain over the entire language through crowdsourcing, and the personal 'language radar' or linguistic intuition of contributors, which we would like to exploit. We ran two crowdsourcing rounds: in September 2018 and in November-December 2018.

2.1. Tasks

We designed four tasks which are well suited to reach our goals: documentation of the Dutch language and developing material for language learning. Since we would like to get a picture of the speakers of the language, we ask for user details (gender, age and city of residence) in all tasks. The tasks were created as Javascript/HTML files inside PB.

Tasks 1 and 2: Blends analysis and recognition. Blends are compound words, formed "by fusing parts of at least two other source words of which either one is shortened in the fusion and/or where there is some form of phonemic or graphemic overlap of the source words" (Gries, 2004). An example of a blend in both English and Dutch is brunch, which consists of breakfast and lunch.
For our experiments, we used blends collected for the Algemeen Nederlands Woordenboek (Dictionary of Contemporary Dutch; ANW) (Tiberius and Niestadt, 2010; Schoonheim and Tempelaars, 2010; Tiberius and Schoonheim, 2016). We developed two tasks: analysis and recognition of blends. In the analysis task, contributors are presented with a blend and asked of which source words this blend consists. No context of the blend is provided. 10 blends are presented in total.
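To make the setup concrete: a PB task presenter is a single HTML/Javascript file that renders each task and saves the contributor's answer through the pybossa.js client library. The following is a minimal sketch of what a presenter for the blends analysis task could look like; the element IDs, answer fields and project short name are our own illustrations, not the actual Taalradar code.

<div id="blend"></div>
<input id="source1" type="text" placeholder="source word 1">
<input id="source2" type="text" placeholder="source word 2">
<button id="submit">Send</button>

<script type="text/javascript">
  // Called when a task has been fetched from the server.
  pybossa.taskLoaded(function (task, deferred) {
    deferred.resolve(task);  // plain-text task, nothing to preload
  });

  // Called to render each task and collect the contributor's answer.
  pybossa.presentTask(function (task, deferred) {
    if (!task.id) {          // no tasks left for this contributor
      $("#blend").text("Thank you for participating!");
      return;
    }
    $("#blend").text(task.info.blend);  // e.g. "preferendum"
    $("#submit").off("click").on("click", function () {
      var answer = {
        source1: $("#source1").val(),
        source2: $("#source2").val()
      };
      // Save the answer in PB and move on to the next task.
      pybossa.saveTask(task.id, answer).done(function () {
        $("#source1, #source2").val("");
        deferred.resolve();
      });
    });
  });

  pybossa.run("taalradar-blends");  // project short name (hypothetical)
</script>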

Figure 1 shows the task as it is presented to the contributor.

Figure 1: Screenshot of the blends analysis task.

In the recognition task, contributors are presented with a citation from the ANW dictionary. 10 citations are presented. Contributors should recognize the blend in the citation. Every citation contains one blend, but we ask for "one or multiple blends" and present users with three input fields to enter blends. We deliberately designed the task in this somewhat deceptive way, to see which other words are candidates for being perceived as blends.

Task 3: Neologisms. In this task, contributors were asked to judge neologisms (new words) in a citation on two criteria: endurance of the concept ("This word will be used for a long time.") and diversity of users and situations ("This word will be used by different people [e.g. young, old] in different situations [e.g. conversation, newspaper]."). We selected these two criteria from the FUDGE test, a test to rate the sustainability of a neologism, which normally consists of 5 criteria (Metcalf, 2004). The neologisms and their citations were taken from newspaper material, which is used in the lexicographic workflow (see section 4.). From this corpus, sentences which contain a hitherto unknown word are extracted: these are possible neologisms, but can also be words that have been formed ad hoc. Lexicographers accept or reject a word as a neologism. We presented 15 words in a citation to users: 5 which have been attested by lexicographers as neologisms, 5 which have been rejected as neologisms, and 5 unattested words.

Task 4: Language variation. In this task, contributors are asked what they call a certain concept or how they would express a certain sentence. The goal is to chart dialectal variation, but also other kinds of language variation. We used a list of questions from Taalverhalen.be (http://taalverhalen.be, maintained by Miet Ooms), a website which tries to chart language variation using questionnaires. The list contains 16 questions: 9 questions on words for sweets, and 6 questions about the general vocabulary. An example of a question is: "What do you call VINEGAR?". On top of the user details we ask in the other tasks (gender, age, city of residence), we also ask for province, mother tongue and educational level.

2.2. Audience

Our experiments were advertised via our institutional newsletter, which reaches 3891 subscribers with an interest in language. We assume this was the channel with the largest reach: in the first round, the newsletter article received 519 clicks, and in the second round, 65 clicks. In both rounds, we observed an increase in contributions after the release of the newsletter. Additionally, we attended two linguistics events, where we offered visitors the possibility to engage in our crowdsourcing experiments: the meeting of the international society of Dutch linguistics and the Drongo festival, an event for the language sector in The Netherlands. Finally, we advertised our experiments via social media (Twitter, LinkedIn) and a Dutch linguistics blog.

3. Results

The results section consists of two parts. We will first describe our experiences with PYBOSSA as a crowdsourcing platform. Then, we will report on the linguistic findings on the language phenomena we performed experiments on.

3.1. Experiences of crowdsourcing with PYBOSSA

Table 1 shows the number of contributors for each of the tasks. It can be observed that only a small number of visitors did not finish the whole task. This could be due to the small number of questions we offered per task. The tasks in the second round (November-December 2018) received fewer contributors than those in the first round (September 2018); this could be due to a less prominent place of the announcement in our newsletter in the second round. In all experiments, more women than men participated. Also, participants aged above 50 were well represented. More participants came from The Netherlands than from Flanders.

Task               | # started | # completed | Period
Blends analysis    | 326       | 305         | Sept 2018
Blends recognition | 223       | 209         | Sept 2018
Neologisms         | 118       | 111         | Nov-Dec 2018
Language variation | 114       | 108         | Nov-Dec 2018

Table 1: Number of contributors per task.

We will now discuss our experiences with the PYBOSSA platform. A strength of PB is the freedom it offers when designing a task: the whole interface can be written in HTML and Javascript and can be customized. This also makes it easy to share tasks with other researchers (our tasks can be downloaded from https://github.com/INL/taalradar).
The account system and the saving and loading of tasks are handled by PB, so this does not have to be implemented by the task developer. Responses of the PB authors on the bug tracker are quick and concise. It is clear that PB is mainly designed for fixed-task crowdsourcing, not focusing on variation and the details of the contributor. For open-task, linguistic purposes, some points require attention (at the time of writing). Firstly, there is no built-in support for asking contributor details. We handled this by asking contributor details via a normal question (a sketch of this workaround is given at the end of this subsection). However, since all given answers are publicly visible in PB, this also applies to the details, which may not be ideal from a privacy perspective. Secondly, contributors cannot go back to a previous task and change their answers. Thirdly, multiple anonymous logins from the same computer are not allowed, making it harder to use PB at e.g. a trade fair. A workaround is possible, but it is not built into PB by default. Also, anonymous users are identified by IP address: this can cause problems when multiple anonymous contributors connect via a shared internet connection, such as in classroom use. Finally, there is no built-in possibility for a contributor to stop answering after a subset of the total number of questions available and be shown an end screen.

All in all, PYBOSSA is a convenient crowdsourcing tool, but it has its limitations with regard to open-task crowdsourcing.
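To illustrate the contributor-details workaround mentioned above: it amounts to treating the demographic fields as extra inputs in the presenter and bundling them into the saved answer. A minimal sketch, again with hypothetical element IDs and field names (and with the privacy caveat just mentioned applying to everything stored this way):

pybossa.presentTask(function (task, deferred) {
  if (!task.id) return;
  $("#question").text(task.info.question);
  $("#submit").off("click").on("click", function () {
    var answer = {
      response: $("#response").val(),
      // Contributor details, stored as part of an ordinary answer.
      // Task runs are publicly visible in PB, so these details
      // become public as well (the privacy issue noted above).
      gender: $("#gender").val(),
      age: $("#age").val(),
      city: $("#city").val()
    };
    pybossa.saveTask(task.id, answer).done(function () {
      deferred.resolve();
    });
  });
});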

3.2. Linguistic findings

Blends. For the blends analysis task, we compared the contributor answers to the attested analyses from the ANW. Contributors showed an average accuracy of 42%, with average accuracies per word ranging between 2-83%. Table 2 shows the given answers for the analysis of the blend preferendum. This shows that there is not always one correct analysis of a blend, when a related noun and verb can both be filled in as a source word: while prefereren 'to prefer' + referendum is the attested analysis, preferentie 'preference' + referendum may also be an option. It is even more interesting to see that a number of contributors analyze this blend entirely differently from the attested analysis: they analyze the blend as pre 'before' + referendum.

Answer                   | Frequency
referendum, prefereren   | 154
referendum, preferentie  | 60
pre, referendum          | 16
do not know              | 11
preferent, referendum    | 8

Table 2: 5 most frequent answers given for the analysis task for the blend preferendum. Correct answer in bold.

For the blends recognition task, the contributor answers were compared to the ANW entry in which the citation occurs. Contributors had an average recognition accuracy of 87%, with average accuracies per word ranging between 54-97%. The accuracies are high: most blends are recognized correctly. Table 3 shows the given answers for the recognition of one specific blend: twittie. twittie 'twitter fight' is a blend of twitter and fittie 'fight' (slang). Most contributors correctly recognize this as a blend. Many people, however, also perceive fittie (which also occurs in this citation) and tweet as blends, possibly because these words appear new or unknown.

Answer                  | Frequency
twittie                 | 122
twittie, fittie         | 56
fittie                  | 16
tweet, twittie, fittie  | 5
do not know             | 4

Table 3: 5 most frequent answers given for the recognition task for the blend twittie. Correct answer in bold.
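The comparison behind these accuracy figures is straightforward to script once the task runs are exported from PB. The sketch below (Node.js) is not our actual analysis code: it assumes a JSON export in which each task run has been joined with its task, so that the presented blend is available next to the answer, and a small attested lookup table excerpted from the ANW.

// Sketch: score exported task runs against attested ANW analyses.
const fs = require("fs");

const attested = {
  preferendum: ["prefereren", "referendum"]  // excerpt; one entry per blend
};
const runs = JSON.parse(fs.readFileSync("task_runs.json", "utf8"));

const perWord = {};
for (const run of runs) {
  const blend = run.blend;                   // from the joined task info
  if (!(blend in attested)) continue;
  const given = [run.info.source1, run.info.source2]
    .map((s) => (s || "").trim().toLowerCase());
  // Order-insensitive match against the attested source words.
  const correct = attested[blend].every((w) => given.includes(w));
  perWord[blend] = perWord[blend] || { correct: 0, total: 0 };
  perWord[blend].total++;
  if (correct) perWord[blend].correct++;
}

for (const [blend, c] of Object.entries(perWord)) {
  console.log(blend + ": " + (100 * c.correct / c.total).toFixed(1) + "%");
}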
Neologisms. Table 4 shows the endurance and diversity judgments for the 15 words in the neologisms task. These results show that, in general, neologisms rejected by lexicographers also receive lower crowd endurance scores. For diversity, this pattern is not as clear.

Word                  | % Endurance | % Diversity | Status
gendertransformatie   | 91.2%       | 70.2%       | accepted
insectenafname        | 83.5%       | 55.7%       | unattested
dreigingsmonitor      | 79.6%       | 54.0%       | accepted
belevenisstad         | 64.0%       | 55.3%       | accepted
vluchtelingenpraktijk | 62.8%       | 46.9%       | rejected
multimediamerk        | 62.5%       | 52.7%       | unattested
zonnepriesteres       | 52.2%       | 17.4%       | unattested
seniorenmodebranche   | 47.4%       | 28.9%       | unattested
moeilijkheidsparadox  | 45.2%       | 21.7%       | unattested
afradertje            | 43.5%       | 38.3%       | rejected
tijdstrends           | 38.3%       | 27.8%       | rejected
nachtnanny            | 33.3%       | 18.0%       | accepted
lighttaks             | 26.5%       | 25.7%       | accepted
korttheater           | 20.4%       | 15.0%       | rejected
dieetopenbaring       | 8.7%        | 13.0%       | rejected

Table 4: Endurance and diversity judgments for the 15 words in the neologisms task, ordered by % endurance. The total number of contributors per word varies between 111 and 115. The rightmost column shows whether the word has been manually attested and, if so, whether it has been accepted or rejected as a neologism.

Language variation. In the language variation task, we found that most people used the standard Dutch term to signify a word; only a minority of the given forms were dialectal. However, it is interesting to investigate the differences between Dutch and Flemish contributors. The number of contributors from The Netherlands (around 100 per question) is larger than the number of Flemish contributors (around 15 per question). Table 5 shows the relative frequencies of the given answers for the concept TAKE A SEAT, split per language area. ga lekker zitten is very popular in The Netherlands, while zet u is only used in Flanders.

Utterance        | Flanders | The Netherlands
ga zitten        | 31%      | 38%
ga lekker zitten | 0%       | 18%
neem plaats      | 6%       | 7%
zet u            | 31%      | 0%
pak een stoel    | 6%       | 4%
Total answers    | 16       | 115

Table 5: Relative frequency of answers given in the language variation task for the concept TAKE A SEAT, per area. Top 5 results, sorted by overall absolute frequency.

These differences per area are observed for more questions. For example, a SWEET ON A STICK is referred to by many Flemish contributors as lekstok, whereas contributors from the Netherlands mainly use the form lolly. And WISHING A GOOD NIGHT is done by saying slaap wel in Flanders, while welterusten is used more in The Netherlands.

4. Future applications

Integrating crowdsourcing into a lexicographic workflow. Our case study on neologisms shows the potential of crowdsourcing for lexicography. Crowdsourcing becomes even more useful if it is fully integrated into the lexicographic workflow. Currently, at the INT, newspaper material is fed in and sentences with unknown words are automatically extracted. Lexicographers then manually decide on inclusion in the dictionary. In an ideal workflow, the extracted sentences are automatically imported into a crowdsourcing application and shown to the public. Contributor judgments can then help lexicographers in deciding on dictionary inclusion. A challenge will be to motivate a crowd to contribute over a long period of time. To maintain workflow stability, also in case of a temporary drop in crowd participation, crowd consultation will be an optional step in the workflow.
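The import step of such a workflow can be small: each extracted sentence becomes one PB task, created through PYBOSSA's REST API (POST /api/task). The following Node.js sketch uses placeholder values for the endpoint, API key and project id; error handling and rate limiting are omitted.

// Sketch: push sentences with candidate neologisms into a PB project.
const ENDPOINT = "https://taalradar.example.org/api/task";  // placeholder
const API_KEY = "YOUR_API_KEY";                             // placeholder
const PROJECT_ID = 42;                                      // placeholder

async function importSentences(sentences) {
  for (const s of sentences) {
    // One PB task per extracted sentence; the presenter reads task.info.
    await fetch(ENDPOINT + "?api_key=" + API_KEY, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        project_id: PROJECT_ID,
        info: { citation: s.citation, candidate: s.word }
      })
    });
  }
}

// Example: importSentences([{ citation: "...", word: "dreigingsmonitor" }]);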

Language learning. We have not yet performed crowdsourcing experiments for language learning, but we are looking into future directions which seem promising. Crowdsourcing can be used to cluster word senses, which could help people with language or speech disabilities. Crowdsourcing has been used for word sense disambiguation before (Akkaya et al., 2010; Venhuizen et al., 2013), also specifically targeted at creating language learning material (Parent and Eskenazi, 2010). It would be worthwhile to apply this methodology to the ANW dictionary or the semantic lexicon DiaMaNT (Depuydt and De Does, 2018). Another idea could be to use crowdsourcing to select suitable learning sentences for collocations or proverbs from a corpus.

5. Conclusion

Our experiments have shown that crowdsourcing is useful for documenting the Dutch language, and can be valuable for developing Dutch language learning material in the future. We used the PYBOSSA framework for our crowdsourcing experiments, which is very powerful, but also has its limitations when used for linguistic purposes.

6. Acknowledgements

This work was supported by EU COST action CA16105 enetCollect, which is gratefully acknowledged. We thank Miet Ooms for supplying the questions for the language variation task. We thank our colleagues at the INT for valuable advice.

7. Bibliographical References

Akkaya, C., Conrad, A., Wiebe, J., and Mihalcea, R. (2010). Amazon Mechanical Turk for subjectivity word sense disambiguation. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 195-203. Association for Computational Linguistics.

Beelen, H. and Van der Sijs, N. (2014). Crowdsourcing de Bijbel. Neerlandia / Nederlands van Nu, (2-2014).

Burgos, P., Sanders, E., Cucchiarini, C., van Hout, R., and Strik, H. (2015). Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learners. InterSpeech 2015, Germany, page 7.

Causer, T., Grint, K., Sichani, A.-M., and Terras, M. (2018). 'Making such bargain': Transcribe Bentham and the quality and cost-effectiveness of crowdsourced transcription.

Depuydt, K. and De Does, J. (2018). The Diachronic Semantic Lexicon of Dutch as Linked Open Data. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May. European Language Resources Association (ELRA).

Gries, S. T. (2004). Shouldn't it be breakfunch? A quantitative analysis of blend structure in English. Linguistics, 42(3), January.

Holdt, Š. A., Čibej, J., Dobrovoljc, K., Gantar, P., Gorjanc, V., Klemenc, B., Kosem, I., Krek, S., Laskowski, C., and Robnik-Šikonja, M. (2018). Thesaurus of Modern Slovene: By the Community for the Community. In Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, pages 989-997, July.

Holley, R. (2010). Crowdsourcing: How and Why Should Libraries Do It? D-Lib Magazine, 16(3/4), March.

Kosem, I., Krek, S., Gantar, P., Holdt, Š. A., Čibej, J., and Laskowski, C. (2018). Collocations Dictionary of Modern Slovene. In Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, pages 989-997, July.

Metcalf, A. A. (2004). Predicting new words: the secrets of their success. Houghton Mifflin Harcourt.

Parent, G. and Eskenazi, M. (2010). Clustering dictionary definitions using Amazon Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 21-29. Association for Computational Linguistics.

Sanders, E., Burgos, P., Cucchiarini, C., and van Hout, R. (2016). Palabras: Crowdsourcing Transcriptions of L2 Speech. International Conference on Language Resources and Evaluation (LREC) 2016, Portorož, Slovenia, page 7.
Schoonheim, T. and Tempelaars, R. (2010). Dutch Lexicography in Progress: The Algemeen Nederlands Woordenboek (ANW). In Proceedings of the XIV Euralex International Congress, Ljouwert, Fryske Akademy/Afuk, abstract, page 179.

Tiberius, C. and Niestadt, J. (2010). The ANW: An online Dutch dictionary. In Proceedings of the XIV Euralex International Congress, Ljouwert, Fryske Akademy/Afuk.

Tiberius, C. and Schoonheim, T. (2016). The Algemeen Nederlands Woordenboek (ANW) and its Lexicographical Process. In Der lexikografische Prozess bei Internetwörterbüchern. 4. Arbeitsbericht "Internetlexikografie". Mannheim: Institut für Deutsche Sprache. (OPAL X/2014).

Van der Wal, M. J., Rutten, G., and Simons, T. (2012). Letters as loot: Confiscated letters filling major gaps in the history of Dutch. In Marina Dossena et al., editors, Pragmatics & Beyond New Series, volume 218, pages 139-162. John Benjamins Publishing Company, Amsterdam.

Venhuizen, N., Evang, K., Basile, V., and Bos, J. (2013). Gamification for word sense labeling. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013).
