Semantic Web 0 (0) 1
IOS Press

Using Natural Language Generation to Bootstrap Missing Wikipedia Articles: A Human-centric Perspective

Lucie-Aimée Kaffee *, Pavlos Vougiouklis and Elena Simperl
School of Electronics and Computer Science, University of Southampton, UK
E-mails: [email protected], [email protected], [email protected]

* Corresponding author. E-mail: [email protected].

Abstract. Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.

Keywords: Wikipedia, Wikidata, ArticlePlaceholder, Multilingual, Natural Language Generation, Neural networks

1. Introduction

Wikipedia is available in 301 languages, but its content is unevenly distributed [1]. Language versions with less coverage than, e.g., the English Wikipedia face multiple challenges: fewer editors mean less quality control, which makes that particular Wikipedia less attractive for readers in that language, which in turn makes it more difficult to recruit new editors from among the readers.

Wikidata, the structured-data backbone of Wikipedia [2], offers some help. It contains information about more than 55 million entities, for example people, places or events, edited by an active international community of volunteers [3]. More importantly, it is multilingual by design, and each aspect of the data can be translated and rendered to the user in their preferred language [4]. This makes it the tool of choice for a variety of content integration affordances in Wikipedia, including links to articles in other languages and infoboxes. An example can be seen in Figure 1: in the French Wikipedia, the infobox shown in the article about cheese (right) automatically draws in data from Wikidata (left) and displays it in French.

Fig. 1. Representation of Wikidata statements and their inclusion in a Wikipedia infobox. Wikidata statements in French (middle, English translation to their left) are used to fill out the fields of the infobox in articles using the fromage infobox on the French Wikipedia.

In previous work of ours, we proposed the ArticlePlaceholder, a tool that takes advantage of Wikidata's multilingual capabilities to increase the coverage of under-resourced Wikipedias [5]. When someone looks for a topic that is not yet covered by Wikipedia in their language, the ArticlePlaceholder tries to match the topic with an entity in Wikidata. If successful, it then redirects the search to an automatically generated placeholder page that displays the relevant information, for example the name of the entity and its main properties, in their language. The ArticlePlaceholder is currently used in 14 Wikipedias (see Section 3.1).
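To make the matching step concrete, the following is a minimal, purely illustrative sketch of how a reader's search term could be resolved to a Wikidata entity via the public wbsearchentities API. The actual ArticlePlaceholder is implemented as a MediaWiki extension; the function name, threshold-free top-1 matching, and error handling below are our own simplifications.

    # Illustrative sketch only: resolve a search term to a Wikidata entity in a given
    # language using the public MediaWiki API. This is not the ArticlePlaceholder
    # implementation; it only demonstrates the matching idea described above.
    import requests

    WIKIDATA_API = "https://www.wikidata.org/w/api.php"

    def match_topic(search_term, language):
        """Return the best-matching Wikidata entity (id, label, description) or None."""
        params = {
            "action": "wbsearchentities",
            "search": search_term,
            "language": language,   # language of the search term, e.g. "eo" or "ar"
            "uselang": language,    # language of labels/descriptions in the response
            "format": "json",
            "limit": 1,
        }
        response = requests.get(WIKIDATA_API, params=params, timeout=10)
        response.raise_for_status()
        results = response.json().get("search", [])
        if not results:
            return None              # no match: fall back to the normal search results page
        top = results[0]
        return top["id"], top.get("label"), top.get("description")

    # Example: match_topic("fromaĝo", "eo") would, if a match is found, return the id,
    # label and description of the corresponding entity, after which the reader would be
    # redirected to the generated placeholder page for that entity.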
In this paper, we propose an iteration of the ArticlePlaceholder to improve the representation of the data on the placeholder page. The original version of the tool pulled the raw data from Wikidata (available as triples with labels in different languages) and displayed it in tabular form (see Figure 3 in Section 3). In the current version, we use Natural Language Generation (NLG) techniques to automatically produce one summary sentence from the triples instead. Presenting structured data as text rather than as tables helps people unfamiliar with the underlying technologies make sense of it [6]. This is particularly useful in contexts where one cannot make any assumptions about the levels of data literacy of the audience, as is the case for a large share of Wikipedia readers.

Our NLG solution builds upon the general encoder-decoder framework for neural networks, which is credited with promising results in similar text-centric tasks, such as machine translation [7, 8] and question generation [9–11]. We extend this framework to meet the needs of different Wikipedia language communities in terms of text fluency, appropriateness for Wikipedia, and reuse during article editing. Given an entity that was matched by the ArticlePlaceholder, our system uses its triples to generate a short Wikipedia-style summary. Many existing NLG techniques produce sentences with limited usability in user-facing systems; one of the most common problems is their limited ability to handle rare words [12, 13], i.e. words that the model does not encounter frequently enough during training, such as localisations of names in different languages. We introduce a mechanism called the property placeholder [14] to tackle this problem, learning multiple verbalisations of the same entity in the text [6].
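As an illustration of the intuition behind property placeholders (the actual mechanism and model are specified in [14]), the sketch below shows one way rare entity labels could be swapped for property-derived tokens before training and substituted back with labels in the target language after generation. All names, the frequency threshold, and the single-token matching are our own simplifications for exposition.

    # Illustrative sketch of the property-placeholder idea (see [14] for the actual model):
    # rare object labels in the training text are replaced by a token derived from the
    # Wikidata property linking them to the main entity (e.g. [[P17]] for "country"),
    # and after decoding the tokens are substituted with labels in the target language.
    # Simplification: labels are assumed to be single tokens.

    RARE_LABEL_THRESHOLD = 5  # assumed cut-off for "rare" surface forms (ours, for illustration)

    def to_placeholders(sentence_tokens, triples, label_counts):
        """Replace rare object labels in the sentence with property placeholder tokens."""
        replacements = {}
        for prop, obj_label in triples:      # triples of the main entity: (property id, object label)
            if label_counts.get(obj_label, 0) < RARE_LABEL_THRESHOLD:
                replacements[obj_label] = f"[[{prop}]]"
        return [replacements.get(tok, tok) for tok in sentence_tokens]

    def from_placeholders(generated_tokens, triples):
        """After generation, substitute placeholders with the object label of the matching triple."""
        prop_to_label = {f"[[{prop}]]": obj_label for prop, obj_label in triples}
        return [prop_to_label.get(tok, tok) for tok in generated_tokens]

    # Hypothetical example: with triples = [("P17", "Kenya")] and "Kenya" counted as rare,
    # ["Nairobi", "is", "a", "city", "in", "Kenya"] becomes
    # ["Nairobi", "is", "a", "city", "in", "[[P17]]"] for training, and a decoded "[[P17]]"
    # is mapped back to the country label in the reader's language.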
In building the system we aimed to pursue the following research questions:

RQ1 Can we train a neural network to generate text from triples in a multilingual setting? To answer this question we first evaluated the system using a series of predefined metrics and baselines. In addition, we undertook a quantitative study with participants from two different Wikipedia language communities (Arabic and Esperanto), who were asked to assess, from a reader's perspective, whether the text is fluent and appropriate for Wikipedia.

RQ2 How do editors perceive the generated text on the ArticlePlaceholder page? To add depth to the quantitative findings of the first study, we undertook a second, mixed-methods study within six Wikipedia language communities (Arabic, Swedish, Hebrew, Persian, Indonesian, and Ukrainian). We carried out semi-structured interviews, in which we asked editors to comment on their experience with reading the summaries generated through our approach, and we identified common themes in their answers. Among others, we were interested in understanding how editors perceive text that is the result of an artificial intelligence (AI) algorithm rather than being manually crafted, and how they deal with so-called <rare> tokens in the sentences. Those tokens represent realisations of infrequent entities in the text, which data-driven approaches generally struggle to verbalise [12].

RQ3 How do editors use the generated sentence in their work? As part of the second study, we also asked participants to edit the placeholder page, starting from the automatically generated text or removing it completely. We assessed text reuse both quantitatively, using a string-matching metric (a generic sketch of such a measure is given at the end of this section), and qualitatively through the interviews. Just like in RQ2, we were also interested in understanding whether summaries with <rare> tokens, which point to limitations in the algorithm, would be used when editing and how the editors would work around them.

The evaluation helps us build a better understanding of the tools and experiences we need to help nurture under-served Wikipedias. Our quantitative analysis of the reading experience showed that participants rank the summary sentences close to the expected quality standards in Wikipedia, and are likely to con- [...]

[...] in Section 6 and 7, before concluding with a summary of contributions and planned future work in Section 8.

Previous submissions. A preliminary version of this work was published in [14, 16]. In the current paper, we have carried out a comprehensive evaluation of the approach, including a new qualitative study and a task-based evaluation with editors from six language communities. By comparison, the previous publications covered only a metric-based corpus evaluation, complemented in the second one by a small quantitative study of text fluency and appropriateness. The neural network architecture has been presented in detail in [14].
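For RQ3, text reuse is quantified with a string-matching metric. The exact metric used in the study is not reproduced here; the following is a hypothetical, simplified token-overlap sketch of how such a measure could be computed, with all names chosen by us.

    # Hypothetical sketch of a simple text-reuse measure (not the exact metric used in the
    # study): the fraction of tokens of the generated summary sentence that also appear,
    # with multiplicity, in the article text saved by the editor.
    from collections import Counter

    def token_reuse(generated_sentence, edited_article):
        """Return a reuse score in [0, 1] based on overlapping token counts."""
        gen_tokens = generated_sentence.lower().split()
        if not gen_tokens:
            return 0.0
        art_counts = Counter(edited_article.lower().split())
        gen_counts = Counter(gen_tokens)
        overlap = sum(min(count, art_counts.get(tok, 0)) for tok, count in gen_counts.items())
        return overlap / len(gen_tokens)

    # A score of 1.0 means every token of the generated sentence reappears in the edited
    # article; 0.0 means the editor discarded the sentence entirely.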