Semantic Web -1 (2021) 1–30
DOI 10.3233/SW-210431
IOS Press

Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective

Lucie-Aimée Kaffee a,*, Pavlos Vougiouklis b,** and Elena Simperl c

a School of Electronics and Computer Science, University of Southampton, UK E-mail: [email protected]
b Huawei Technologies, UK E-mail: [email protected]
c King’s College London, UK E-mail:
[email protected]

Editor: Philipp Cimiano, Universität Bielefeld, Germany
Solicited reviews: John Bateman, Bremen University, Germany; Leo Wanner, Pompeu Fabra University, Spain; Denny Vrandecic, Wikimedia Foundation, USA

Abstract. Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness