Proceedings of the 1St Workshop on Tools and Resources to Empower
Total Page:16
File Type:pdf, Size:1020Kb
LREC 2020 Workshop Language Resources and Evaluation Conference 11–16 May 2020 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) PROCEEDINGS Edited by Nuria´ Gala and Rodrigo Wilkens Proceedings of the LREC 2020 first workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) Edited by: Núria Gala and Rodrigo Wilkens ISBN: 979-10-95546-44-3 EAN: 9791095546443 For more information: European Language Resources Association (ELRA) 9 rue des Cordelières 75013, Paris France http://www.elra.info Email: [email protected] c European Language Resources Association (ELRA) These workshop proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License ii Preface Recent studies show that the number of children and adults facing difficulties in reading and understanding written texts is steadily growing. Reading challenges can show up early on and may include reading accuracy, speed, or comprehension to the extent that the impairment interferes with academic achievement or activities of daily life. Various technologies (text customization, text simplification, text to speech devices, screening for readers through games and web applications, to name a few) have been developed to help poor readers to get better access to information as well as to support reading development. Among those technologies, text simplification is a powerful way to leverage document accessibility by using NLP techniques. The “First Workshop on Tools and Resources to Empower People with REAding DIfficulties” (READI), collocated with the “International Conference on Language Resources and Evaluation” (LREC 2020), aims at presenting current state-of-the-art techniques and achievements for text simplification together with existing reading aids and resources for lifelong learning, addressing a variety of domains and languages, including natural language processing, linguistics, psycholinguistics, psychophysics of vision and education. A particular focus is put on methods, tools and resources obtained through automatic text adaptation, ultimately addressed to children struggling with difficulties in learning to read, to the community of teachers, to speech-language pathologists, to parents seeking solutions, and to those professionals involved with adults struggling with reading (e.g. illiterates, aphasic readers and low vision readers). These proceedings comprise the papers of the workshop, initially foreseen for May 11, 2020, in Marseille (France), but postponed at the time of preparing these proceedings due to the COVID-19 situation around the world. 21 propositions have been submitted from 62 different authors from 12 different countries (France 5, UK 4, Italy 2, Spain 2, Sweden 2, Switzerland 2, Belgium 1, Brazil 1, Germany 1, Iceland 1, Netherlands 1, Pakistan 1). The total rate of accepted papers is 66% (14 papers), 5 of them chosen as oral presentations and 9 for the poster session. READI also features one invited speaker, Arne Jönsson from Linköping University. We are thankful to the authors who submitted their work to this workshop, to our Program Committee members for their contributions, to the reviewers and the additional reviewers who did a thorough job reviewing submissions, to Arne Jönsson who kindly accepted to be our invited speaker, and to LREC committee for including this workshop into their program. Núria Gala and Rodrigo Wilkens iii Organizers: Delphine Bernhard, Université de Strasbourg, France Thomas François, Université catholique de Louvain, Belgium Núria Gala, Aix Marseille Université, France Daria Goriachun, Aix Marseille Université, France Ludivine Javourey-Drevet, Aix Marseille Université, France Anne Laure Ligozat, Université Paris Sud, France Horacio Saggion, Université Pompeu Fabra, Catalonia, Spain Amalia Todirascu, Université de Strasbourg, France Rodrigo Wilkens, University of Essex, United Kingdom Johannes Ziegler, Aix Marseille Université, France Program Committee: Delphine Bernhard, Université de Strasbourg, France Dominique Brunato, ILC, Pisa, Italy Eric Castet, Aix Marseille Université, France Thomas François, Université catholique de Louvain, Belgium Núria Gala, Aix Marseille Université, France Arne Jönsson, Linköping University, Sweden Ekaterina Kochmar, University of Cambridge, United Kingdom Anne Laure Ligozat, Université Paris Sud, France Detmar Meurers, Université de Tübingen, Germany Ildiko Pilan, University of Oslo, Norway Horacio Saggion, Université Pompeu Fabra, Catalonia, Spain Sanja Stajner, Symanto Research, Germany Anaïs Tack, Université catholique de Louvain, Belgium Amalia Todirascu, Université de Strasbourg, France Vincent Vandeghinste, Instituut voor de Nederlandse Taal, Netherlands Giulia Venturi, ILC, Pisa, Italy Aline Villavicencio, University of Sheffield, United Kingdom Elena Volodina, University of Gotheburg, Sweden Rodrigo Wilkens, University of Essex, United Kingdom Johannes Ziegler, Aix Marseille Université, France Additional Reviewers: Carlos Ramisch, Aix Marseille Université, France Leonardo Zilio, University of Surrey, United Kingdom Michael Zock, Aix Marseille Université, France Invited Speaker: Arne Jönsson, Linköping University, Sweden iv Table of Contents Disambiguating Confusion Sets as an Aid for Dyslexic Spelling Steinunn Rut Friðriksdóttir and Anton Karl Ingason . .1 Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences Evelina Rennes . .6 Incorporating Multiword Expressions in Phrase Complexity Estimation Sian Gooding, Shiva Taslimipoor and Ekaterina Kochmar . 14 Automatically Assess Children’s Reading Skills Ornella Mich, Nadia Mana, Roberto Gretter, Marco Matassoni and Daniele Falavigna . 20 Text Simplification to Help Individuals with Low Vision Read More Fluently Lauren Sauvan, Natacha Stolowy, Carlos Aguilar, Thomas François, Núria Gala, Frédéric Matonti, Eric Castet and Aurélie Calabrèse . 27 Identifying Abstract and Concrete Words in French to Better Address Reading Difficulties Daria Goriachun and Núria Gala . 33 Benchmarking Data-driven Automatic Text Simplification for German Andreas Säuberli, Sarah Ebling and Martin Volk. 41 Visualizing Facets of Text Complexity across Registers Marina Santini, Arne Jonsson and Evelina Rennes . 49 CompLex — A New Corpus for Lexical Complexity Predicition from LikertScale Data. Matthew Shardlow, Marcos Zampieri and Michael Cooper . 57 LagunTest: A NLP Based Application to Enhance Reading Comprehension Kepa Bengoetxea, Itziar Gonzalez-Dios and Amaia Aguirregoitia. .63 A Lexical Simplification Tool for Promoting Health Literacy Leonardo Zilio, Liana Braga Paraguassu, Luis Antonio Leiva Hercules, Gabriel Ponomarenko, Laura Berwanger and Maria José Bocorny Finatto . 70 A multi-lingual and cross-domain analysis of features for text simplification Regina Stodden and Laura Kallmeyer . 77 Combining Expert Knowledge with Frequency Information to Infer CEFR Levels for Words Alice Pintard and Thomas François . 85 Coreference-Based Text Simplification Rodrigo Wilkens, Bruno Oberle and Amalia Todirascu . 93 v 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI2020), pages 1–5 Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC Disambiguating Confusion Sets as an Aid for Dyslexic Spelling Steinunn Rut Friðriksdóttir, Anton Karl Ingason Faculty of Icelandic and Comparative Cultural Studies University of Iceland, Sæmundargata 2, 102 Reykjavík srf2, [email protected] Abstract Spell checkers and other proofreading software are crucial tools for people with dyslexia and other reading disabilities. Most spell checkers automatically detect spelling mistakes by looking up individual words and seeing if they exist in the vocabulary. However, one of the biggest challenges of automatic spelling correction is how to deal with real-word errors, i.e. spelling mistakes which lead to a real but unintended word, such as when then is written in place of than. These errors account for 20% of all spelling mistakes made by people with dyslexia. As both words exist in the vocabulary, a simple dictionary lookup will not detect the mistake. The only way to disambiguate which word was actually intended is to look at the context in which the word appears. This problem is particularly apparent in languages with rich morphology where there is often minimal orthographic difference between grammatical items. In this paper, we present our novel confusion set corpus for Icelandic and discuss how it could be used for context-sensitive spelling correction. We have collected word pairs from seven different categories, chosen for their homophonous properties, along with sentence examples and frequency information from said pairs. We present a small-scale machine learning experiment using a decision tree binary classification which results range from 73% to 86% average accuracy with 10-fold cross validation. While not intended as a finalized result, the method shows potential and will be improved in future research. Keywords: homophones, dyslexia, reading disabilities, confusion sets, disambiguation, context dependency, Icelandic 1. Introduction Confusion Set Corpus (ICoSC). In Section 4, we briefly discuss our machine learning experiments with the corpus. According to Mody and Silliman, dyslexia accounts for We conclude in Section 5. 80% of diagnosed learning disabilities. It causes problems with the mapping process between orthographic and phono- 2. Context