Proceedings of the 14th Conference of the European Chapter of the
EACL 2014
14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

April 27, 2014
Gothenburg, Sweden

© 2014 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-937284-91-6

Introduction

Welcome to the Third International Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR).

The last few years have seen a resurgence of work on text simplification and readability. Examples include learning lexical and syntactic simplification operations from Simple English Wikipedia revision histories, exploring more complex lexico-syntactic simplification operations requiring morphological changes as well as constituent reordering, simplifying mathematical form, applications for target users such as dyslexics, deaf students, second language learners and low literacy adults, and fresh attempts at predicting readability.

The PITR 2014 workshop has been organised to provide a cross-disciplinary forum for discussing key issues related to predicting and improving text readability for target users. It will be held on April 27, 2014 in conjunction with the 14th Conference of the European Association for Computational Linguistics in Gothenburg, Sweden, and is sponsored by the ACL Special Interest Group on Speech and Language Processing for Assistive Technologies (SIG-SLPAT).
These proceedings include fifteen papers that cover various perspectives on the topic: simplification in specific domains such as medicine and patents, simplification for specific languages, tailoring text for specific users (e.g., dyslexia and autism), development of new corpora, automatic system evaluation, analyses of human simplifications, studies of human reading, and predicting the reading level of text in general and for particular genres.

We hope this volume is a valuable addition to the literature, and look forward to an exciting Workshop.

Sandra Williams
Advaith Siddharthan
Ani Nenkova

Organizers:

Sandra Williams, The Open University, UK
Advaith Siddharthan, University of Aberdeen, UK
Ani Nenkova, University of Pennsylvania, USA

Programme Committee:

Stefan Bott, Universitat Pompeu Fabra, Spain
Kevyn Collins-Thompson, University of Michigan, USA
Siobhan Devlin, University of Sunderland, UK
Micha Elsner, Ohio State University, USA
Richard Evans, University of Wolverhampton, UK
Oliver Ferschke, Technische Universität Darmstadt, Germany
Thomas Francois, University of Louvain, Belgium
Caroline Gasperin, SwiftKey, UK
Albert Gatt, University of Malta, Malta
Raquel Hervas, Universidad Complutense de Madrid, Spain
Veronique Hoste, University College Ghent, Belgium
Matt Huenerfauth, The City University of New York (CUNY), USA
David Kauchak, Middlebury College, USA
Annie Louis, University of Edinburgh, UK
Ruslan Mitkov, University of Wolverhampton, UK
Hitoshi Nishikawa, NTT, Japan
Ehud Reiter, University of Aberdeen, UK
Matthew Shardlow, Uni of Manchester, UK
Lucia Specia, University of Sheffield, UK
Ivelina Stoyanova, BAS, Bulgaria
Irina Temnikova, Qatar Computing Research Institute, Qatar
Sowmya Vajjala, Uni Tuebingen, Germany
Ielka van der Sluis, University of Groningen, The Netherlands
Jennifer Williams, MIT, USA
Kristian Woodsend, University of Edinburgh, UK

Table of Contents

One Step Closer to Automatic Evaluation of Text Simplification Systems
    Sanja Štajner, Ruslan Mitkov and Horacio Saggion

Automatic diagnosis of understanding of medical words
    Natalia Grabar, Thierry Hamon and Dany Amiot

Exploring Measures of “Readability” for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs
    Sowmya Vajjala and Detmar Meurers

Keyword Highlighting Improves Comprehension for People with Dyslexia
    Luz Rello, Horacio Saggion and Ricardo Baeza-Yates

An eye-tracking evaluation of some parser complexity metrics
    Matthew J. Green

Syntactic Sentence Simplification for French
    Laetitia Brouwers, Delphine Bernhard, Anne-Laure Ligozat and Thomas Francois

Medical text simplification using synonym replacement: Adapting assessment of word difficulty to a compounding language
    Emil Abrahamsson, Timothy Forni, Maria Skeppstedt and Maria Kvist

Segmentation of patent claims for improving their readability
    Gabriela Ferraro, Hanna Suominen and Jaume Nualart

Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results
    Gintare Grigonyte, Maria Kvist, Sumithra Velupillai and Mats Wirén

An Open Corpus of Everyday Documents for Simplification Tasks
    David Pellow and Maxine Eskenazi

EACL - Expansion of Abbreviations in CLinical text
    Lisa Tengstrand, Beáta Megyesi, Aron Henriksson, Martin Duneld and Maria Kvist

A Quantitative Insight into the Impact of Translation on Readability
    Alina Maria Ciobanu and Liviu Dinu

Classifying easy-to-read texts without parsing
    Johan Falkenjack and Arne Jonsson

An Analysis of Crowdsourced Text Simplifications
    Marcelo Amancio and Lucia Specia

An evaluation of syntactic simplification rules for people with autism
    Richard Evans, Constantin Orasan and Iustin Dornescu
Workshop Program

Sunday April 27, 2014

(09:00) Session 1 - Keynote

09:00 Welcome and opening remarks
09:10 Keynote: Ehud Reiter
      Choosing Appropriate Words in Generated Texts for Low-Skill Readers
10:30 Coffee break

(11:00) Session 2 - Papers

11:00 One Step Closer to Automatic Evaluation of Text Simplification Systems
      Sanja Štajner, Ruslan Mitkov and Horacio Saggion
11:20 Automatic diagnosis of understanding of medical words
      Natalia Grabar, Thierry Hamon and Dany Amiot
11:40 Exploring Measures of “Readability” for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs
      Sowmya Vajjala and Detmar Meurers
12:00 Keyword Highlighting Improves Comprehension for People with Dyslexia
      Luz Rello, Horacio Saggion and Ricardo Baeza-Yates
12:20 Discussion
12:30 Lunch break

(14:00) Session 3 - Papers

14:00 An eye-tracking evaluation of some parser complexity metrics
      Matthew J. Green
14:20 Syntactic Sentence Simplification for French
      Laetitia Brouwers, Delphine Bernhard, Anne-Laure Ligozat and Thomas Francois
14:40 Panel
15:30 Coffee break

(16:00) Session 4 - Posters

Medical text simplification using synonym replacement: Adapting assessment of word difficulty to a compounding language
    Emil Abrahamsson, Timothy Forni, Maria Skeppstedt and Maria Kvist

Segmentation of patent claims for improving their readability
    Gabriela Ferraro, Hanna Suominen and Jaume Nualart

Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results
    Gintare Grigonyte, Maria Kvist, Sumithra Velupillai and Mats Wirén

An Open Corpus of Everyday Documents for Simplification Tasks
    David Pellow and Maxine Eskenazi

EACL - Expansion of Abbreviations in CLinical text
    Lisa Tengstrand, Beáta Megyesi, Aron Henriksson, Martin Duneld and Maria Kvist

A Quantitative Insight into the Impact of Translation on Readability
    Alina Maria Ciobanu and Liviu Dinu

Classifying easy-to-read texts without parsing
    Johan Falkenjack and Arne Jonsson

An Analysis of Crowdsourced Text Simplifications
    Marcelo Amancio and Lucia Specia

An evaluation of syntactic simplification rules for people with autism
    Richard Evans, Constantin Orasan and Iustin Dornescu

+ Guest Poster, a preview of an EACL Main Session Paper:
Assessing the Relative Reading Level of Sentence Pairs for Text Simplification
    Sowmya Vajjala and Detmar Meurers

16:50 Closing Remarks

One Step Closer to Automatic Evaluation of Text Simplification Systems

Sanja Štajner¹ and Ruslan Mitkov¹ and Horacio Saggion²
¹Research Group in Computational Linguistics, University of Wolverhampton, UK
²TALN Research Group, Universitat Pompeu Fabra, Spain
[email protected], [email protected], [email protected]

Abstract

This study explores the possibility of replacing the costly and time-consuming human evaluation of the grammaticality and meaning preservation of the output of text simplification (TS) systems with some automatic measures. The focus is on six widely used machine translation (MT) evaluation metrics and their correlation with human judgements of grammaticality and meaning preservation in text snippets. As the results show a significant correlation between them, we go further and try to classify simplified sentences into: (1) those which are acceptable; (2) those which need minimal post-editing; and (3) those which should be discarded.

send and Lapata, 2011a; Coster and Kauchak, 2011; Wubben et al., 2012), Spanish (Saggion et al., 2011), and Portuguese (Aluísio et al., 2008a), with recent attempts at Basque (Aranzabe et al., 2012), Swedish (Rybing et al., 2010), Dutch (Ruiter et al., 2010), and Italian (Barlacchi and Tonelli, 2013).

Usually, TS systems are either evaluated for: (1) the quality of the generated output, or (2) the effectiveness/usefulness of such simplification on reading speed and comprehension of the target population. For the purpose of this study we focused only on the former. The quality of the output generated by TS systems is commonly evaluated by using a combination of readability metrics (measuring the degree of simplification) and human assess-