Workshopabstracts

Total Page:16

File Type:pdf, Size:1020Kb

Workshopabstracts TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION MAY 23 – 28, 2016 GRAND HOTEL BERNARDIN CONFERENCE CENTRE Portorož , SLOVENIA WORKSHOP ABSTRACTS Editors: Please refer to each single workshop list of editors. Assistant Editors: Sara Goggi, Hélène Mazo The LREC 2016 Proceedings are licensed under a Creative Commons Attribution- NonCommercial 4.0 International License LREC 2016, TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2016 Workshop Abstracts Distributed by: ELRA – European Language Resources Association 9, rue des Cordelières 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] ISBN 978-2-9517408-9-1 EAN 9782951740891 ii TABLE OF CONTENTS W12 Joint 2nd Workshop on Language & Ontology (LangOnto2) & Terminology & Knowledge Structures (TermiKS) …………………………………………………………………………….. 1 W5 Cross -Platform Text Mining & Natural Language Processing Interoperability ……. 12 W39 9th Workshop on Building & Using Comparable Corpora (BUCC 2016)…….…….. 23 W19 Collaboration & Computing for Under-Resourced Languages (CCURL 2016).….. 31 W31 Improving Social Inclusion using NLP: Tools & Resources ……………………………….. 43 W9 Resources & ProcessIng of linguistic & extra-linguistic Data from people with various forms of cognitive / psychiatric impairments (RaPID)………………………………….. 49 W25 Emotion & Sentiment Analysis (ESA 2016)…………..………………………………………….. 58 W28 2nd Workshop on Visualization as added value in the development, use & evaluation of LRs (VisLR II)……….……………………………………………………………………………… 68 W15 Text Analytics for Cybersecurity & Online Safety (TA-COS)……………………………… 75 W13 5th Workshop on Linked Data in Linguistics (LDL-2016)………………………………….. 83 W10 Lexicographic Resources for Human Language Technology (GLOBALEX 2016)… 93 W18 Translation Evaluation: From Fragmented Tools & Data Sets to an Integrated Ecosystem?................................................................................................................... 106 W16 2nd Workshop on Arabic Corpora & Processing Tools (OSACT2)........................ 116 W38 3rd Workshop on Indian Language Data Resource & Evaluation (WILDRE-3)….. 123 W40 ETHics In Corpus collection, Annotation and Application (ETHI-CA² 2016)………. 136 W1 Sideways 2016 ………………………………………………………………………………………………….. 148 W7 Emotions, Metaphors, Ontology & Terminology during disaster (EMOT)….………. 155 W35 Multimodal Corpora 2016 (MMC 2016)..............…………………………………………….. 163 W8 7th Workshop on the Representation & Processing of Sign Languages…………….. 172 W4 12th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-12)… 186 W34 Natural Language Processing for Translation Memories 2016…………………………. 194 W41 Controlled Language Applications…………………………………………………………………… 201 W23 Workshop on Research Results Reproducibility, and Resources Citation in Science and Technology of Language (4REAL)…………………………………………………..…….. 207 W43 Novel Incentives for Collecting Data & Annotation from People……………………… 214 W36 Normalisation & Analysis of Social Media Texts (NormSoMe) ……….……………….. 221 W6 4th Workshop on Challenges in the Management of Large Corpora (CMLC-4)….. 228 W27 Just talking Casual Talk among Humans and Machines …………………………………… 235 W30 Workshop on Collecting &Generating Resources for Chatbots & Conversational Agents Development & Evaluation (RE-WOCHAT 2016)……..…………… 242 W11 Quality Assessment for Text Simplification (QATS)......……………………………………. 249 iii iv Language and Ontology (LangOnto2) & Terminology and Knowledge Structures (TermiKS) 23 May 2016 ABSTRACTS Editors: Larisa Grčić Simeunović, Špela Vintar, Fahad Khan, Pilar León Araúz, Pamela Faber, Francesca Fontini, Artemis Parvisi, Christina Unger 1 Workshop Programme Session 1 09:00 – 09:30 – Introductory Talk Pamela Faber, The Cultural Dimension of Knowledge Structures 09:30 – 10:10 – Introductory Talk John McCrae, Putting ontologies to work in NLP: The lemon model and its applications 10:10 – 10:30 Gregory Grefenstette, Karima Rafes, Transforming Wikipedia into an Ontology-based Information Retrieval Search Engine for Local Experts using a Third-Party Taxonomy 10:30 – 11:00 Coffee break Session 2 11:00 – 11:30 Juan Carlos Gil-Berrozpe, Pamela Faber, Refining Hyponymy in a Terminological Knowledge Base 11:30 – 12:00 Pilar León-Araúz, Arianne Reimerink, Evaluation of EcoLexicon Images 12:00 – 12:30 Špela Vintar, Larisa Grčić Simeunović, The Language-Dependence of Conceptual Structures in Karstology 12:30 – 13:00 Takuma Asaishi, Kyo Kageura, Growth of the Terminological Networks in Junior-high and High School Textbooks 13:30 – 14:00 Lunch break Session 3 14:00 – 14:20 Gabor Melli, Semantically Annotated Concepts in KDD’s 2009-2015 Abstracts 14:20 – 14:40 Bhaskar Sinha, Somnath Chandra, Neural Network Based Approach for Relational Classification for Ontology Development in Low Resourced Indian Language 14:40 – 15:00 Livy Real, Valeria de Paiva, Fabricio Chalub, Alexandre Rademaker, Gentle with the Gentilics 15:00 – 15:30 Dante Degl’Innocenti, Dario De Nart and Carlo Tasso, The Importance of Being Referenced: Introducing Referential Semantic Spaces 15:30 – 16:00 Haizhou Qu, Marcelo Sardelich, Nunung Nurul Qomariyah and Dimitar Kazakov, Integrating Time Series with Social Media Data in an Ontology for the Modelling of Extreme Financial Events 2 16:00 – 16:30 Coffee break Session 4 16:30 – 16:50 Christian Willms, Hans-Ulrich Krieger, Bernd Kiefer, ×-Protégé An Ontology Editor for Defining Cartesian Types to Represent n-ary Relations 16:50 – 17:10 Alexsandro Fonseca, Fatiha Sadat and François Lareau, A Lexical Ontology to Represent Lexical Functions 17:10 – 17:40 Ayla Rigouts Terryn, Lieve Macken and Els Lefever, Dutch Hypernym Detection: Does Decompounding Help? 17:40 – 18:30 Discussion and Closing 3 Workshop Organizers Istituto di Linguistica Computazionale “A. Fahad Khan Zampolli” - CNR, Italy Špela Vintar University of Ljubljana, Slovenia Pilar León Araúz University of Granada, Spain Pamela Faber University of Granada, Spain Istituto di Linguistica Computazionale “A. Francesca Frontini Zampolli” - CNR, Italy Artemis Parvizi Oxford University Press Larisa Grčić Simeunović University of Zadar, Croatia Christina Unger University of Bielefeld, Germany Workshop Programme Committee Guadalupe Aguado-de-Cea Universidad Politécnica de Madrid, Spain Amparo Alcina Universitat Jaume I, Spain Nathalie Aussenac-Gilles IRIT, France Caroline Barrière CRIM, Canada Institute of Croatian Language and Linguistics, Maja Bratanić Croatia Paul Buitelaar Insight Centre for Data Analytics, Ireland Federico Cerutti Cardiff University Béatrice Daille University of Nantes, France Aldo Gangemi LIPN University, ISTC-CNR Rome Eric Gaussier University of Grenoble, France Emiliano Giovannetti ILC-CNR Ulrich Heid University of Hildesheim, Germany Caroline Jay University of Manchester Kyo Kageura University of Tokio, Japan Hans-Ulrich Krieger DFKI GmbH Roman Kutlak Oxford University Press Marie-Claude L'Homme OLST, Université de Montréal, Canada Monica Monachini ILC-CNR Mojca Pecman University of Paris Diderot, France Silvia Piccini ILC-CNR Yuan Ren Microsoft China Fabio Rinaldi Universität Zürich, Switzerland Irena Spasic University of Cardiff Markel Vigo University of Manchester 4 Introductory talk I Monday 23 May, 9:00 – 9:30 The Cultural Dimension of Knowledge Structures Pamela Faber Frame-Based Terminology (FBT) is a cognitive approach to terminology, which directly links specialized knowledge representation to cognitive linguistics and cognitive semantics (Faber 2011, 2012, 2014). More specifically, the FBT approach applies the notion of frame as a schematization of a knowledge structure, which is represented at the conceptual level and held in long-term memory and which emphasizes both hierarchical and non-hierarchical conceptual relations. Frames also link elements and entities associated with general human experience or with a particular culturally embedded scene or situation. Culture is thus a key element in knowledge structures. Cultural frames are directly connected to what has been called ‘design principle’ (O’Meara and Bohnemeyer 2008), ‘template’, ‘model’, ‘schema’ or ‘frame ‘(Brown 2008; Burenhult 2008, Cablitz 2008, Levinson 2008). In EcoLexicon, a frame is a representation that integrates various ways of combining semantic generalizations about one category or a group of categories. In contrast, a template is a representational pattern for individual members of the same category. Burenhult and Levinson (2008: 144) even propose the term, semplate, which refers to the cultural themes or linguistic patterns that are imposed on the environment to create, coordinate, subcategorize, or contrast categories. Although rarely explored, cultural situatedness has an impact on semantic networks, which reflect differences between terms used in closely related language cultures. Nevertheless, the addition of a cultural component to term meaning is considerably more complicated than the inclusion of terms that designate new concepts specific to other cultures. One reason for this is that certain conceptual categories are linked, for example, to the habitat of the speakers of a language and derive their meaning from the geographic and meteorological characteristics of a given geographic area or region. This paper explains the need for typology of cultural frames or profiles linked to the most prominent semantic categories. As an example, the terms for different types of local wind are analyzed
Recommended publications
  • Measuring Text Simplification with the Crowd
    Measuring Text Simplification with the Crowd Walter S. Lasecki Luz Rello Jeffrey P. Bigham ROC HCI HCI Institute HCI and LT Institutes Dept. of Computer Science School of Computer Science School of Computer Science University of Rochester Carnegie Mellon University Carnegie Mellon University [email protected] [email protected] [email protected] ABSTRACT the U.S. [41]1 and the European Guidelines for the Production of Text can often be complex and difficult to read, especially for peo­ Easy-to-Read Information in Europe [23]. ple with cognitive impairments or low literacy skills. Text simplifi­ Text simplification is an area of Natural Language Processing cation is a process that reduces the complexity of both wording and (NLP) that attempts to solve this problem automatically by reduc­ structure in a sentence, while retaining its meaning. However, this ing the complexity of both wording (lexicon) and structure (syn­ is currently a challenging task for machines, and thus, providing tax) in a sentence [46]. Previous efforts on automatic text simpli­ effective on-demand text simplification to those who need it re­ fication have focused on people with autism [20, 39], people with mains an unsolved problem. Even evaluating the simplicity of text Down syndrome [48], people with dyslexia [42, 43], and people remains a challenging problem for both computers, which cannot with aphasia [11, 18]. understand the meaning of text, and humans, who often struggle to However, automatic text simplification is in an early stage and agree on what constitutes a good simplification. still is not useful for the target users [20, 43, 48].
    [Show full text]
  • Gnu Smalltalk Library Reference Version 3.2.5 24 November 2017
    gnu Smalltalk Library Reference Version 3.2.5 24 November 2017 by Paolo Bonzini Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled \GNU Free Documentation License". 1 3 1 Base classes 1.1 Tree Classes documented in this manual are boldfaced. Autoload Object Behavior ClassDescription Class Metaclass BlockClosure Boolean False True CObject CAggregate CArray CPtr CString CCallable CCallbackDescriptor CFunctionDescriptor CCompound CStruct CUnion CScalar CChar CDouble CFloat CInt CLong CLongDouble CLongLong CShort CSmalltalk CUChar CByte CBoolean CUInt CULong CULongLong CUShort ContextPart 4 GNU Smalltalk Library Reference BlockContext MethodContext Continuation CType CPtrCType CArrayCType CScalarCType CStringCType Delay Directory DLD DumperProxy AlternativeObjectProxy NullProxy VersionableObjectProxy PluggableProxy SingletonProxy DynamicVariable Exception Error ArithmeticError ZeroDivide MessageNotUnderstood SystemExceptions.InvalidValue SystemExceptions.EmptyCollection SystemExceptions.InvalidArgument SystemExceptions.AlreadyDefined SystemExceptions.ArgumentOutOfRange SystemExceptions.IndexOutOfRange SystemExceptions.InvalidSize SystemExceptions.NotFound SystemExceptions.PackageNotAvailable SystemExceptions.InvalidProcessState SystemExceptions.InvalidState
    [Show full text]
  • GNU MP the GNU Multiple Precision Arithmetic Library Edition 6.2.1 14 November 2020
    GNU MP The GNU Multiple Precision Arithmetic Library Edition 6.2.1 14 November 2020 by Torbj¨ornGranlund and the GMP development team This manual describes how to install and use the GNU multiple precision arithmetic library, version 6.2.1. Copyright 1991, 1993-2016, 2018-2020 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being \A GNU Manual", and with the Back-Cover Texts being \You have freedom to copy and modify this GNU Manual, like GNU software". A copy of the license is included in Appendix C [GNU Free Documentation License], page 132. i Table of Contents GNU MP Copying Conditions :::::::::::::::::::::::::::::::::::: 1 1 Introduction to GNU MP ::::::::::::::::::::::::::::::::::::: 2 1.1 How to use this Manual :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 2 2 Installing GMP ::::::::::::::::::::::::::::::::::::::::::::::::: 3 2.1 Build Options:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 3 2.2 ABI and ISA :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 8 2.3 Notes for Package Builds:::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 11 2.4 Notes for Particular Systems :::::::::::::::::::::::::::::::::::::::::::::::::::::: 12 2.5 Known Build Problems ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 14 2.6 Performance
    [Show full text]
  • Proceedings of the 1St Workshop on Tools and Resources to Empower
    LREC 2020 Workshop Language Resources and Evaluation Conference 11–16 May 2020 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) PROCEEDINGS Edited by Nuria´ Gala and Rodrigo Wilkens Proceedings of the LREC 2020 first workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) Edited by: Núria Gala and Rodrigo Wilkens ISBN: 979-10-95546-44-3 EAN: 9791095546443 For more information: European Language Resources Association (ELRA) 9 rue des Cordelières 75013, Paris France http://www.elra.info Email: [email protected] c European Language Resources Association (ELRA) These workshop proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License ii Preface Recent studies show that the number of children and adults facing difficulties in reading and understanding written texts is steadily growing. Reading challenges can show up early on and may include reading accuracy, speed, or comprehension to the extent that the impairment interferes with academic achievement or activities of daily life. Various technologies (text customization, text simplification, text to speech devices, screening for readers through games and web applications, to name a few) have been developed to help poor readers to get better access to information as well as to support reading development. Among those technologies, text simplification is a powerful way to leverage document accessibility by using NLP techniques. The “First Workshop on Tools and Resources to Empower People with REAding DIfficulties” (READI), collocated with the “International Conference on Language Resources and Evaluation” (LREC 2020), aims at presenting current state-of-the-art techniques and achievements for text simplification together with existing reading aids and resources for lifelong learning, addressing a variety of domains and languages, including natural language processing, linguistics, psycholinguistics, psychophysics of vision and education.
    [Show full text]
  • Dynamic Object-Oriented Programming with Smalltalk
    Dynamic Object-Oriented Programming with Smalltalk 1. Introduction Prof. O. Nierstrasz Autumn Semester 2009 LECTURE TITLE What is surprising about Smalltalk > Everything is an object > Everything happens by sending messages > All the source code is there all the time > You can't lose code > You can change everything > You can change things without restarting the system > The Debugger is your Friend © Oscar Nierstrasz 2 ST — Introduction Why Smalltalk? > Pure object-oriented language and environment — “Everything is an object” > Origin of many innovations in OO development — RDD, IDE, MVC, XUnit … > Improves on many of its successors — Fully interactive and dynamic © Oscar Nierstrasz 1.3 ST — Introduction What is Smalltalk? > Pure OO language — Single inheritance — Dynamically typed > Language and environment — Guiding principle: “Everything is an Object” — Class browser, debugger, inspector, … — Mature class library and tools > Virtual machine — Objects exist in a persistent image [+ changes] — Incremental compilation © Oscar Nierstrasz 1.4 ST — Introduction Smalltalk vs. C++ vs. Java Smalltalk C++ Java Object model Pure Hybrid Hybrid Garbage collection Automatic Manual Automatic Inheritance Single Multiple Single Types Dynamic Static Static Reflection Fully reflective Introspection Introspection Semaphores, Some libraries Monitors Concurrency Monitors Categories, Namespaces Packages Modules namespaces © Oscar Nierstrasz 1.5 ST — Introduction Smalltalk: a State of Mind > Small and uniform language — Syntax fits on one sheet of paper >
    [Show full text]
  • Data-Driven Text Simplification
    Data-Driven Text Simplification Sanja Štajner and Horacio Saggion [email protected] http://web.informatik.uni-mannheim.de/sstajner/index.html University of Mannheim, Germany [email protected] #TextSimplification2018 https://www.upf.edu/web/horacio-saggion University Pompeu Fabra, Spain COLING 2018 Tutorial Presenters • Sanja Stajner • Horacio Saggion • http://web.informatik.uni- • http://www.dtic.upf.edu/~hsaggion mannheim.de/sstajner • https://www.linkedin.com/pub/horacio- • https://www.linkedin.com/in/sanja- saggion/16/9b9/174 stajner-a6904738 • https://twitter.com/h_saggion • Data and Web Science Group • Large Scale Text Understanding • https://dws.informatik.uni - Systems Lab / TALN group mannheim.de/en/home/ • http://www.taln.upf.edu • University of Mannheim, Germany • Department of Information & Communication Technologies • Universitat Pompeu Fabra, Barcelona, Spain © 2018 by S. Štajner & H. Saggion 2 Tutorial antecedents • Previous tutorials on the topic given at: • IJCNLP 2013 and RANLP 2015 (H. Saggion) • RANLP 2017 (S. Štajner) • Automatic Text Simplification. H. Saggion. 2017. Morgan & Claypool. Synthesis Lectures on Human Language Technologies Series. https://www.morganclaypool.com/doi/abs/10.2200/S00700ED1V01Y201602 HLT032 © 2018 by S. Štajner & H. Saggion 3 Outline • Motivation for ATS • Automatic text simplification • TS projects • TS resources • Neural text simplification © 2018 by S. Štajner & H. Saggion 4 PART 1 Motivation for Text Simplification © 2018 by S. Štajner & H. Saggion 5 Text Simplification (TS) The process of transforming a text into an equivalent which is more readable and/or understandable by a target audience • During simplification, complex sentences are split into simple ones and uncommon vocabulary is replaced by more common expressions • TS is a complex task which encompasses a number of operations applied at different linguistic levels: • Lexical • Syntactic • Discourse • Started to attract the attention of natural language processing some years ago (1996) mainly as a pre-processing step © 2018 by S.
    [Show full text]
  • Classifier-Based Text Simplification for Improved Machine Translation
    Classifier-Based Text Simplification for Improved Machine Translation Shruti Tyagi Deepti Chopra Iti Mathur Nisheeth Joshi Department of Computer Department of Computer Department of Computer Department of Computer Science Science Science Science Banasthali University Banasthali University Banasthali University Banasthali University Rajasthan, India Rajasthan, India Rajasthan, India Rajasthan, India [email protected] [email protected] [email protected] [email protected] Abstract— Machine Translation is one of the research fields E. Text Summarization: Splitting of sentence or conversion of of Computational Linguistics. The objective of many MT long sentence into smaller ones helps in text summarization. Researchers is to develop an MT System that produce good quality and high accuracy output translations and which also There are also different approaches through which we can covers maximum language pairs. As internet and Globalization is apply text simplification. These are: increasing day by day, we need a way that improves the quality of translation. For this reason, we have developed a Classifier based Text Simplification Model for English-Hindi Machine A. Lexical Simplification: In this approach, identification of Translation Systems. We have used support vector machines and complex words takes place and generation of Naïve Bayes Classifier to develop this model. We have also evaluated the performance of these classifiers. substitutes/synonyms takes place accordingly. For example, in the example below, we have highlighted the complex words Keywords— Machine Translation, Text Simplification, Naïve and their respective substitutes. Bayes Classifier, Support Vector Machine Classifier Original Sentence: Audible word was originally named audire in Latin. I. INTRODUCTION Simplified Sentence: Audible word was first called audire in Text Simplification is the process that reduces the Latin.
    [Show full text]
  • Ontology-Based Chatbot for Disaster Management: Use Case Coronavirus
    TUNISIAN REPUBLIC MINISTRY OF HIGHER EDUCATION AND SCIENTIFIC RESEARCH TUNIS EL MANAR UNIVERSITY FACULTY OF SCIENCES OF TUNIS Master Thesis Submitted for a Research Master’s Degree in Computer Science By Khouloud Hwerbi Ontology-Based Chatbot for Disaster Management: Use Case CoronaVirus Defended on October 9, 2020 in front of the following jury: President Hella Kaffel Ben Ayed Associate Professor at FST Reporter Narjes Doggaz Assistant Professor at FST arXiv:2011.02340v1 [cs.AI] 2 Nov 2020 Co-Supervisor Salvatore Flavio Pileggi Lecturer at UTS Supervisor Sadok Ben Yahia Full Professor at FST In the Laboratory: LIPAH 2019–2020 Abstract Today is the era of intelligence in machines. With the advances in Artificial Intelligence, machines have started to impersonate different human traits, a chatbot is the next big thing in the domain of conversational services. A chatbot is a virtual person who is capable to carry out a natural conversation with people. They can include skills that enable them to converse with the humans in audio, visual, or textual formats. Artificial intelligence conversational entities, also called chatbots, conversational agents, or dialogue system, are an excellent example of such machines. Obtaining the right information at the right time and place is the key to effective disaster management. The term "disaster management" encompasses both natural and human-caused disasters. To assist citizens, our project is to create a COVID Assistant to provide the need of up to date information to be available 24 hours. With the growth in the World Wide Web, it is quite intelligible that users are interested in the swift and relatedly correct information for their hunt.
    [Show full text]
  • Natural Language Processing Tools for Reading Level Assessment and Text Simplification for Bilingual Education
    Natural Language Processing Tools for Reading Level Assessment and Text Simplification for Bilingual Education Sarah E. Petersen A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2007 Program Authorized to Offer Degree: Computer Science & Engineering University of Washington Graduate School This is to certify that I have examined this copy of a doctoral dissertation by Sarah E. Petersen and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. Chair of the Supervisory Committee: Mari Ostendorf Reading Committee: Mari Ostendorf Oren Etzioni Henry Kautz Date: In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.” Signature Date University of Washington Abstract Natural Language Processing Tools for Reading Level Assessment and Text Simplification for Bilingual Education Sarah E. Petersen Chair of the Supervisory Committee: Professor Mari Ostendorf Electrical Engineering Reading proficiency is a fundamental component of language competency.
    [Show full text]
  • E-Content Submission to INFLIBNET Subject Name Linguistics Paper Name Corpus Linguistics Paper Coordinator Name & Contact Mo
    e-Content Submission to INFLIBNET Subject Name Linguistics Paper Name Corpus Linguistics Paper Coordinator Name & Contact Module name #0 5: Corpus Type: Genre of Text Module ID Content Writer (CW) Name Email id Phone Pre-requisites Objectives Keywords E-Text Self Learn Self Assessment Learn More Story Board √ √ √ √ √ Contents 5.0 Introduction 5.1 Why Classify Corpora ? 5.2 Genre of Text 5.2.1 Text Corpus 5.2.2 Speech Corpus 5.2.3 Spoken Corpus 5.3 Conclusion Assessment & Evaluation Resources & Lin 5.0 INTRODUCTION For last fifty years or so, corpus linguistics is attested as one of the mainstays of linguistics for various reasons. At various points of time scholars have discussed about the methods of generating corpora, techniques of processing them, and using information from corpora in linguistic works – starting from mainstream linguistics to applied linguistics and language technology. However, in general, these discussions often ignore an important aspect relating to classification of corpora, although scholars sporadically attempt to discuss about the form, formation, and function of corpora of various types. People avoid this issue, because it is a difficult scheme to classify corpora by way of a single frame or type. Any scheme that attempts to put various corpora within a single frame is destined to turn out to be unscientific and non-reliable. Digital corpora are designed to be used in various linguistic works. Sometimes, these are used for general linguistics research and application, some other times these are utilised for works of language technology and computational linguistics. The general assumption is that a corpus developed for certain types of work is not much useful for works of other types.
    [Show full text]
  • Pipenightdreams Osgcal-Doc Mumudvb Mpg123-Alsa Tbb
    pipenightdreams osgcal-doc mumudvb mpg123-alsa tbb-examples libgammu4-dbg gcc-4.1-doc snort-rules-default davical cutmp3 libevolution5.0-cil aspell-am python-gobject-doc openoffice.org-l10n-mn libc6-xen xserver-xorg trophy-data t38modem pioneers-console libnb-platform10-java libgtkglext1-ruby libboost-wave1.39-dev drgenius bfbtester libchromexvmcpro1 isdnutils-xtools ubuntuone-client openoffice.org2-math openoffice.org-l10n-lt lsb-cxx-ia32 kdeartwork-emoticons-kde4 wmpuzzle trafshow python-plplot lx-gdb link-monitor-applet libscm-dev liblog-agent-logger-perl libccrtp-doc libclass-throwable-perl kde-i18n-csb jack-jconv hamradio-menus coinor-libvol-doc msx-emulator bitbake nabi language-pack-gnome-zh libpaperg popularity-contest xracer-tools xfont-nexus opendrim-lmp-baseserver libvorbisfile-ruby liblinebreak-doc libgfcui-2.0-0c2a-dbg libblacs-mpi-dev dict-freedict-spa-eng blender-ogrexml aspell-da x11-apps openoffice.org-l10n-lv openoffice.org-l10n-nl pnmtopng libodbcinstq1 libhsqldb-java-doc libmono-addins-gui0.2-cil sg3-utils linux-backports-modules-alsa-2.6.31-19-generic yorick-yeti-gsl python-pymssql plasma-widget-cpuload mcpp gpsim-lcd cl-csv libhtml-clean-perl asterisk-dbg apt-dater-dbg libgnome-mag1-dev language-pack-gnome-yo python-crypto svn-autoreleasedeb sugar-terminal-activity mii-diag maria-doc libplexus-component-api-java-doc libhugs-hgl-bundled libchipcard-libgwenhywfar47-plugins libghc6-random-dev freefem3d ezmlm cakephp-scripts aspell-ar ara-byte not+sparc openoffice.org-l10n-nn linux-backports-modules-karmic-generic-pae
    [Show full text]
  • Experimenting Sentence Split-And-Rephrase Using Part-Of-Speech Labels
    Experimenting Sentence Split-and-Rephrase Using Part-of-Speech Labels P. Berlanga Neto and E. Y. Okano and E. E. S. Ruiz Departamento de Computação e Matemática, FFCLRP Universidade de São Paulo (USP). Av. Bandeirantes, 3900, Monte Alegre. 14040-901, Ribeirão Preto, SP – Brazil [pauloberlanga, okano700, evandro]@usp.br Abstract. Text simplification (TS) is a natural language transformation process that reduces linguistic complexity while preserving semantics and retaining its original meaning. This work aims to present a research proposal for automatic simplification of texts, precisely a split-and-rephrase approach based on an encoder-decoder neural network model. The proposed method was trained against the WikiSplit English corpus with the help of a part-of-speech tagger and obtained a BLEU score validation of 74.72%. We also experimented with this trained model to split-and-rephrase sentences written in Portuguese with relative success, showing the method’s potential. CCS Concepts: • Computing methodologies → Natural language processing. Keywords: natural language processing, neural networks, sentence simplification 1. INTRODUCTION Text Simplification (TS) is the process of modifying natural language to reduce complexity and im- prove both readability and understandability [Shardlow 2014]. A simplified vocabulary or a simplified text structure can benefit different publics, such as people with low education levels, children, non- native individuals, people who have learning disorders (such as autism, dyslexia, and aphasia), among others [Štajner et al. 2015]. Although the simplicity of a text seems intuitively obvious, it does not have a precise definition in technical terms. Traditional assessment metrics consider factors such as sentence length, syllable count, and other linguistic features of the text to identify it as elaborate or not [Shardlow 2014].
    [Show full text]