Learning Multiword Expressions from Corpora and Dictionaries

Learning multiword expressions from corpora and dictionaries Orsolya Vincze Doctoral thesis 2015 Supervisor: Dr. Margarita Alonso Ramos Departamento de Galego-Portugués, Francés e Lingüística Orsolya Vincze Learning multiword expressions from corpora and dictionaries Universidade da Coruña Departamento de Galego-Portugués, Francés e Lingüística Supervisor: Dr. Margarita Alonso Ramos Acknowledgements First and foremost, I would like to express my gratitude to my supervisor, Margarita Alonso Ramos. I am really thankful for her guidance, time and dedication, not only when it comes to research, but also in matters of bureaucracy and life. Clearly, without her perseverance in standing up to and supporting her students, I would have abandoned my studies before I even started. She has taught me that research is done with rigor and requires certain sacrifice, but is, at the same time, exciting and rewarding. I also express my appreciation for the funding received through the FPU grant (AP2010-4334) of which I was recipient during the completion of the thesis. The research presented in this thesis was carried out in the framework of the research project “Herramienta de ayuda a la redacción en español: fundamentos lingüísticos para el procesamiento de colocaciones” supported by the Spanish Ministry of Science and Innovation and the FEDER Funds of the European Commission under the contract number FFI2011-30219-C02-01. This project was undertaken by a research group at the University of A Coruña directed by Margarita Alonso Ramos and a research group at the Pompeu Fabra University under the direction of Leo Wanner. I am greatly indebted to all members of both teams for their valuable contributions at different stages of the project. I am especially thankful to the members of the DiCE group, Estela Mosqueira Suárez, Ana Orol González and Marcos García Salido, who I have worked most closely with, and whose contribution and companionship has been invaluable for the development of this thesis, as has also been Nancy Vázquez Veiga’s enthusiasm and wisdom in all matters of academia. Similarly, I express my gratitude to Cristobal Lozano and Amaya Mendicoexea, who compiled the CEDEL2 corpus, and kindly provided us access to the data. I would also like to thank those who opened up their classrooms and offered help with the collection of data necessary for the experimental studies, especially Begoña Sanromán Vilas at the University of Helsinki, Elena Sánchez Trigo at the University of Vigo, Ángel Francisco Martínez Fernández at the EOI A Coruña and Montserrat Muriano Rodríguez at the University of A Coruña. I am also indebted to Miguel Ramos Naveira, who offered great help on the technological front, especially with the DiCE interface and the usability logs. v I am thankful for the support received from Averil Coxhead, who welcomed me at the School of Language and Applied Linguistics Studies of Victoria University of Wellington. Working in such stimulating environment has contributed immensely to my dissertation. I would also like to thank my former professors back in Hungary, at the University of Szeged, who introduced me to the exciting and multifaceted field of linguistics, among them, especially, Anna Fenyvesi and Tibor Berta. My gratitude also goes to all my friends. I am especially grateful to my PhD “inmates”, both for the valuable discussions and the encouragement in the more difficult times; Iria, Rike, Cath, Luz, María, Luis and, of course, all inhabitants of VZ410, thank you. Thanks also to Tito for pointing out repeatedly that a dissertation “is never finished, only abandoned”. I am immensely grateful to my friends not – as yet – completing dissertations, who, regardless, endured my complaints and provided encouragement. Arturo, Noelia, the two Marías, Fer Vet, Mauro, Fran, Bea, Paula, Yulia, Éva, Judit, and several others, you have always been there for me. Lastly, I would like to thank my family for all their love and support. I am very grateful to my parents for granting me complete freedom in pursuing whatever goals I preferred. Above all, I would like to express my gratitude to Chan for his love, constant support and encouragement during the final stages of this dissertation. vi Abstract The purpose of the present thesis is to examine Spanish as a foreign language (SFL) learners’ needs when it comes to enhancing their collocation competence and use, with a view to designing an online collocation learning tool aimed at learners of Spanish. Accordingly, the research presented here corresponds to the following three aims. Firstly, SFL learners’ collocation use is explored through a learner corpus study carried out using material from the CEDEL2 corpus. SFL learners’ collocation use is compared to that of native speakers, while learner collocation errors are also examined. Secondly, the thesis examines the design and functionalities of existing learning tools that can support collocation learning, such as collocation dictionaries and corpus-based tools. More specifically, it describes a usability experiment focusing on the interface of the Diccionario de colocaciones del español, as well as a study testing SFL learners’ ability to autonomously correct collocation errors with the help of concordance data obtained from corpus. Thirdly, taking into account the findings of these studies, the design of an online collocation learning tool aimed at SFL learners is described. vii Resumen El propósito de la presente tesis es examinar las necesidades de los aprendices de español como lengua extranjera (ELE) en lo que respecta el desarrollo de su competencia y uso colocacional con el objetivo de diseñar una nueva herramienta didáctica dirigida a aprendices de español. Por consiguiente, la investigación que presentamos corresponde a los siguientes tres objetivos principales. En primer lugar, exploramos el uso colocacional de aprendices de ELE mediante un estudio de corpus de aprendices que se ha llevado a cabo utilizando datos del corpus CEDEL2. Comparamos el uso de colocaciones de aprendices al de hablantes nativos del español, y, al mismo tiempo, examinamos los errores colocacionales de aprendices. En segundo lugar, la tesis examina el diseño y las funcionalidades de herramientas didácticas existentes que pueden ser aprovechados en el aprendizaje de colocaciones como son los diccionarios de colocaciones y herramientas basadas en datos de corpus. Más específicamente, presentamos un experimento de usabilidad del Diccionario de colocaciones del español, así como un estudio que examina la destreza de aprendices de ELE en corregir errores colocacionales autónomamente con la ayuda de concordancias obtenidas de corpus. En tercer lugar, teniendo en cuenta los resultados de estos estudios, describimos el diseño de una herramienta en línea centrada en colocaciones y destinada a aprendices de ELE. ix Resumo O propósito da presente tese é examinar as necesidades dos aprendices de español como lingua estranxeira (ELE) no que respecta ao desenvolvemento da súa competencia e uso colocacional, co obxectivo de deseñar unha nova ferramenta didáctica dirixida a aprendices de español. Por conseguinte, a investigación que presentamos corresponde aos seguintes tres obxectivos principais. En primeiro lugar, exploramos o uso colocacional de aprendices de ELE mediante un estudo de corpus de aprendices que se levou a cabo utilizando datos do corpus CEDEL2. Comparamos o uso de colocacións de aprendices ao de falantes nativos de español e, asemade, analizamos os erros colocacionais de aprendices. En segundo lugar, a tese examina o deseño e as funcionalidades de ferramentas didácticas existentes que poden ser aproveitados para a aprendizaxe de colocacións, como son os dicionarios de colocacións e ferramentas baseadas en datos de corpus. Máis especificamente, presentamos un experimento de usabilidade do Diccionario de colocaciones del español, así como un estudo que examina a destreza de aprendices de ELE na corrección de erros colocacionais autonomamente coa axuda de concordancias obtidas de corpus. En terceiro lugar, tendo en conta os resultados destes estudos, describimos o deseño dunha ferramenta en liña centrada en colocacións e destinada a aprendices de ELE. xi Table of contents Acknowledgements ............................................................................................................... v Abstract ............................................................................................................................vii Resumen ............................................................................................................................. ix Resumo ............................................................................................................................. xi Table of contents ............................................................................................................... xiii List of tables and figures .................................................................................................... xix List of abbreviations .......................................................................................................... xxv Chapter 1. Introduction ...................................................................................................... 1 1.1 Motivation and context ................................................................................. 1 1.2 The main objectives of the thesis .................................................................. 2 1.3 Thesis outline

Learning Multiword Expressions from Corpora and Dictionaries

HTTP Cookie - Wikipedia, the Free Encyclopedia 14/05/2014

INSECURE-Mag-9.Pdf

Does Hardware Configuration and Processor Load Impact Software Fault Observability?

Deployment Guide for Cisco Directory Connector

The Kate Handbook

Efficient Navigation on the World Wide Web for the Physically Disabled

Jedit 5.6 User's Guide the Jedit All-Volunteer Developer Team Jedit 5.6 User's Guide the Jedit All-Volunteer Developer Team

Bbedit 10.5.5 User Manual

Onguard® Credentials Module

CS147L Lecture 1 Mike Krieger

Learn the Essential Modes and Editing Features of Emacs Get Going with This Famous Open Source Editor

The Design of the Mathspad Editor