Computational and Corpus-based Phraseology: Recent Advances and Interdisciplinary Approaches

Proceedings of the Conference, Volume II (short papers, posters and student workshop papers)

November 13-14, 2017, London, UK

ISBN 978-2-9701095-2-5

2017. Editions Tradulex, Geneva
© European Association for Phraseology EUROPHRAS
© University of Wolverhampton (Research Group in Computational Linguistics)
© Association for Computational Linguistics – Bulgaria

This document is downloadable from www.tradulex.com and http://rgcl.wlv.ac.uk/europhras2017/

Preface

As the late and inspiring John Sinclair (1991, 2007) observed, knowledge of vocabulary and grammar is not sufficient for someone to express himself or herself idiomatically or naturally in a specific language. One has to have the knowledge and skill to produce effective and naturally phrased utterances, which are often based on phraseological units (the idiom principle). This contrasts with the traditional assumption, or open choice principle, which lies at the heart of generative approaches to language. As Pawley and Syder (1983) stated more than three decades ago, the traditional approach cannot account for nativelike selection (idiomaticity) or fluency.

Language is indeed phraseological, and Phraseology is the discipline which studies phraseological units (PUs) or their related concepts, referred to by scholars (and regarded as largely synonymous) as multiword units, multiword expressions, fixed expressions, set expressions, phraseological units, formulaic language, phrasemes, idiomatic expressions, idioms, collocations, and/or polylexical expressions. PUs, or multiword expressions (MWEs), are ubiquitous and pervasive in language. They are a fundamental linguistic concept which is central to a wide range of Natural Language Processing and Applied Linguistics applications, including, but not limited to, phraseology, terminology, translation, language learning, teaching and assessment, and lexicography.
Jackendoff (1977) observes that the number of MWEs in a speaker’s lexicon is of the same order of magnitude as the number of single words. Biber et al. (1999) argue that they constitute up to 45% of spoken English and up to 21% of academic prose in English. Sag et al. (2002) comment that they are overwhelmingly present in terminology, and 41% of the entries in WordNet 1.7 are reported to be MWEs.

PUs play a crucial role not only in the computational treatment of natural languages. Terms are often MWEs (and not single words), which makes them highly relevant to terminology. Translation and interpreting are two other fields where phraseology plays an important role, as finding correct translation equivalents of PUs is a pivotal step in the translation process. Given their pervasive nature, PUs are absolutely central to the work carried out by lexicographers, who analyse and describe both single words and PUs. Last but not least, PUs are vital not only for language learning, teaching and assessment, but also for more theoretical linguistic areas such as pragmatics, cognitive linguistics and construction grammars. All the areas listed above are nowadays aided by (and often driven by) corpora, which makes PUs particularly relevant for corpus linguists. Finally, PUs provide an excellent basis for inter- and multidisciplinary studies, fostering fruitful collaborations between researchers across different disciplines, collaborations which are, for the time being, unfortunately still largely unexplored.

The e-proceedings feature the short and poster papers presented at the conference "Computational and Corpus-based Phraseology: recent advances and interdisciplinary approaches" (Europhras 2017) as well as the papers from the student seminar which accompanies Europhras 2017. (Regular papers and papers written by the invited speakers are published in a separate Springer LNAI volume.) This e-proceedings volume comes with ISBN and DOI numbers assigned to every contribution.
The conference is organised jointly by the European Association for Phraseology (Europhras), the Research Institute in Information and Language Processing of the University of Wolverhampton, and the Association for Computational Linguistics – Bulgaria, and is sponsored by Europhras, the Sketch Engine, the European Language Resources Association (ELRA) and the University of Wolverhampton. It provides the perfect opportunity for researchers to present their work, fostering interaction between (and joint work by) scholars working in disciplines as diverse as natural language processing, translation, terminology, lexicography, language learning, teaching and assessment, and cognitive science, to name only a few. In other words, Europhras 2017 provides an excellent basis for interdisciplinary research and for collaboration between researchers across different areas of study related to phraseology, which for the time being is underexplored.

The conference programme is thematically organised into different sessions which demonstrate the breadth of the topics represented at Europhras 2017 and illustrate the application of phraseology in (and its links to) disciplines such as translation, cross-linguistic studies, lexicography, terminography, language learning, theoretical and descriptive linguistics, natural language processing, computational linguistics, corpus linguistics, cognitive studies, cultural studies, specialised languages, technical writing and academic writing.

Every submission to the conference was evaluated by three reviewers, drawn either from the Programme Committee (46 scholars from 23 different countries) or from the 12 additional reviewers from 8 countries who were recommended by the Programme Committee. The conference contributions were authored by a total of 86 scholars from 24 different countries. These figures attest to the truly international dimension of Europhras 2017.
I would like to thank all colleagues who made this truly interdisciplinary and international event possible. In the first place, I would like to acknowledge Kathrin Steyer, the President of Europhras, whose initiative it was to organise a Europhras conference in London. I would like to thank our delegates, who have travelled from countries all across the globe to attend this conference, thus providing a living acknowledgement of this special event. I am grateful to all members of the Programme Committee and the additional reviewers for carefully examining all submissions and providing substantial feedback on all papers, helping the authors of accepted papers to improve and polish the final versions of their papers. A special thanks goes to the invited speakers: the keynote speakers of the main conference, Ken Church, Gloria Corpas, Dmitrij Dobrovol’skij, Patrick Hanks and Miloš Jakubíček; the invited speakers of the two accompanying workshops, Carlos Ramisch and Jean-Pierre Colson; and the tutorial co-speaker, Ondřej Matuška. Words of gratitude go to our sponsors: Europhras, the Sketch Engine, ELRA and the University of Wolverhampton, including Jan Gilder from the Project Support Office. Many thanks also to João Esteves-Ferreira for publishing the e-proceedings with Tradulex.

Last but not least, I would like to use this paragraph to acknowledge the members of the Organising Committee, who worked very hard during the last 12 months and whose dedication and efforts made the organisation of this event possible. I would like to highlight (in alphabetical order) the following colleagues for competently carrying out numerous organisational tasks and being ready to step in and support the organisation of the conference whenever needed.
My big ‘thank you’ goes out to Amanda Bloore, Martina Cotella, Arianna Fabbri, April Harper, Sara Moze, Nikolai Nikolov, Ivelina Nikolova, Rocío Sánchez González, Andrea Silvestre Baquero, Shiva Taslimipoor and Victoria Yaneva.

References

Biber, D., Finegan, E., Johansson, S., Conrad, S. and Leech, G. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.

Jackendoff, R. 2007. Language, consciousness, culture: Essays on mental structure. The MIT Press.

Monti, J., Seretan, V., Corpas Pastor, G. and Mitkov, R. (forthcoming). "Multiword Units in Machine Translation and Translation Technology". In Mitkov, R., Monti, J., Corpas Pastor, G. and Seretan, V. (Eds.) Multiword Units in Machine Translation and Translation Technology. John Benjamins.

Pawley, A. and Syder, F.H. 1983. "Two puzzles for linguistic theory: nativelike selection and nativelike fluency". In Richards, J.C. and Schmidt, R.W. (Eds.) Language and communication. London: Longman.

Sag, I.A., Baldwin, T., Bond, F., Copestake, A. and Flickinger, D. 2002. "Multiword expressions: A pain in the neck for NLP". In Proceedings of the third international conference on intelligent text processing and computational linguistics (CICLING 2002) (pp. 1-15). Mexico City, Mexico.

Sinclair, J. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.

Sinclair, J. 2008. "Preface". In Granger, S. and Meunier, F. (Eds.) Phraseology. An interdisciplinary perspective. Amsterdam: John Benjamins.

Ruslan Mitkov, Conference Chair
London, 13.11.2017

Organisers:
Europhras 2017 is jointly organised by the European Association for Phraseology EUROPHRAS, the University of Wolverhampton (Research Institute in Information and Language Processing) and the Association for Computational Linguistics – Bulgaria.

Conference Chair:
Ruslan Mitkov, University of Wolverhampton, UK

Programme Committee:
Julio Bernal, Caro and Cuervo Institute, Colombia
Douglas