Fourth Workshop on Statistical Parsing of Morphologically Rich Languages
										Recommended publications
									
								- 
												
												Student Research Workshop Associated with RANLP 2011, Pages 1–8, Hissar, Bulgaria, 13 September 2011
RANLPStud 2011 Proceedings of the Student Research Workshop associated with The 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011) 13 September, 2011 Hissar, Bulgaria ISBN 978-954-452-016-8 Designed and Printed by INCOMA Ltd. Shoumen, BULGARIA Preface The Recent Advances in Natural Language Processing (RANLP) conference, already in its eighth year and ranked among the most influential NLP conferences, has always been a meeting venue for scientists coming from all over the world. Since 2009, we have given an arena to the younger and less experienced members of the NLP community to share their results with an international audience. For this reason, following the first successful and highly competitive Student Research Workshop associated with the conference RANLP 2009, we are pleased to announce the second edition of the workshop, which is held during the main RANLP 2011 conference days on 13 September 2011. The aim of the workshop is to provide an excellent opportunity for students at all levels (Bachelor, Master, and Ph.D.) to present their work in progress or completed projects to an international research audience and receive feedback from senior researchers. We have received 31 high quality submissions, among which 6 papers have been accepted as regular oral papers, and 18 as posters. Each submission has been reviewed by - 
												
												Conference Abstracts
EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Patronage of Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner MAY 23-24-25, 2012 ISTANBUL LÜTFI KIRDAR CONVENTION & EXHIBITION CENTRE ISTANBUL, TURKEY CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis. Assistant Editors: Hélène Mazo, Sara Goggi, Olivier Hamon © ELRA – European Language Resources Association. All rights reserved. LREC 2012, EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2012 Conference Abstracts Distributed by: ELRA – European Language Resources Association 55-57, rue Brillat Savarin 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] Copyright by the European Language Resources Association ISBN 978-2-9517408-7-7 EAN 9782951740877 All rights reserved. No part of this book may be reproduced in any form without the prior permission of the European Language Resources Association. Introduction of the Conference Chair Nicoletta Calzolari I wish first to express to Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner, the gratitude of the Program Committee and of all LREC participants for her Distinguished Patronage of LREC 2012. Even if every time I feel we have reached the top, this 8th LREC is continuing the tradition of breaking previous records: this edition we received 1013 submissions and have accepted 697 papers, after reviewing by the impressive number of 715 colleagues. - 
												
												ACTA UNIVERSITATIS UPSALIENSIS Studia Linguistica Upsaliensia 16
ACTA UNIVERSITATIS UPSALIENSIS Studia Linguistica Upsaliensia 16 Morphosyntactic Corpora and Tools for Persian Mojgan Seraji Dissertation presented at Uppsala University to be publicly examined in Universitetshuset / IX, Uppsala, Wednesday, 27 May 2015 at 10:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor of Computational Linguistics Jan Hajic (Charles University in Prague). Abstract Seraji, M. 2015. Morphosyntactic Corpora and Tools for Persian. Studia Linguistica Upsaliensia 16. 191 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9229-8. This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian. In developing these resources and tools, two key requirements are observed: compatibility and reuse. The compatibility requirement encompasses two parts. First, the tools in the pipeline should be compatible with each other in such a way that the output of one tool is compatible with the input requirements of the next. Second, the tools should be compatible with the annotated corpora and deliver the same analysis that is found in these. The reuse requirement means that all the components in the pipeline are developed by reusing resources, standard methods, and open source state-of-the-art tools. This is necessary to make the project feasible. Given these requirements, the thesis investigates two main research questions. - 
												
												Metafoor in Die Vertaalde Mediadiskoers Oor Aandele En Markte in Finweek
METAFOOR IN DIE VERTAALDE MEDIADISKOERS OOR AANDELE EN MARKTE IN FINWEEK [Metaphor in the Translated Media Discourse on Shares and Markets in Finweek] by Erica du Preez. Thesis submitted in fulfilment of the degree Philosophiae Doctor in Language Practice (Ph.D. Language Practice) in the Department of Afro-Asiatic Studies, Sign Language and Language Practice, Faculty of the Humanities, University of the Free State, Bloemfontein, November 2009. Promoter: Prof. J.A. Naudé. DECLARATION I, the undersigned, declare that the thesis hereby submitted by me for the qualification Ph.D. (Language Practice) at the University of the Free State is my own independent work and has not previously been submitted by me for a degree at another university/faculty. Signature: .................................... E S du Preez Date: 27 November 2009 I hereby cede the copyright of this product to the University of the Free State. Signature: .................................... E S du Preez Date: 27 November 2009 ACKNOWLEDGEMENTS Everything I am able to achieve is a gift from Above. All glory to God alone! I hereby wish to give thanks and recognition to the following persons, without whom I would not have been able to reach the finish line: • Prof. J.A. Naudé, my supervisor, for your patience, encouragement and thorough professional academic guidance during my studies and especially with the writing and finalisation of this thesis. • To my husband James and my dearest children Liesl and Eric, Inge and Ian, and Werner: thank you very much for your compassion and loving support. • Sincere thanks to my friend and colleague Igna du Plooy, for your help with the searches. • Special thanks to Prof. Tienie Crous, dean of the Faculty of Economic and Management Sciences, University of the Free State, for your understanding and the opportunity to undertake this study within the obligations of my work as faculty manager. - 
												
												Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), Pages 1–10, Atlanta, Georgia, 13-14 June 2013
NAACL HLT 2013 9th Workshop on Multiword Expressions MWE 2013 Proceedings of the Workshop 13-14 June 2013 Atlanta, Georgia, USA © 2013 The Association for Computational Linguistics 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ISBN 978-1-937284-47-3 Introduction The 9th Workshop on Multiword Expressions (MWE 2013) took place on June 13 and 14, 2013 in Atlanta, Georgia, USA in conjunction with the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013), and was endorsed by the Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX). The workshop has been held almost every year since 2003 in conjunction with ACL, EACL, NAACL, COLING and LREC. It provides an important venue for interaction, sharing of resources and tools and collaboration efforts for advancing the computational treatment of Multiword Expressions (MWEs), attracting the attention of an ever-growing community working on a variety of languages and MWE types. MWEs include idioms (storm in a teacup, sweep under the rug), fixed phrases (in vitro, by and large, rock’n roll), noun compounds (olive oil, laser printer), compound verbs (take a nap, bring about), among others. These, while easily mastered by native speakers, are a key issue and a current weakness for natural language parsing and generation, as well as real-life applications depending on some degree of semantic interpretation, such as machine translation, just to name a prominent one among many. However, thanks to the joint efforts of researchers from several fields working on MWEs, significant progress has been made in recent years, especially concerning the construction of large-scale language resources. - 
												
												Open Linked Data E Nuovi Approcci Al Trattamento Del Significato Linguistico
TAL 2014: Evolution of Dialogue Interfaces and Open Data. Open Linked Data and new approaches to the treatment of linguistic meaning. Guido Vetere, IBM Italia - Centro Studi Avanzati. K-Drive: Knowledge Driven Data Exploitation. © 2009 IBM Corporation. Slide footer: 20 January 2014, Guido Vetere, IBM - TAL 2014, © 2014 IBM Corporation. Slides: The dimension of meaning; Views of lexical semantics: conceptual (Geeraerts D. (2010), Theories of Lexical Semantics, Oxford University Press); Views of lexical semantics: textual (Geeraerts D. (2010), Theories of Lexical Semantics, Oxford University Press); Open linguistic data: lexicons, thesauri, ontologies, annotated corpora; Data and linguistic evidence in Watson: background knowledge, textual evidence; Data-driven linguistic inferences: Q: 1506 What's the name of King Arthur's sword? …..where Arthur is believed to have thrown his sword, Excalibur, to be received by the Lady of the Lake;..... (Kateryna Tymoshenko, Alessandro Moschitti and Aliaksei Severyn. "Encoding Semantic Resources in Syntactic Structures for Passage Reranking", accepted to the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2014), Gothenburg, Sweden, 2014.); Linguistic data: the importance of keeping them open (©2011-2014 Winsord); Linguistic data for Italian; Senso Comune; (Still) open problems 19 75; Semantics, vagueness and ontology: person, subjects, situations, Achille C. - 
												
												And the Causality Detection Benchmark
Persian Causality Corpus (PerCause) and the Causality Detection Benchmark Zeinab Rahimi, Mehrnoush ShamsFard NLP Research Lab., Shahid Beheshti University, Tehran, Iran {z_rahimi, m-shams}@sbu.ac.ir ABSTRACT. Recognizing causal elements and causal relations in text is one of the challenging issues in natural language processing, particularly in low-resource languages such as Persian. In this research we prepare a human-annotated causality corpus for the Persian language, which consists of 4446 sentences and 5128 causal relations; for each relation, three labels are specified: cause, effect and, where possible, causal mark. We have used this corpus to train a system for detecting the boundaries of causal elements. We also present a causality detection benchmark for three machine learning methods and two deep learning systems based on this corpus. Performance evaluations indicate that the best overall result is obtained with the CRF classifier, with an F-measure of 0.76, and the best accuracy with the BiLSTM-CRF deep learning method, at 91.4%. Keywords: PerCause, Causality annotated corpus, causality detection, deep learning, CRF 1. Introduction Causal relation extraction is a challenging open problem in natural language processing (NLP) that mainly involves semantic analysis and is used in many NLP tasks such as textual entailment recognition, question answering, event prediction and narrative extraction. Causation indicates that one event, state or action is the result of the occurrence of another state, event or action. Consider the following examples: (1) John is poisoned because he ate a poisonous apple. (2) It’s raining. The streets are slippery. (3) Narrow roads are dangerous. - 
												
												
												A Treebank-Based Investigation of IPP-Triggering Verbs in Dutch
A Treebank-based Investigation of IPP-triggering Verbs in Dutch Liesbeth Augustinus Frank Van Eynde [email protected] [email protected] Centre for Computational Linguistics, KU Leuven Abstract Based on a division into subject raising, subject control, and object raising verbs, syntactic and semantic distinctions are drawn to account for the differences and similarities between verbs triggering Infinitivus Pro Participio (IPP) in Dutch. Furthermore, quantitative information provides a general idea of the frequency of IPP-triggering verb patterns, as well as a more detailed account of verbs which optionally trigger IPP. The classification is based on IPP-triggers occurring in two treebanks. 1 Introduction Infinitivus Pro Participio (IPP) or Ersatzinfinitiv appears in a subset of the West-Germanic languages, such as German, Afrikaans, and Dutch. Only verbs that select a (te-)infinitive are possible IPP-triggers. If those verbs occur in the perfect tense, they normally appear as a past participle, such as gevraagd ‘asked’ in (1). In IPP constructions, however, those verbs appear as an infinitival verb form instead of a past participle, such as kunnen ‘can’ in (2). Haeseryn et al. [3], Rutten [5], and Schmid [7] (among others) provide lists of verbs which trigger IPP. The examples show that for some verbs, IPP is triggered obligatorily (2), whereas for other verbs, IPP appears optionally (3a - 3b), or is ungrammatical (1). (1) Of bent u als partner gevraagd deel te nemen in een IST project ...? Or are you as partner ask-PSP part to take in an IST project ‘Or are you asked as a partner to take part in an IST project ...?’ (WR-P-E-E-0000000019.p.2.s.2) (2) Pas nu hebben we dat ook kunnen zien in de hersenen.