Semantic Approaches in Natural Language Processing
Total Page:16
File Type:pdf, Size:1020Kb
Cover:Layout 1 02.08.2010 13:38 Seite 1 10th Conference on Natural Language Processing (KONVENS) g n i s Semantic Approaches s e c o in Natural Language Processing r P e g a u Proceedings of the Conference g n a on Natural Language Processing 2010 L l a r u t a This book contains state-of-the-art contributions to the 10th N n Edited by conference on Natural Language Processing, KONVENS 2010 i (Konferenz zur Verarbeitung natürlicher Sprache), with a focus s e Manfred Pinkal on semantic processing. h c a o Ines Rehbein The KONVENS in general aims at offering a broad perspective r on current research and developments within the interdiscipli - p p nary field of natural language processing. The central theme A Sabine Schulte im Walde c draws specific attention towards addressing linguistic aspects i t of meaning, covering deep as well as shallow approaches to se - n Angelika Storrer mantic processing. The contributions address both knowledge- a m based and data-driven methods for modelling and acquiring e semantic information, and discuss the role of semantic infor - S mation in applications of language technology. The articles demonstrate the importance of semantic proces - sing, and present novel and creative approaches to natural language processing in general. Some contributions put their focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, or focus on semantic knowledge acquisition and exploitation with respect to collaboratively built ressources, or harvesting se - mantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection high - lights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field. universaar Universitätsverlag des Saarlandes Saarland University Press Presses universitaires de la Sarre Manfred Pinkal, Ines Rehbein, Sabine Schulte im Walde, Angelika Storrer (Eds.) Semantic Approaches in Natural Language Processing Proceedings of the Conference on Natural Language Processing 2010 universaar Universitätsverlag des Saarlandes Saarland University Press Presses universitaires de la Sarre © September 2010 universaar Universitätsverlag des Saarlandes Saarland University Press Presses universitaires de la Sarre Postfach 151150, 66041 Saarbrücken ISBN 978-3-86223-004-4 gedruckte Ausgabe ISBN 978-3-86223-005-1 Online-Ausgabe URN urn:nbn:de:bsz:291-universaar-124 Projektbetreuung universaar : Isolde Teufel Gestaltung Umschlagseiten: Andreas Franz Gedruckt auf säurefreiem Papier von Monsenstein & Vannerdat Bibliografische Information der Deutschen Nationalbibliothek: Die Deutsche Nationalbibliothek verzeichnet diese Publikation in der Deutschen Nationalbibliografie; detaillierte bibliografische Daten sind im Internet über <http://dnb.d-nb.de> abrufbar. 3 Contents Preface 5 Long Papers Querying Linguistic Corpora with Prolog Gerlof Bouma ........................................ 9 Rapid Bootstrapping of Word Sense Disambiguation Resources for German Samuel Broscheit, Anette Frank, Dominic Jehle, Simone Paolo Ponzetto, Danny Rehl, Anja Summa, Klaus Suttner and Saskia Vola ........................ 19 Presupposition Projection and Accommodation in Mathematical Texts Marcos Cramer, Daniel Kuehlwein and Bernhard Schroder¨ ................ 29 Inkrementelle Koreferenzanalyse f¨ur das Deutsche Manfred Klenner, Angela Fahrni and Don Tuggener ................... 37 Determining the Degree of Compositionality of German Particle Verbs by Clustering Approaches Natalie Kuhner¨ and Sabine Schulte im Walde ....................... 47 Noun Phrase Chunking and Categorization for Authoring Aids Cerstin Mahlow and Michael Piotrowski .......................... 57 Learning Meanings of Words and Constructions, Grounded in a Virtual Game Hilke Reckman, Jeff Orkin and Deb Roy .......................... 67 EXCERPT – Ein integriertes System zum informativen Summarizing und Within-Document- Retrieval Jurgen¨ Reischer ....................................... 77 Using tf-idf-related Measures for Determining the Anaphoricity of Noun Phrases Julia Ritz .......................................... 85 Natural Language Processing for the Swiss German Dialect Area Yves Scherrer and Owen Rambow ............................. 93 Short Papers Clustering Urdu Verbs on the Basis of Related Aspectual Auxiliaries and Light Verbs Tafseer Ahmed .......................................105 Qualitative and Quantitative Error Analysis in Context Christian Chiarcos and Julia Ritz .............................111 POS-Tagging of Historical Language Data: First Experiments Stefanie Dipper .......................................117 Trainable Tree Distance and an Application to Question Categorisation Martin Emms ........................................123 Training and Evaluating a German Named Entity Recognizer with Semantic Generalization Manaal Faruqui and Sebastian Pado´ ...........................129 Aufbau eines linguistischen Korpus aus den Daten der englischen Wikipedia Markus Fuchs .......................................135 4 Proceedings of KONVENS 2010, Saarbrücken, Germany, September 2010 Detecting Vagueness in Two Languages and Two Domains Viola Ganter ........................................141 Transliteration among Indian Languages using WX Notation Rohit Gupta, Pulkit Goyal and Sapan Diwakar ......................147 Konstruktionsgrammatische Analyse von Gliederungsstrukturen Harald Lungen¨ and Mariana Hebborn ...........................151 Productivity of NPN Sequences in German, English, French, and Spanish Claudia Roch, Katja Keßelmeier and Antje Muller¨ ....................157 Automatically Determining the Semantic Gradation of German Adjectives Peter Schulam and Christiane Fellbaum ..........................163 Analytic and Synthetic Verb Forms in Irish – An Agreement-Based Implementation in LFG Sebastian Sulger ......................................169 A Base-form Lexicon of Content Words for Correct Word Segmentation and Syntactic-Semantic Annotation Carsten Weber and Johannes Handl ............................175 Starting a Sentence in L2 German – Discourse Annotation of a Learner Corpus Heike Zinsmeister and Margit Breckle ...........................181 5 Preface This proceedings volume contains all long and short papers presented at the 10th biennial conference on Natural Language Processing KONVENS (Konferenz zur Verarbeitung nat¨urlicher Sprache), which took place on September 6-8, 2010 at Saarland University. The papers represent a cross-section of recent research in the area of Natural Language Processing. They cover a variety of topics, ranging from NLP applications such as Named Entity Recognition, Word Sense Disambiguation or Part-of-Speech tagging of historical corpora to real-world applications like authoring aids or text summarisation and retrieval. Due to the central theme of the 10th KONVENS, Semantic Approaches in Natural Language Processing, a main focus of the contributions is on linguistic aspects of meaning, covering both deep and shallow approaches to semantic processing, and foundational aspects as well as applications. The KONVENS is organised by the scientific societies DGfS-CL (German Linguistic Society, Spe- cial Interest Group “Computational Linguistics”), the GSCL (German Society for Computational Lin- guistics and Language Technology) and the OGAI¨ (Austrian Society for Artificial Intelligence). We would like to thank the Cluster of Excellence “Multimodal Computing and Interaction” at Saar- land University as well as the DGFS-CL, whose funding made the conference possible. We would also like to thank the members of the program committee as well as the members of the review board for providing detailed and fair reviews in due time. These proceedings are also available electronically from Saarland University Press (universaar) at http://www.uni-saarland.de/de/campus/service-und-kultur/universaar/monographien.html. Saarbr¨ucken, September 2010 The editors Program Committee: Review Board: Stephan Busemann Maya Bangerter Brigitte Krenn Olga Pustylnikov Brigitte Krenn Ernst Buchberger Marco Kuhlmann Michaela Regneri Manfred Pinkal Stephan Busemann Lothar Lemnitzer Arndt Riester Ines Rehbein Stefanie Dipper Harald L¨ungen Josef Ruppenhofer Arndt Riester Kurt Eberle Torsten Marek Bernhard Schr¨oder Sabine Schulte im Walde Christiane Fellbaum Johannes Matiasek Caroline Sporleder Caroline Sporleder Antske Fokkens Alexander Mehler Manfred Stede Manfred Stede Anette Frank Friedrich Neubarth Stefan Thater Angelika Storrer Hagen F¨urstenau Rainer Osswald Harald Trost Magdalena Wolska Alexander Geyken Sebastian Pad´o Christina Unger Jeremy Jancsary Ulrike Pad´o Martin Volk Bryan Jurish Alexis Palmer Kay-Michael W¨urzner Alexander Koller Hannes Pirker Heike Zinsmeister 6 Proceedings of KONVENS 2010, Saarbrücken, Germany, September 2010 7 LONG PAPERS 8 Proceedings of KONVENS 2010, Saarbrücken, Germany, September 2010 9 Querying Linguistic Corpora with Prolog Gerlof Bouma Department Linguistik Universität Potsdam Postsdam, Germany [email protected] Abstract Prolog a good language for corpus querying, but also what its disadvantages are. It should be clear In this paper we demonstrate how Prolog can be from the outset, however, that we do not propose used to query linguistically annotated corpora, com- Prolog per se to be used as a query