Adrian-Horia Dediu, Carlos Martín-Vide, Klára Vicsi (Eds.)

Statistical Language and Speech Processing
Third International Conference, SLSP 2015
Budapest, Hungary, November 24–26, 2015
Proceedings

LNAI 9449

Lecture Notes in Artificial Intelligence 9449
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244

Editors
Adrian-Horia Dediu, Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Klára Vicsi, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
Carlos Martín-Vide, Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-319-25788-4, ISBN 978-3-319-25789-1 (eBook)
DOI 10.1007/978-3-319-25789-1
Library of Congress Control Number: 2015952770
LNCS Sublibrary: SL7 – Artificial Intelligence
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015

This work is subject to copyright.
All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

This volume contains the papers presented at the Third International Conference on Statistical Language and Speech Processing (SLSP 2015), held in Budapest, Hungary, during November 24–26, 2015.

SLSP 2015 was the third event in a series to host and promote research on the wide spectrum of statistical methods that are currently in use in computational language or speech processing; it aims to attract contributions from both fields. The conference encourages discussion on the employment of statistical methods (including machine learning) within language and speech processing.
The scope of the SLSP series is rather broad, and includes the following areas: anaphora and coreference resolution; authorship identification, plagiarism, and spam filtering; computer-aided translation; corpora and language resources; data mining and the Semantic Web; information extraction; information retrieval; knowledge representation and ontologies; lexicons and dictionaries; machine translation; multimodal technologies; natural language understanding; neural representation of speech and language; opinion mining and sentiment analysis; parsing; part-of-speech tagging; question-answering systems; semantic role labeling; speaker identification and verification; speech and language generation; speech recognition; speech synthesis; speech transcription; spelling correction; spoken dialogue systems; term extraction; text categorization; text summarization; user modeling.

SLSP 2015 received 71 submissions, which were reviewed by the Program Committee members, some of whom consulted with external referees as well. Most of the papers received three reviews, and several even four reviews. After a thorough and lively discussion, the committee decided to accept 26 papers (representing an acceptance rate of 37 %). The program also included three invited talks.

Part of the success in managing such a high number of submissions is due to the excellent facilities provided by the EasyChair conference management system.

We would like to thank all invited speakers and authors for their contributions, the Program Committee and the reviewers for their cooperation, and Springer for its very professional publishing work.
August 2015
Adrian-Horia Dediu
Carlos Martín-Vide
Klára Vicsi

Organization

SLSP 2015 was organized by the Laboratory of Speech Acoustics (LSA), Department of Telecommunications and Telematics, Budapest University of Technology and Economics, Hungary, and the Research Group on Mathematical Linguistics (GRLMC), Rovira i Virgili University, Tarragona, Spain.

Program Committee

Steven Abney, University of Michigan, Ann Arbor, USA
Roberto Basili, University of Rome Tor Vergata, Italy
Jean-François Bonastre, University of Avignon, France
Jill Burstein, Educational Testing Service, Princeton, USA
Nicoletta Calzolari, National Research Council, Pisa, Italy
Kevin Bretonnel Cohen, University of Colorado, Denver, USA
W. Bruce Croft, University of Massachusetts, Amherst, USA
Marc Dymetman, Xerox Research Centre Europe, Meylan, France
Guillaume Gravier, IRISA, Rennes, France
Kadri Hacioglu, Sensory Inc., Santa Clara, USA
Udo Hahn, University of Jena, Germany
Thomas Hain, University of Sheffield, UK
Mark Hasegawa-Johnson, University of Illinois, Urbana, USA
Jing Jiang, Singapore Management University, Singapore
Tracy Holloway King, A9.com, Palo Alto, USA
Sadao Kurohashi, Kyoto University, Japan
Claudia Leacock, McGraw-Hill Education CTB, Monterey, USA
Mark Liberman, University of Pennsylvania, Philadelphia, USA
Carlos Martín-Vide (Chair), Rovira i Virgili University, Tarragona, Spain
Alessandro Moschitti, University of Trento, Italy
Jian-Yun Nie, University of Montréal, Canada
Maria Teresa Pazienza, University of Rome Tor Vergata, Italy
Adam Pease, IPsoft Inc., New York, USA
Fuchun Peng, Google Inc., Mountain View, USA
Bhiksha Raj, Carnegie Mellon University, Pittsburgh, USA
Javier Ramírez, University of Granada, Spain
Paul Rayson, Lancaster University, UK
Dietrich Rebholz-Schuhmann, University of Zurich, Switzerland
Douglas A. Reynolds, Massachusetts Institute of Technology, Lexington, USA
Michael Riley, Google Inc., Mountain View, USA
Horacio Saggion, Pompeu Fabra University, Barcelona, Spain
David Sánchez, Rovira i Virgili University, Tarragona, Spain
Roser Saurí, Oxford University Press, Oxford, UK
Stefan Schulz, Medical University of Graz, Austria
Efstathios Stamatatos, University of the Aegean, Karlovassi, Greece
Yannis Stylianou, Toshiba Research Europe Ltd., Cambridge, UK
Maosong Sun, Tsinghua University, Beijing, China
Tomoki Toda, Nara Institute of Science and Technology, Japan
Yoshimasa Tsuruoka, University of Tokyo, Japan
Klára Vicsi, Budapest University of Technology and Economics, Hungary
Enrique Vidal, Technical University of Valencia, Spain
Atro Voutilainen, University of Helsinki, Finland
Andy Way, Dublin City University, Ireland
Junichi Yamagishi, University of Edinburgh, UK
Luke Zettlemoyer, University of Washington, Seattle, USA
Pierre Zweigenbaum, LIMSI-CNRS, Orsay, France

Additional Reviewers

Chenhui Chu
Ying Ding
Mónica Domínguez Bajo
Sergio Martínez
Jianfei Yu

Organizing Committee

Adrian-Horia Dediu, Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide (Co-chair), Rovira i Virgili University, Tarragona, Spain
György Szaszák, Budapest University of Technology and Economics, Hungary
Klára Vicsi (Co-chair), Budapest University of Technology and Economics, Hungary
Florentina-Lilica Voicu, Rovira i Virgili University, Tarragona, Spain

Invited Talks (Abstracts)

Low-Rank Matrix Learning for Compositional Objects, Strings and Trees

Xavier Carreras
Xerox Research Centre Europe, 38240 Meylan, France

Many prediction tasks in Natural Language Processing deal with structured objects that exhibit a compositional nature.
Consider this sentence in the context of a Named Entity Recognition task:

$\text{A shipload of 12 tonnes of rice arrives in } \overbrace{\text{Umm Qasr}}^{\text{Loc}} \text{ port in the Gulf.}$

In this example we can distinguish three main parts: the entity mention Umm Qasr annotated with a location tag (LOC), a left context, and a right context. We could generate other examples by substituting one of these parts with other values. We would like to develop statistical models that correctly capture the compositional properties of our structured objects (i.e., what parts can be combined together to form valid objects), and exploit these properties to make predictions. For instance, we could design a model that computes predictions using this expression:

$\alpha(\text{A shipload of 12 tonnes of rice arrives in})^{\top} \, \gamma_{\text{Loc}}(\text{Umm Qasr}) \, \beta(\text{port in the Gulf.})$

where:
– The model induces latent spaces of $n$ dimensions
– $\alpha$ is a function that embeds left contexts to vectors in $\mathbb{R}^n$
– $\beta$ is a function that embeds