SENSE 2017

EACL 2017 Workshop on Sense, Concept and Entity Representations and their Applications

Proceedings of the Workshop

April 4, 2017
Valencia, Spain

© 2017 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360 USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-945626-50-0

Preface

Welcome to the 1st Workshop on Sense, Concept and Entity Representations and their Applications (SENSE 2017). SENSE 2017 addresses one of the most important limitations of word-based techniques: they conflate the different meanings of a word into a single representation. The workshop brings together researchers in lexical semantics, and in NLP in general, to investigate and propose sense-based techniques, as well as to discuss effective ways of integrating sense, concept and entity representations into downstream applications.

The workshop is targeted at covering the following topics:

• Utilizing sense/concept/entity representations in applications such as Machine Translation, Information Extraction or Retrieval, Word Sense Disambiguation, Entity Linking, Text Classification, Semantic Parsing, Knowledge Base Construction or Completion, etc.

• Exploration of the advantages/disadvantages of using sense representations over word representations.

• Proposing new evaluation benchmarks or comparison studies for sense vector representations.

• Development of new sense representation techniques (unsupervised, knowledge-based or hybrid).

• Compositionality of senses: learning representations for phrases and sentences.

• Construction and use of sense representations for languages other than English as well as multilingual representations.

We received 21 submissions, accepting 15 of them (acceptance rate: 71%).

We would like to thank the Program Committee members who reviewed the papers and helped to improve the overall quality of the workshop. We also thank Aylien for their support in funding the best paper award. Last, a word of thanks also goes to our invited speakers, Roberto Navigli (Sapienza University of Rome) and Hinrich Schütze (University of Munich).

Jose Camacho-Collados and Mohammad Taher Pilehvar
Co-Organizers of SENSE 2017


Organizers:
Jose Camacho-Collados, Sapienza University of Rome
Mohammad Taher Pilehvar, University of Cambridge

Program Committee:
Eneko Agirre, University of the Basque Country
Claudio Delli Bovi, Sapienza University of Rome
Luis Espinosa-Anke, Pompeu Fabra University
Lucie Flekova, Darmstadt University of Technology
Graeme Hirst, University of Toronto
Eduard Hovy, Carnegie Mellon University
Ignacio Iacobacci, Sapienza University of Rome
Richard Johansson, University of Gothenburg
David Jurgens, Stanford University
Omer Levy, University of Washington
Andrea Moro, Microsoft
Roberto Navigli, Sapienza University of Rome
Arvind Neelakantan, University of Massachusetts Amherst
Luis Nieto Piña, University of Gothenburg
Siva Reddy, University of Edinburgh
Horacio Saggion, Pompeu Fabra University
Hinrich Schütze, University of Munich
Piek Vossen, University of Amsterdam
Ivan Vulić, University of Cambridge
Torsten Zesch, University of Duisburg-Essen
Jianwen Zhang, Microsoft Research

Invited Speakers:
Roberto Navigli, Sapienza University of Rome
Hinrich Schütze, University of Munich


Table of Contents

Compositional Semantics using Feature-Based Models from WordNet Pablo Gamallo and Martín Pereira-Fariña ...... 1

Automated WordNet Construction Using Word Embeddings Mikhail Khodak, Andrej Risteski, Christiane Fellbaum and Sanjeev Arora...... 12

Improving Verb Metaphor Detection by Propagating Abstractness to Words, Phrases and Individual Senses Maximilian Köper and Sabine Schulte im Walde ...... 24

Improving Clinical Diagnosis Inference through Integration of Structured and Unstructured Knowledge Yuan Ling, Yuan An and Sadid Hasan ...... 31

Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations Kentaro Kanada, Tetsunori Kobayashi and Yoshihiko Hayashi ...... 37

Supervised and unsupervised approaches to measuring usage similarity Milton King and Paul Cook ...... 47

Lexical Disambiguation of Igbo using Diacritic Restoration Ignatius Ezeani, Mark Hepple and Ikechukwu Onyenwe ...... 53

Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds Mahmoud El-Haj, Paul Rayson, Scott Piao and Stephen Wattam ...... 61

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto and Chris Biemann ...... 72

One Representation per Word - Does it make Sense for Composition? Thomas Kober, Julie Weeds, John Wilkie, Jeremy Reffin and David Weir ...... 79

Elucidating Conceptual Properties from Word Embeddings Kyoung-Rok Jang and Sung-Hyon Myaeng...... 91

TTCSˆe: a Vectorial Resource for Computing Conceptual Similarity Enrico Mensa, Daniele P. Radicioni and Antonio Lieto ...... 96

Measuring the Italian-English lexical gap for action verbs and its impact on translation Lorenzo Gregori and Alessandro Panunzi ...... 102

Word Sense Filtering Improves Embedding-Based Lexical Substitution Anne Cocos, Marianna Apidianaki and Chris Callison-Burch ...... 110

Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms Aleksander Wawer and Agnieszka Mykowiecka ...... 120


Workshop Program

8:30 - 9:30 Registration

9:30 - 11:00 Session 1

• 9:30-9:40 - Opening Remarks
• 9:40-10:00 - Short paper presentations

Improving Verb Metaphor Detection by Propagating Abstractness to Words, Phrases and Individual Senses
Maximilian Köper and Sabine Schulte im Walde

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto and Chris Biemann

• 10:00-11:00 - Invited talk by Roberto Navigli (Sapienza University)

11:00 - 11:30 Coffee break

11:30 - 13:00 Session 2

• 11:30-11:45 - Lightning talks (posters)

Compositional Semantics using Feature-Based Models from WordNet
Pablo Gamallo and Martín Pereira-Fariña

Automated WordNet Construction Using Word Embeddings Mikhail Khodak, Andrej Risteski, Christiane Fellbaum and Sanjeev Arora

Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations
Kentaro Kanada, Tetsunori Kobayashi and Yoshihiko Hayashi

Supervised and unsupervised approaches to measuring usage similarity Milton King and Paul Cook

Lexical Disambiguation of Igbo using Diacritic Restoration Ignatius Ezeani, Mark Hepple and Ikechukwu Onyenwe

Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds
Mahmoud El-Haj, Paul Rayson, Scott Piao and Stephen Wattam

Elucidating Conceptual Properties from Word Embeddings Kyoung-Rok Jang and Sung-Hyon Myaeng

TTCSˆe: a Vectorial Resource for Computing Conceptual Similarity Enrico Mensa, Daniele P. Radicioni and Antonio Lieto

Measuring the Italian-English lexical gap for action verbs and its impact on translation
Lorenzo Gregori and Alessandro Panunzi

Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms Aleksander Wawer and Agnieszka Mykowiecka

• 11:45-13:00 - Poster session

13:00 - 14:30 Lunch

14:30 - 16:00 Session 3

• 14:30-15:00 - Invited talk by Hinrich Schütze (University of Munich)
• 15:00-16:00 - Presentations of the best paper award candidates

Improving Clinical Diagnosis Inference through Integration of Structured and Unstructured Knowledge
Yuan Ling, Yuan An and Sadid Hasan

One Representation per Word - Does it make Sense for Composition? Thomas Kober, Julie Weeds, John Wilkie, Jeremy Reffin and David Weir

Word Sense Filtering Improves Embedding-Based Lexical Substitution Anne Cocos, Marianna Apidianaki and Chris Callison-Burch

16:00 - 16:30 Coffee break

16:30 - 17:15 Session 4

• 16:30-17:15 - Open discussion, best paper award and closing remarks

Compositional Semantics using Feature-Based Models from WordNet

Pablo Gamallo
Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS)
Universidade de Santiago de Compostela
Galiza, Spain
[email protected]

Martín Pereira-Fariña
Centre for Argument Technology (ARG-tech)
University of Dundee
Dundee, DD1 4HN, Scotland (UK)
[email protected]
Departamento de Filosofía e Antropoloxía
Universidade de Santiago de Compostela
Pz. de Mazarelos, 15782, Galiza, Spain
[email protected]

Abstract

This article describes a method to build semantic representations of composite expressions in a compositional way by using WordNet relations to represent the meaning of words. The meaning of a target word is modelled as a vector in which its semantically related words are assigned weights according to both the type of the relationship and the distance to the target word. Word vectors are compositionally combined by syntactic dependencies. Each syntactic dependency triggers two complementary compositional functions, named the head function and the dependent function. The experiments show that the proposed compositional method performs as well as the state of the art for subject-verb expressions, and clearly outperforms the best system for transitive subject-verb-object constructions.

1 Introduction

The principle of compositionality (Partee, 1984) states that the meaning of a complex expression is a function of the meaning of its constituent parts and of the mode of their combination. In recent years, different distributional semantic models endowed with a compositional component have been proposed. Most of them define words as high-dimensional vectors where dimensions represent co-occurring context words. This distributional semantic representation makes it possible to combine vectors using simple arithmetic operations such as addition and multiplication, or more advanced compositional methods such as learning functional words as tensors and composing constituents through inner product operations.

Notwithstanding, these models are usually qualified as black box systems because they are usually not interpretable by humans. Currently, the field of interpretable computational models is gaining relevance[1] and, therefore, the development of more explainable and understandable models in compositional semantics is also an open challenge in this field. On the other hand, distributional semantic models, given the size of the vectors, need significant resources, and they are dependent on a particular corpus, which can generate some biases in their application to different languages.

Thus, in this paper, we pay attention to compositional approaches which employ other kinds of word semantic models, such as those based on the WordNet relationships, i.e., synsets, hypernyms, hyponyms, etc. Only in (Faruqui and Dyer, 2015) can we find a proposal for word vector representation using hand-crafted linguistic resources (WordNet, FrameNet, etc.), although a compositional frame is not explicitly adopted. Therefore, to the best of our knowledge, this is the first work using WordNet to build compositional semantic interpretations. Thus, in this article, we propose a method to compositionally build the semantic representation of composite expressions using a feature-based approach (Hadj Taieb et al., 2014): constituent elements are induced by WordNet relationships.

However, this proposal raises a serious problem: the semantic representation of two syntactically related words (e.g. the verb run and the noun computer in "the computer runs") encodes incompatible information and there is no direct way of combining the features used to represent the meaning of the two words. On the one hand, the verb

[1] http://www.darpa.mil/program/explainable-artificial-intelligence

1 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 1–11, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics run is related by synonymy, hypernym, hyponym again given team at the direct object position. This and entailment to other verbs and, on the other, the dependency also yields the contextualized sense of noun computer is put in relation with other nouns the object given the verb and its nominal subject by synonymy, hypernym, hyponym, and so on. (coach+run). At the end of the interpretation pro- In order to solve this drawback, on the basis cess, we obtain three fully contextualized senses. of previous work on dependency-based distribu- In the second case, from right-to-left, the semantic tional compositionality (Thater et al., 2010; Erk process process is applied in a similar way, being and Pado,´ 2008), we distinguish between direct contextualized (and disambiguated) using the re- denotation and selectional preferences within a strictions imposed by the verb and its nominal ob- dependency relation. More precisely, when two ject (run+team). As in the first case, three slightly words are syntactically related, for instance com- different word senses are also obtained. puter and the verb run by the subject relation, we Lastly, word sense disambiguation is out of the build two contextualized senses: the contextual- aim of this paper. Here, we only use WordNet for ized sense of computer given the requirements of extracting semantic information from words, but run and the contextualized sense of run given com- not to identify word senses. puter. The article is organized as follow: In the next section (2), different approaches on ontological The sense of computer is built by combining feature-based representations and compositional the semantic features of the noun (its direct de- semantics are introduced and discussed. Then, notation) with the selectional preferences imposed sections 3 and 4 respectively describe our feature- by the verb. The features of the noun are built based semantic representation and compositional from the set of words linked to computer in Word- strategy. In Section 5, some experiments are per- Net, while the selectional preferences of run in formed to evaluate the quality of the word models the subject position are obtained by combining the and compositional word vectors. Finally, relevant features of all the nouns that can be the nominal conclusions are reported in Section 6. subject of the verb (i.e. the features of runners). Then, the two sets of features are combined and 2 Related Work the resulting new set represents the specific sense of the noun computer as nominal subject of run. Our approach relies on two tasks: to build feature- The sense of the verb given the noun is built in a based representations using WordNet relations, analogous way: the semantic features of the verb and to build compositional vectors using the are combined with the (inverse) selectional pref- WordNet representations. In this section, we will erences imposed by the noun, resulting in a new examine work related to these two tasks. compositional representation of the verb run when it is combined with computer at the subject po- 2.1 Feature-Based Approaches sition. The two new compositional feature sets Tversky (1977), in order to define a similarity represent the contextualized senses of the two re- measure, assumes that any object can be repre- lated words. 
During the contextualization process, sented as a collection (set) of features or proper- ambiguous or polysemous words may be disam- ties. Therefore, a similarity metric is a feature- biguated in order to obtain the right representation. matching process between two objects. This con- For dealing with any sequence with N (lexi- sists of a linear combination of the measures of cal) words (e.g., “the coach runs the team”), the their common and distinctive features. It is worth semantic process can be applied in two different noting that this is a non-symmetric measure. ways: from left-to-right and from right-to-left. In In the particular case of semantic similarity met- the first case, it is applied N 1 times dependency- rics, each word or concept is featured by means of − by-dependency in order to obtain N contextual- a set of words (Hadj Taieb et al., 2014). Framed ized senses, one per lexical word. Thus, firstly, into an ontology such as WordNet, these sets the subject dependency builds two contextualized of words are obtained from taxonomic (hyper- senses: that of run given the noun coach and that nym, hyponym, etc.) and non-taxonomic (synsets, of the noun given the verb. Then, the direct object glosses, meronyms, etc.) properties (Meng et al., dependency is applied on the already contextual- 2013), although these last ones are classified as ized sense of the verb in order to contextualize it secondary in many cases (Slimani, 2013). The

2 main objective of this approach is to capture the Mitchell, 2013; Baroni and Zamparelli, 2010; Ba- semantic knowledge induced by ontological rela- roni, 2013; Baroni et al., 2014). In our approach, tionships. by contrast, compositional functions, which are Our model is partly inspired by that defined driven by dependencies and not by functional in (Rodr´ıguez and Egenhofer, 2003). It proposes words, are just basic arithmetic operations on vec- that the set of properties that characterizes a word tors as in (Mitchell and Lapata, 2008). Arithmetic may be stratified into three groups: i) synsets; approaches are easy to implement and produce ii) features (e.g., meronyms, attributes, hyponym, high-quality compositional vectors, which makes etc.), and, iii) neighbor concepts (those linked via them a good choice for practical applications (Ba- semantic pointers). Each one of these strata is roni et al., 2014). weighted according to its contribution to the rep- Other compositional approaches based on Cat- resentation of the concept. The measure analyzes egorial Grammar use tensor products for com- the overlapping among the three strata between the position (Grefenstette et al., 2011; Coecke et two terms under comparison. al., 2010). A neural network-based method with tensor factorization for learning the embed- 2.2 Compositional Strategies dings of transitive clauses has been introduced in (Hashimoto and Tsuruoka, 2015). Two prob- Several models for compositionality in vector lems arise with tensor products. First, they re- spaces have been proposed in recent years, and sult in an information scalability problem, since most of them use bag-of-words as basic distribu- tensor representations grow exponentially as the tional representations of word contexts. The ba- phrases grow longer (Turney, 2013). And sec- sic approach to composition, explored by Mitchell ond, tensor products did not perform as well as and Lapata (2008; 2009; 2010), is to combine vec- component-wise multiplication in Mitchell and tors of two syntactically related words with arith- Lapata’s (2010) experiments. metic operations: addition and component-wise multiplication. The additive model produces a sort There are also works focused on the notion of union of word contexts, whereas multiplication of sense contextualization, e.g., Dinu and Lapata has an intersective effect. According to Mitchell (2010) work on context-sensitive representations and Lapata (2008), component-wise multiplica- for lexical substitution. Reddy et al. (2011) work tion performs better than the additive model. How- on dynamic prototypes for composing the seman- ever, in (Mitchell and Lapata, 2009; Mitchell and tics of noun-noun compounds and evaluate their Lapata, 2010), these authors explore weighted ad- approach on a compositionality-based similarity ditive models giving more weight to some con- task. stituents in specific word combinations. For in- So far, all the cited works are based on bag- stance, in a noun-subject-verb combination, the of-words to represent vector contexts and, then, verb is provided with higher weight because the word senses. However, there are a few works us- whole construction is closer to the verb than to ing vector spaces structured with syntactic infor- the noun. Other weighted additive models are de- mation. Thater et al. (2010) distinguish between scribed in (Guevara, 2010) and (Zanzotto et al., first-order and second-order vectors in order to al- 2010). 
All these models have in common the fact of defining composition operations for just word pairs. Their main drawback is that they do not propose a more systematic model accounting for all types of semantic composition. They do not focus on the logical aspects of the functional approach underlying compositionality.

Other distributional approaches develop sound compositional models of meaning inspired by Montagovian semantics, which induce the compositional meaning of the functional words from examples, adopting regression techniques commonly used in machine learning (Krishnamurthy and

low two syntactically incompatible vectors to be combined. This work is inspired by that described in (Erk and Padó, 2008). Erk and Padó (2008) propose a method in which the combination of two words, a and b, returns two vectors: a vector a' representing the sense of a given the selectional preferences imposed by b, and a vector b' standing for the sense of b given the (inverse) selectional preferences imposed by a. A similar strategy is reported in Gamallo (2017). Our approach is an attempt to join the main ideas of these syntax-based models (namely, second-order vectors, selectional preferences and two returning words per combination) in order to apply them to WordNet-based word representations.

3 Semantic Features from WordNet

A word meaning is described as a feature-value structure. The features are the words to which the target word is related in the ontology (e.g., in WordNet, hypernym, hyponym, etc.) and the values correspond to weights computed taking into account two parameters: the relation type and the edge-counting distance between the target word and each word feature (i.e. the number of relations required to reach the feature from the target word) (Rada et al., 1989).

The algorithm to set the feature values is the following. Given a target word w_1 and the feature set F, where w_i \in F if w_i is a word semantically related to w_1 in WordNet, the weight for the relation between w_1 and w_i is computed by equation 1:

weight(w_1, w_i) = \sum_{j=1}^{R} \frac{1}{length(w_1, w_i, r_j)}    (1)

where R is the number of different semantic relations (e.g. synonymy/synset, hyperonymy, hyponymy, etc.) that WordNet defines for the part-of-speech of the target word. For instance, nouns have five different relations, verbs four and adjectives just two. length(w_1, w_i, r_j) is the length of the path from the target word w_1 to its feature w_i in relation r_j. length(w_1, w_i, r_j) = 1 when r_j stands for the synonymy relationship, i.e. when w_1 and w_i belong to the same synset; length(w_1, w_i, r_j) = 2 if w_i is at the first level within the hierarchy associated to relation r_j. For instance, the length value of a direct hypernym is 2 because there is a distance of two arcs with regard to the target word: the first arc goes from the target word to a synset and the second one is the hyperonymy relation between the direct hypernym and the synset. The length value increases by one unit as the hierarchy level goes up, so at level 4, the length score is 5 and the partial weight is 1/5 = 0.2. For some non-taxonomic relations, namely meronymy, holonymy and coordinates, there is only one level in WordNet, but the distance is 3 since the target word and the word feature (part, whole or coordinate term) are separated by a synset and a hypernym.

As a feature word w_i may be related to the target w_1 via different semantic relations (without distinguishing between different word senses), the final weight is the addition of all partial weights. For instance, take the noun car. It is related to automobile through two different relationships: they belong to the same synset and the latter is a direct hypernym of the former, so weight(car, automobile) = 1/1 + 1/2 = 1.5.

To compute compositional operations on words, the feature-value structure associated to each word is modeled as a vector, where features are dimensions, words are objects, and weights are the values for each object/dimension position.

4 Compositional Semantics

4.1 Syntactic Dependencies As Compositional Functions

Our approach is also inspired by (Erk and Padó, 2008). Here, semantic composition is modeled in terms of function application driven by binary dependencies. A dependency is associated in the semantic space with two compositional functions on word vectors: the head and the dependent functions. To explain how they work, let us take the direct object relation (dobj) between the verb run and the noun team in the expression "run a team". The head function, dobj_\uparrow, combines the vector of the head verb \vec{run} with the selectional preferences imposed by the noun, which is also a vector of WordNet features, noted \vec{team}^{\circ}. This combination is performed by component-wise multiplication and results in a new vector \vec{run}_{dobj\uparrow}, which represents the contextualized sense of run given team in the dobj relation:

dobj_\uparrow(\vec{run}, \vec{team}^{\circ}) = \vec{run} \odot \vec{team}^{\circ} = \vec{run}_{dobj\uparrow}

To build the (inverse) selectional preferences imposed by the dependent word team as direct object on the verb, we require a reference corpus to extract all those verbs of which team is the direct object. The selectional preferences of team as direct object of a verb, noted \vec{team}^{\circ}, form a new vector obtained by component-wise addition of the vectors of all those verbs (e.g. create, support, help, etc.) that are in a dobj relation with the noun team:

\vec{team}^{\circ} = \sum_{\vec{v} \in T} \vec{v}

where T is the vector set of verbs having team as direct object (except run). T is thus included in the

4 subspace of verb vectors. Component-wise addi- and electric would denote functions while team tion has an union effect. and coach would be their arguments. Similarly, the dependent function, dobj , com- By contrast, we deny the rigid “predicate- ↓ bines the noun vector team~ with the selectional argument” structure. In our compositional ap- preferences imposed by the verb, noted run~ ◦, by proach, dependencies are the active functions that component-wise multiplication. Such a combina- control and rule the selectional requirements im- tions builds the new vector of team~ dobj , which posed by the two related words. Thus, each con- ↓ stands for the contextualized sense of team given stituent word imposes its selectional preferences run in the dobj relation: on the other within a dependency-based construc- tion. This is in accordance with non-standard lin- guistic research which assumes that the words in- dobj (run~ ◦, team~ ) = team~ run~ ◦ = team~ dobj volved in a composite expression impose seman- ↓ ↓ tic restrictions on each other (Pustejovsky, 1995; The selectional preferences imposed by the Gamallo et al., 2005; Gamallo, 2008). head word run to its direct object are represented 4.2 Recursive Compositional Application by the vector run~ ◦, which is obtained by adding the vectors of all those nouns (e.g. company, In our approach, the consecutive application of the project, marathon, etc) which are in relation dobj syntactic dependencies found in a sentence is ac- with the verb run: tually the process of building the contextualized sense of all the lexical words which constitute it. run~ ◦ = ~v Thus, the whole sentence is not assigned to an ~v R unique meaning (which could be the contextual- X∈ ized sense of the root word), but one sense per where R is the vector set of nouns playing the di- lemma, being the sense of the root just one of rect object role of run (except team). R is included them. in the subspace of nominal vectors. This incremental process may have two di- Each multiplicative operation results in a rections: from left-to-right and vice versa (i.e., compositional vector of a contextualized word. from right-to-left). Figure 1 illustrates the Component-wise multiplication has an intersec- incremental process of building the sense of tive effect. The vector standing for the selectional words dependency-by-dependency from left-to- preferences restricts the vector of the target word right. Thus, given the composite expression “the by assigning weight 0 to those WordNet features coach runs the team” and its dependency analysis that are not shared by both vectors. The new com- depicted in the first row of the figure, two com- positional vector as well as the two constituents all positional processes are driven by the two depen- belong to the same vector subspace (the subspace dencies involved in the analysis (nsubj and dobj). of nouns, verbs, or adjectives). 
Each dependency is decomposed into two func- Notice that, in approaches to computational tions: head (nsubj and dobj ) and dependent ↑ ↑ semantics inspired by Combinatory Categorial (nsubj and dobj ) functions.2 The first compo- ↓ ↓ Grammar (Steedman, 1996) and Montagovian se- sitional process applies, on the one hand, the head mantics (Montague, 1970), the interpretation pro- function nsubj on the denotation of the head verb ↑ cess for composite expressions such as “run a ( ~run) and on the selectional preferences required ~ team” or “electric coach” relies on rigid function- by coach (coach◦), in order to build a contex- argument structures: relational expressions, like tualized sense of the verb: ~runnsubj . On the ↑ verbs and adjectives, are used as predicates while other hand, the dependent function nsubj builds ↓ nouns and nominals are their arguments. In the the sense of coach as nominal subject of run: composition process, each word is supposed to coach~ nsubj . Then, the contextualized head vec- ↓ play a rigid and fixed role: the relational word tor is involved in the compositional process driven is semantically represented as a selective func- by dobj. At this level of semantic composition, the tion imposing constraints on the denotations of selectional preferences imposed on the noun team the words it combines with, while non-relational 2We do not consider the meaning of determiners, auxiliary words are in turn seen as arguments filling the con- verbs, or tense affixes. Quantificational issues associated to straints imposed by the function. For instance, run them are also beyond the scope of this work.

5 stand for the semantic features of all those nouns 5.1.1 Rating by Similarity which may be the direct object of coach+run. At In the first experiment, we use the WordSim353 the end of the process, we have not obtained one dataset (Finkelstein et al., 2002), which was con- single sense for the whole expression, but one con- structed by asking humans to rate the degree of se- textualized sense per lexical word: coach~ nsubj , ↓ mantic similarity between two words on a numer- ~runnsubj +dobj and team~ dobj . ↑ ↑ ↓ ical scale. This is a small dataset with 353 word In other case, from right-to-left, the verb run is pairs. The performance of a computational system first restricted by team at the direct object position, is measured in terms of correlation (Spearman) be- and then by its subject coach. In addition, this tween the scores assigned by humans to the word noun is now restricted by the selectional prefer- pairs and the similarity Dice coefficient assigned ences imposed by run and team, that is, it is com- by our system (WN) built with the WordNet-based bined with the semantic features of all those nouns model space. that may be the nominal subject of run+team. Table 1 compares the Spearman correlation ob- tained by our model, WN, with that obtained by 5 Experiments the corpus-based system described in (Halawi et We have performed several similarity-based ex- al., 2012), which is the highest score reached so periments using the semantic word model defined far on that dataset. Even if our results were clearly in Section 3 and the compositional algorithm de- outperformed by that corpus-based method, WN scribed in 4.3 First, in Subsection 5.1, we evaluate seems to behave well if compared with the state- just word similarity without composition. Then, of-the-art knowledge-based (unsupervised) strat- in Subsection 5.2, we evaluate the simple compo- egy reported in (Agirre et al., 2009). sitional approach by making use of a dataset with Systems ρ Method similar noun-verb pairs (NV constructions). Fi- WN 0.69 knowledge nally, the recursive application of compositional (Hassan and Mihalcea, 2011) 0.62 knowledge (Agirre et al., 2009) 0.66 knowledge functions is evaluated in Subsection 5.3, by mak- (Halawi et al., 2012) 0.81 corpus ing use of a dataset with similar noun-verb-noun pairs (NVN constructions). Table 1: Spearman correlation between the Word- In all experiments, we made use of datasets Sim353 dataset and the rating obtained by our suited for the task at hand, and compare our results knowledge-based system WN and the state-of-the- with those obtained by the best systems for the art for both knowledge and corpus-based strate- corresponding dataset. Moreover, in order to build gies. the selectional preferences of the syntactically re- lated words, we used the (BNC). Syntactic analysis on BNC was performed 5.1.2 Synonym Detection with with the dependency parser DepPattern (Gamallo Multiple-Choice Questions and Gonzalez,´ 2011; Gamallo, 2015), previously In this evaluation task, a target word is presented PoS tagged with Tree-Tagger (Schmid, 1994). with four synonym candidates, one of them being the correct synonym of the target. For instance, 5.1 Word Similarity for the target deserve, the system must choose be- Recently, the use of word similarity methods has tween merit (the correct one), need, want, and been criticised as a reliable technique for evalu- expect. 
Accuracy is the number of correct an- ating distributional semantic models (Batchkarov swers divided by the total number of words in the et al., 2016), given the small size of the datasets and the limitation of context information as well. Systems Noun Adj Verb All However, given this procedure still is widely ac- WN 0.85 0.85 0.75 0.80 (Freitag et al., 2005) 0.76 0.76 0.64 0.72 cepted, we have performed two different kinds of (Zhu, 2015) 0.71 0.71 0.63 0.69 experiments: rating by similarity and synonym de- (Kiela et al., 2015) - - - 0.88 tection with multiple-choice questions. Table 2: Accuracy of three systems on the WBST 3Both the software and the semantic word model test (synonym detection on nouns, adjectives, and are freely available at http://fegalaz.usc.es/ verbs) ˜gamallo/resources/CompWordNet.tar.gz.

[Figure 1 appears here in the original layout; its caption follows below. The figure shows the dependency analysis coach <-nsubj- run -dobj-> team and the left-to-right construction of the word senses: nsubj_\uparrow(\vec{run}, \vec{coach}^{\circ}) and nsubj_\downarrow(\vec{run}^{\circ}, \vec{coach}) yield \vec{coach}_{nsubj\downarrow}, \vec{run}_{nsubj\uparrow}, \vec{team}; then dobj_\uparrow(\vec{run}_{nsubj\uparrow}, \vec{team}^{\circ}) and dobj_\downarrow(\vec{(coach+run)}^{\circ}, \vec{team}) yield \vec{coach}_{nsubj\downarrow}, \vec{run}_{nsubj\uparrow+dobj\uparrow}, \vec{team}_{dobj\downarrow}.]

Figure 1: Syntactic analysis of the expression “the coach runs the team” and left-to-right construction of the word senses. dataset. In this experiment, we compute the similarity The dataset is an extended TOEFL test, called between the contextualized heads of two NV com- the WordNet-based Synonymy Test (WBST) pro- posites and between their contextualized depen- posed in (Freitag et al., 2005). WBST was pro- dent expressions. For instance, we compute the duced by generating automatically a large set similarity between “eye flare” vs “eye flame” by of TOEFL-like questions from the synonyms in comparing first the verbs flare and flame when WordNet. In total, this procedure yields 9,887 combined with eye in the subject position (head noun, 7,398 verb, and 5,824 adjective questions, function), and by comparing how (dis)similar is a total of 23,509 questions, which is a very large the noun eye when combined with both the verbs dataset. Table 2 shows the results. In this case, the flare and flame (dependent function). In addition, accuracy obtained by WN for the three syntactic as we are provided with two similarities (head and categories is close to state-of-the-art corpus-based dep) for each pair of compared expressions, it is method for this task (Kiela et al., 2015), which is a possible to compute a new similarity score by av- neural network trained with a huge corpus contain- eraging the results of head and dependent func- ing 8 billion words from English Wikipedia and tions (head+dep). newswire texts. Table 3 shows the Spearman’s correlation val- ues (ρ) obtained by the three versions of WN: 5.2 Noun-Verb Composition only head function (head), only dependent func- The first experiment aimed at evaluating our com- tion (dep) and average of both (head+dep). The positional strategy uses the test dataset by Mitchell latter score value is comparable to the state-of-the- and Lapata (2008), which comprises a total of art system for this dataset, reported in (Erk and 3,600 human similarity judgments. Each item Pado,´ 2008). It is also very similar to the most re- consists of an intransitive verb and a subject noun, cent results described in (Dinu et al., 2013), where which are compared to another noun-verb pair the authors made use of the compositional strategy (NV) combining the same noun with a synonym defined in (Baroni and Zamparelli, 2010). of the verb that is chosen to be either similar o dissimilar to the verb in the context of the given Systems ρ subject. For instance, “child stray” is related to WN (head+dep) 0.29 WN (head) 0.26 “child roam”, being roam a synonym of stray. WN (dep) 0.14 The dataset was constructed by extracting NV (Erk and Pado,´ 2008) 0.27 composite expressions from the British National (Dinu et al., 2013) 0.26 Corpus (BNC) and verb synonyms from WordNet. Table 3: Spearman correlation for intransitive ex- In order to evaluate the results of the tested sys- pressions using the benchmark by Mitchell and tems, Spearman correlation is computed between Lapata (2008) individual human similarity scores and the sys- tems’ predictions.
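The WN system rated in Section 5.1 scores word pairs with a Dice coefficient over the WordNet-based feature vectors of Section 3. As a rough, self-contained illustration of that pipeline, the sketch below builds such vectors with NLTK's WordNet interface and compares two of them; the function names, the restricted relation inventory (synonyms and direct hypernyms/hyponyms only) and the particular weighted Dice variant are assumptions made for this example, not the authors' implementation (which is distributed at the URL given in footnote 3).

```python
# Minimal sketch (not the authors' released code) of the Section 3 feature vectors
# and a Dice-style similarity as used for the WN system in Section 5.1. The relation
# inventory and path lengths below (synonyms at length 1, direct hypernyms/hyponyms
# at length 2) and the weighted Dice variant are illustrative assumptions.
# Requires: nltk plus the WordNet data (nltk.download("wordnet")).
from collections import defaultdict
from nltk.corpus import wordnet as wn


def feature_vector(word, pos=wn.NOUN):
    """Weight each related lemma by summing 1/length over the relations reaching it."""
    weights = defaultdict(float)
    for synset in wn.synsets(word, pos=pos):
        # Synonymy: same synset, path length 1.
        for lemma in synset.lemma_names():
            if lemma != word:
                weights[lemma] += 1.0
        # Direct hypernyms and hyponyms: path length 2 (word -> synset -> neighbour synset).
        for neighbour in synset.hypernyms() + synset.hyponyms():
            for lemma in neighbour.lemma_names():
                weights[lemma] += 1.0 / 2
    return weights


def dice_similarity(vec_a, vec_b):
    """Weighted Dice coefficient between two sparse feature vectors."""
    shared = sum(min(vec_a[f], vec_b[f]) for f in vec_a if f in vec_b)
    total = sum(vec_a.values()) + sum(vec_b.values())
    return 2.0 * shared / total if total else 0.0


car, truck = feature_vector("car"), feature_vector("truck")
print(car["automobile"])             # weight of "automobile" as a feature of "car"
print(dice_similarity(car, truck))   # similarity score in [0, 1]
```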

7 5.3 Noun-Verb-Noun Composition Systems ρ WN (nv-n head+dep) 0.35 The last experiment consists in evaluating the WN (nv-n head) 0.34 WN (nv-n dep) 0.13 quality of compositional vectors built by means WN (n-vn head+dep) 0.50 of the consecutive application of head and de- WN (n-vn head) 0.35 pendency functions associated with nominal sub- WN (n-vn dep) 0.44 WN (n-vn+nv-n) 0.47 ject and direct object. The experiment is per- (Grefenstette and Sadrzadeh, 2011b)* 0.28 formed on the dataset developed in (Grefenstette (Van De Cruys et al., 2013)* 0.37 and Sadrzadeh, 2011a). The dataset was built us- (Tsubaki et al., 2013)* 0.44 (Milajevs et al., 2014) 0.46 ing the same guidelines as Mitchell and Lapata (Polajnar et al., 2015) 0.35 (2008), using transitive verbs paired with subjects (Hashimoto et al., 2014) 0.48 and direct objects: NVN composites. (Hashimoto and Tsuruoka, 2015) 0.48 Human agreement 0.75 Given our compositional strategy, we are able to compositional build several vectors that somehow Table 4: Spearman correlation for transitive ex- represent the meaning of the whole NVN com- pressions using the benchmark by Grefenstette posite expression. In order to known which is and Sadrzadeh (2011) the best compositional strategy and be exhaustive and complete, we evaluate all of them; i.e., both left-to-right and right-to-left strategies. Thus, take Table 4 shows the Spearman’s correlation val- again the expression “the coach runs the team”. ues (ρ) obtained by all the different versions built If we follow the left-to-right strategy (noted nv-n), from our model WN. The best score was achieved at the end of the compositional process, we obtain by averaging the head and dependent similarity two fully contextualized senses: values derived from the n-vn (right-to-left) strat- egy. Let us note that, for NVN composite ex- nv-n head The sense of the head run, as a result pressions, the left-to-right strategy seems to build of being contextualized first by the prefer- less reliable compositional vectors than the right- ences imposed by the subject and then by the to-left counterpart. Besides, the combination of preferences required by the direct object. We the two strategies (n-vn+nv-n) does not improve note nv-n head) the final sense of the head in the results of the best one (n-vn).4. The score val- a NVN composite expression following the ues obtained by the different versions of the right- left-to-right strategy. to-left strategy outperform other systems for this dataset (see results reported below in the table). nv-n dep The sense of the object team, as a re- Our best strategy (ρ = 0.50) also outperforms the sult of being contextualized by the prefer- neural network strategy described in (Hashimoto ences imposed by run previously combined and Tsuruoka, 2015), which achieved 0.48 with- with the subject coach. We note nv-n dep the out considering extra linguistic information not in- final sense of the direct object in a NVN com- cluded in the dataset. The (ρ) scores for this task posite expression following the left-to-right are reported for averaged human ratings. This is strategy. due to a disagreement in previous work regard- ing which metric to use when reporting results. If we follow the right-to-left strategy (noted n- We mark with asterisk those systems reporting (ρ) vn), at the end of the compositional process, we scores based on non-averaged human ratings. 
obtain two fully contextualized senses: 6 Conclusions n-nv head The sense of the head run as a result of being contextualized first by the preferences In this paper, we described a compositional model imposed by the object and then by the sub- based on WordNet features and dependency-based ject. functions on those features. It is a recursive pro- posal since it can be repeated from left-to-right or n-nv dep The sense of the subject coach, as a from right-to-left and the sense of each constituent result of being contextualized by the prefer- word is performed in a recursive way. ences imposed by run previously combined 4n-vn+nv-n is computed by averaging the similarities of with the object team. both n-vn head+dep and nv-n head+dep
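To make the head and dependent functions of Section 4.1 and the left-to-right (nv-n) application of Section 4.2 easier to trace, the sketch below runs the two dependencies of "the coach runs the team" in sequence. The dense toy vectors, the variable names and the way the selectional-preference vectors are fabricated are all assumptions made for illustration; in the paper those vectors are the sparse WordNet feature vectors of Section 3 and the preferences are summed over words extracted from the BNC.

```python
# Illustrative sketch (not the released implementation) of the compositional
# functions: head = component-wise multiplication of the head vector with the
# dependent's selectional preferences; dependent = the symmetric operation;
# selectional preferences = component-wise addition of co-occurring words' vectors.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Direct denotations (sparse WordNet feature vectors in the paper; random toys here).
coach, run, team = rng.random(dim), rng.random(dim), rng.random(dim)

# Toy sets of co-occurring words, each represented by a random vector.
verbs_with_coach_as_subject = [rng.random(dim) for _ in range(3)]  # e.g. train, lead, ...
verbs_with_team_as_object = [rng.random(dim) for _ in range(3)]    # e.g. create, support, ...
nouns_as_subject_of_run = [rng.random(dim) for _ in range(3)]      # e.g. manager, company, ...
nouns_as_object_of_run = [rng.random(dim) for _ in range(3)]       # e.g. company, marathon, ...

def preferences(vectors):
    """Selectional preferences: component-wise addition of the co-occurring vectors."""
    return np.sum(vectors, axis=0)

def head(head_vec, dependent_prefs):
    """Contextualize the head word by the preferences its dependent imposes on it."""
    return head_vec * dependent_prefs    # component-wise multiplication

def dependent(dep_vec, head_prefs):
    """Contextualize the dependent word by the preferences its head imposes on it."""
    return dep_vec * head_prefs

# nsubj: contextualize run by coach, and coach by run.
run_nsubj = head(run, preferences(verbs_with_coach_as_subject))
coach_nsubj = dependent(coach, preferences(nouns_as_subject_of_run))

# dobj, applied to the already contextualized verb (left-to-right, nv-n order); note
# that the paper builds team's restriction from the preferences of coach+run combined.
run_nsubj_dobj = head(run_nsubj, preferences(verbs_with_team_as_object))
team_dobj = dependent(team, preferences(nouns_as_object_of_run))

print(coach_nsubj, run_nsubj_dobj, team_dobj, sep="\n")
```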

8 Our compositional model tackles the problem In Proceedings of the 2010 Conference on Em- of information scalability. This problem states pirical Methods in Natural Language Processing, that the size of semantic representations should not EMNLP’10, pages 1183–1193, Stroudsburg, PA, USA. grow exponentially, but proportionally; and, infor- mation must not be loss using fixed size of compo- Marco Baroni, Raffaella Bernardi, and Roberto Zam- sitional vectors. In our approach, however, even if parelli. 2014. Frege in space: A program for com- positional distributional semantics. LiLT, 9:241– the size of the compositional vectors is fixed, there 346. is no information loss since each word of the com- posite expression is associated to a compositional Marco Baroni. 2013. Composition in distributional vector representing its context-sensitive sense. In semantics. Language and Linguistics Compass, 7:511–522. addition, the compositional vectors do not grow exponentially since their size is fixed by the vector Miroslav Batchkarov, Thomas Kober, Jeremy Reffin, space: they are all first-order (or direct) vectors. Julie Weeds, and David Weir. 2016. A critique of Finally, the number of vectors increases in propor- word similarity as a method for evaluating distribu- tional semantic models. In Proceedings of the ACL tion to the number of constituent words found in Workshop on Evaluating Vector Space Representa- the composite expression. So, both points are suc- tions for NLP, Berlin, Germany. cessfully solved. In future work, we will try to design a compo- B. Coecke, M. Sadrzadeh, and S. Clark. 2010. Math- ematical foundations for a compositional distribu- sitional model based on word semantic represen- tional model of meaning. Linguistic Analysis, 36(1- tations combining WordNet-based features with 4):345–384. syntactic-based distributional contexts as well as extend our model to full sentences instead of the Georgiana Dinu and Mirella Lapata. 2010. Measur- ing distributional similarity in context. In Proceed- simple ones described in this paper. ings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1162–1172, Acknowledgements Cambridge, MA, October. Association for Compu- tational Linguistics. We would like to thank the anonymous review- ers for helpful comments and suggestions. This G. Dinu, N. Pham, and M. Baroni. 2013. General work was supported by a 2016 BBVA Founda- estimation and evaluation of compositional distribu- tional semantic models. In ACL 2013 Workshop on tion Grant for Researchers and Cultural Creators, Continuous Vector Space Models and their Compo- and by project TELPARES (FFI2014-51978-C2- sitionality (CVSC 2013), pages 50–58, East Strouds- 1-R) and grant TIN2014-56633-C3-1-R, Ministry burg PA. of Economy and Competitiveness. It has received Katrin Erk and Sebastian Pado.´ 2008. A structured financial support from the Conseller´ıa de Cultura, vector space model for word meaning in context. In Educacion´ e Ordenacion´ Universitaria (Postdoc- Proceedings of EMNLP, Honolulu, HI. toral Training Grants 2016 and Centro Singular de Investigacion´ de Galicia accreditation 2016-2019, Manaal Faruqui and Chris Dyer. 2015. Non- distributional word vector representations. In Pro- ED431G/08) and the European Regional Develop- ceedings of ACL. ment Fund (ERDF). Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Ey- References tan Ruppin. 2002. Placing search in context: the concept revisited. ACM Trans. Inf. 
Syst., 20(1):116– Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana 131. Kravalova, Marius Pas¸ca, and Aitor Soroa. 2009. A study on similarity and relatedness using distribu- Dayne Freitag, Matthias Blume, John Byrnes, Ed- tional and wordnet-based approaches. In Proceed- mond Chow, Sadik Kapadia, Richard Rohwer, and ings of Human Language Technologies: The 2009 Zhiqiang Wang. 2005. New experiments in distribu- Annual Conference of the North American Chap- tional representations of synonymy. In Proceedings ter of the Association for Computational Linguistics, of the Ninth Conference on Computational Natural NAACL ’09, pages 19–27, Stroudsburg, PA, USA. Language Learning, pages 25–32. Association for Computational Linguistics. Pablo Gamallo and Isaac Gonzalez.´ 2011. A grammat- Marco Baroni and Roberto Zamparelli. 2010. Nouns ical formalism based on patterns of part-of-speech are vectors, adjectives are matrices: Represent- tags. International Journal of , ing adjective-noun constructions in semantic space. 16(1):45–71.

9 Pablo Gamallo, Alexandre Agustini, and Gabriel 1544–1555, Doha, Qatar, October. Association for Lopes. 2005. Clustering Syntactic Positions with Computational Linguistics. Similar Semantic Requirements. Computational Linguistics, 31(1):107–146. S. Hassan and R. Mihalcea. 2011. Semantic related- ness using salient semantic analysis. In Proceedings Pablo Gamallo. 2008. The meaning of syntactic de- of AAAI Conference on Artificial Intelligence. pendencies. Linguistik OnLine, 35(3):33–53. Douwe Kiela, Felix Hill, and Stephen Clark. 2015. Pablo Gamallo. 2015. Dependency parsing with com- Specializing word embeddings for similarity or re- pression rules. In International Workshop on Pars- latedness. In Llu´ıs Marquez,` Chris Callison-Burch, ing Technology (IWPT 2015), Bilbao, Spain. Jian Su, Daniele Pighin, and Yuval Marton, edi- Pablo Gamallo. 2017. The role of syntactic dependen- tors, EMNLP, pages 2044–2048. The Association cies in compositional distributional semantics. Cor- for Computational Linguistics. pus Linguistics and Linguistic Theory. Jayant Krishnamurthy and Tom Mitchell, 2013. Pro- Edward Grefenstette and Mehrnoosh Sadrzadeh. ceedings of the Workshop on Continuous Vector 2011a. Experimental support for a categorical com- Space Models and their Compositionality, chapter positional distributional model of meaning. In Con- Vector Space Semantic Parsing: A Framework for ference on Empirical Methods in Natural Language Compositional Vector Space Models, pages 1–10. Processing. Association for Computational Linguistics. Edward Grefenstette and Mehrnoosh Sadrzadeh. Lingling Meng, Runquing Huang, and Junzhong Gu. 2011b. Experimenting with transitive verbs in a dis- 2013. A review of semantic similarity measures in cocat. In Workshop on Geometrical Models of Nat- wordnet. International Journal of Hybrid Informa- ural Language Semantics (EMNLP-2011). tion Technology, 6(1).

Edward Grefenstette, Mehrnoosh Sadrzadeh, Stephen Dmitrijs Milajevs, Dimitri Kartsaklis, Mehrnoosh Clark, Bob Coecke, and Stephen Pulman. 2011. Sadrzadeh, and Matthew Purver. 2014. Evaluating Concrete sentence spaces for compositional distri- neural word representations in tensor-based compo- butional models of meaning. In Proceedings of the sitional settings. In Alessandro Moschitti, Bo Pang, Ninth International Conference on Computational and Walter Daelemans, editors, EMNLP, pages 708– Semantics, IWCS ’11, pages 125–134. 719. ACL.

Emiliano Guevara. 2010. A regression model of Jeff Mitchell and Mirella Lapata. 2008. Vector-based adjective-noun compositionality in distributional se- models of semantic composition. In Proceedings of mantics. In Proceedings of the 2010 Workshop on ACL-08: HLT, pages 236–244. GEometrical Models of Natural Language Seman- tics, GEMS ’10. Jeff Mitchell and Mirella Lapata. 2009. Language models based on semantic composition. In Proceed- M. A. Hadj Taieb, M. Ben Aouicha, and ings of EMNLP, pages 430–439. A. Ben Hamadou. 2014. Ontology-based approach for measuring semantic similarity. Engineering Jeff Mitchell and Mirella Lapata. 2010. Composition Applications of Artificial Intelligence, 36:238–261. in distributional models of semantics. Cognitive Sci- Guy Halawi, Gideon Dror, Evgeniy Gabrilovich, and ence, 34(8):1388–1439. Yehuda Koren. 2012. Large-scale learning of word relatedness with constraints. In Proceedings of the Richard Montague. 1970. Universal grammar. theoria. 18th ACM SIGKDD International Conference on Theoria, 36:373–398. Knowledge Discovery and Data Mining, KDD ’12, pages 1406–1414. Barbara Partee. 1984. Compositionality. In Frank Landman and Frank Veltman, editors, Varieties Kazuma Hashimoto and Yoshimasa Tsuruoka. 2015. of Formal Semantics, pages 281–312. Dordrecht: Learning embeddings for transitive verb disam- Foris. biguation by implicit tensor factorization. In Pro- ceedings of the 3rd Workshop on Continuous Vector Tamara Polajnar, Laura Rimell, and Stephen Clark. Space Models and their Compositionality, pages 1– 2015. An exploration of discourse-based sentence 11, Beijing, China, July. Association for Computa- spaces for compositional distributional semantics. tional Linguistics. In Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, Discourse-level Semantics, pages 1–11, Lisbon, Por- and Yoshimasa Tsuruoka. 2014. Jointly learn- tugal, September. Association for Computational ing word representations and composition functions Linguistics. using predicate-argument structures. In Proceed- ings of the 2014 Conference on Empirical Methods James Pustejovsky. 1995. The Generative Lexicon. in Natural Language Processing (EMNLP), pages MIT Press, Cambridge.

10 R. Rada, H. Mili, E. Bicknell, and M. Blettner. 1989. Development and application of a metric on se- mantic nets. Systems, Man and Cybernetics, IEEE Transactions on, 19(1):17–30. Siva Reddy, Ioannis P. Klapaftis, Diana McCarthy, and Suresh Manandhar. 2011. Dynamic and static pro- totype vectors for semantic composition. In Fifth In- ternational Joint Conference on Natural Language Processing, IJCNLP 2011, Chiang Mai, Thailand, November 8-13, 2011, pages 705–713. M. Andrea Rodr´ıguez and Max J. Egenhofer. 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Trans. Knowl. Data Eng., 15(2):442–456. H. Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing. Thabet Slimani. 2013. Description and evaluation of semantic similarity measures approaches. Interna- tional Journal of Computer Applications, 80(1):25– 33. Mark Steedman. 1996. Surface Structure and Inter- pretation. The MIT Press. Stefan Thater, Hagen Furstenau,¨ and Manfred Pinkal. 2010. Contextualizing semantic representations us- ing syntactically enriched vector models. In Pro- ceedings of the 48th Annual Meeting of the Associa- tion for Computational Linguistics, pages 948–957, Stroudsburg, PA, USA. Masashi Tsubaki, Kevin Duh, Masashi Shimbo, and Yuji Matsumoto. 2013. Modeling and learning se- mantic co-compositionality through prototype pro- jections and neural networks. In EMNLP, pages 130–140. ACL. Peter D. Turney. 2013. Domain and function: A dual- space model of semantic relations and compositions. Journal of Artificial Intelligence Research (JAIR), 44:533–585. A. Tversky. 1977. Features of similarity. Psichologi- cal Review, 84(4). Tim Van De Cruys, Thierry Poibeau, and Anna Ko- rhonen. 2013. A Tensor-based Factorization Model of Semantic Compositionality. In Conference of the North American Chapter of the Association of Com- putational Linguistics (HTL-NAACL), pages 1142– 1151, Atlanta, United States, June. Fabio Massimo Zanzotto, Ioannis Korkontzelos, Francesca Fallucchi, and Suresh Manandhar. 2010. Estimating linear models for compositional distribu- tional semantics. In Proceedings of the 23rd Inter- national Conference on Computational Linguistics, COLING ’10, pages 1263–1271. Peng Zhu. 2015. N-grams based linguistic search en- gine. International Journal of Computational Lin- guistics Research, 6(1):1–7.

Automated WordNet Construction Using Word Embeddings

Mikhail Khodak, Andrej Risteski, Christiane Fellbaum and Sanjeev Arora

Computer Science Department, Princeton University, 35 Olden St., Princeton, New Jersey 08540

{mkhodak,risteski,fellbaum,arora}@cs.princeton.edu

Abstract target language and machine translation (MT) be- tween that language and English. We present a fully unsupervised method A standard minimal WordNet design is to have for automated construction of WordNets synsets connected by hyponym-hypernym rela- based upon recent advances in distribu- tions and linked back to PWN (Global WordNet tional representations of sentences and Association, 2017). This allows for applications word-senses combined with readily avail- to cross-lingual tasks and rests on the assumption able machine translation tools. The ap- that synsets and their relations are invariant across proach requires very few linguistic re- languages (Sagot and Fiser,ˇ 2008). For example, sources and is thus extensible to multiple while all senses of English word tie may not line target languages. To evaluate our method up with all senses of French word cravate, the we construct two 600-word test sets for sense “necktie” will exist in both languages and word-to-synset matching in French and be represented by the same synset. Russian using native speakers and evalu- Thus MT is often used for automated Word- ate the performance of our method along Nets to generate a set of candidate synsets for each with several other recent approaches. Our word w in the target language by getting a set of method exceeds the best language-specific English translations of w and using them to query and multi-lingual automated WordNets in PWN (we will refer to this as MT+PWN). The F-score for both languages. The databases number of candidate synsets produced may not be we construct for French and Russian, both small, even as large as a hundred for some polyse- languages without large publicly available mous verbs. Thus one needs a way to select from manually constructed WordNets, will be the candidates of w those synsets that are its true publicly released along with the test sets. senses. The main contributions of this paper is a 1 Introduction new word embedding-based method for matching words to synsets and the release of two large word- A WordNet is a lexical database for languages synset matching test sets for French and Russian.1 based upon a structure introduced by the Princeton Though there has been some work using word- WordNet (PWN) for English in which sets of cog- vectors for WordNets (see Section 2), the result- nitive synonyms, or synsets, are interconnected ing databases have been small, containing less with arcs standing for semantic and lexical rela- than 1000 words. Using embeddings for this task tions between them (Fellbaum, 1972). WordNets is challenging due to the need for good ways to are widely used in computational linguistics, in- use PWN synset information and account for the formation retrieval, and machine translation. Con- breakdown of cosine-similarity for polysemous structing one by hand is time-consuming and dif- words. We approach the first issue by representing ficult, motivating a search for automated or semi- synset information using recent work on sentence- automated methods. We present an unsupervised embeddings by Arora et al. (2017). To handle pol- method based on word embeddings and word- ysemy we devise a sense clustering scheme based sense induction and build and evaluate WordNets on Word Sense Induction (WSI) via linear alge- for French and Russian. Our approach needs only a large unannotated corpus like Wikipedia in the 1 https://github.com/mkhodak/pawn

12 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 12–23, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics

We demonstrate how this sense purification procedure effectively combines clustering with embeddings, thus being applicable to many word-sense disambiguation (WSD) and induction-related tasks. Using both techniques yields a WordNet method that outperforms other language-independent methods as well as language-specific approaches such as WOLF, the French WordNet used by the Natural Language ToolKit (Sagot and Fišer, 2008; Bond and Foster, 2013; Bird et al., 2009).

Our second contribution is the creation of two new 600-word test sets in French and Russian that are larger and more comprehensive than any currently available, containing 200 each of nouns, verbs, and adjectives. They are constructed by presenting native speakers with all candidate synsets produced as above by MT+PWN and treating the senses they pick out as "ground truth" for measuring precision and recall. The motivation behind separating by part-of-speech (POS) is that nouns are often easier than adjectives and verbs, so reporting one number — as done by some past work — allows high noun performance to mask low performance on adjectives and verbs.

Using these test sets, we can begin addressing the difficulties of evaluation for non-English automated WordNets due to the use of different and unreported test data, incompatible metrics (e.g. matching synsets to words vs. retrieving words for synsets), and differing cross-lingual dictionaries. In this paper we use the test sets to evaluate our method and several other automated WordNets.

2 Related Work

There have been many language-specific approaches for building automated WordNets, notably for Korean (Lee et al., 2000), French (Sagot and Fišer, 2008; Pradet et al., 2013), and Persian (Montazery and Faili, 2010). These approaches also use MT+PWN to get candidate word-synset pairs, but often use further resources — such as bilingual corpora, expert knowledge, or WordNets in related languages — to select correct senses. The Korean construction depends on a classifier trained on 3260 word-sense matchings that yields 93.6% precision and 77.1% recall, albeit only on nouns. The Persian WordNet uses a scoring function based on related words between languages (requiring expert knowledge and parallel corpora) and achieves 82.6% precision, though without reporting recall and POS-separated statistics.

The most comparable results to ours are from the Wordnet Libre du Français (WOLF) of Sagot and Fišer (2008), which leverages multiple European WordNet projects. Our best method exceeds this approach on our test set and benefits from having far fewer resource requirements. The Wordnet du Français (WoNeF) of Pradet et al. (2013) depends on combining linguistic models by a voting scheme. Their performance is found to be generally below WOLF's, so we compare to the latter.

There has also been work on multi-language WordNets, specifically the Extended Open Multilingual Wordnet (OMW) (Bond and Foster, 2013), which scraped Wiktionary, and the Universal Multilingual Wordnet (UWN) (de Melo and Weikum, 2009), which used multiple translations to rate word-sense matches. In our evaluation both produce high-precision/low-coverage WordNets.

Finally, there have been recent vector approaches for an Arabic WordNet (Tarouti and Kalita, 2016) and a Bengali WordNet (Nasiruddin et al., 2014). The Arabic effort uses a cosine-similarity threshold for correcting direct translation and reports a precision of 78.4% on synonym matching, although its small size (943 synsets) indicates a poor precision/recall trade-off. The Bengali WordNet paper examines WSI on word vectors, evaluating clustering methods on seven words and achieving F1-scores of at best 52.1%. It is likely that standard clustering techniques are insufficient when one needs many thousands of clusters, an issue we address via sparse coding.

Our use of distributional word embeddings to construct WordNets is the latest in a long line of their applications, e.g. approximating word similarity and solving word-analogies (Mikolov et al., 2013). The latter discovery was cited as the inspiration for the theoretical model in Arora et al. (2016b), whose Squared-Norm (SN) vectors we use; the computation is similar in form and performance to GloVe (Pennington et al., 2014).

3 Methods for WordNet Construction

The basic WordNet method is as follows. Given a target word w, we use a bilingual dictionary to get its translations in English and let its set of candidate synsets be all PWN senses of the translations (MT+PWN). We then assign a score to each synset and accept as correct all synsets with score above a threshold α. If no synset is above the cutoff we accept only the highest-scoring synset. This score-threshold method is illustrated in Figure 1.
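To make the MT+PWN and score-threshold steps concrete, here is a minimal sketch in Python. It queries PWN through NLTK and assumes two hypothetical helpers that are not part of the released implementation: translate_to_english (the bilingual dictionary or MT lookup) and score (any of the scoring functions of Sections 3.1-3.3).

```python
from nltk.corpus import wordnet as wn  # Princeton WordNet via NLTK

def candidate_synsets(word, translate_to_english):
    """MT+PWN: all PWN synsets of any English translation of `word`."""
    candidates = set()
    for english in translate_to_english(word):
        candidates.update(wn.synsets(english))
    return candidates

def score_threshold_match(word, translate_to_english, score, alpha=0.4):
    """Keep every candidate scoring above alpha; if none does,
    fall back to the single highest-scoring candidate."""
    scored = [(score(word, s), s) for s in candidate_synsets(word, translate_to_english)]
    if not scored:
        return []
    accepted = [s for value, s in scored if value > alpha]
    return accepted if accepted else [max(scored, key=lambda x: x[0])[1]]
```

For the running example of Figure 1, translate_to_english('dalle') would return English words such as 'flag', 'flagstone' and 'slab', and the scores would then decide which of their PWN senses are kept.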

[Figure 1 diagram: the target French word 'dalle' is translated with a French-English dictionary or MT (flag, flagstone, slab), the translations are used to query Princeton WordNet for candidate synsets, each candidate is scored (e.g. flag.n.06 = 0.360, slab.n.01 = 0.521), and synsets above the threshold 0.41 are matched; the correct synsets are flag.n.06 and slab.n.01, but only slab.n.01 clears the cutoff.]

Figure 1: The score-threshold procedure for French word w = dalle (flagstone, slab). Candidate synsets generated by MT+PWN are given a score and matched to w if the score is above a threshold α.

Thus we need methods to assign high scores to correct candidate synsets of w and low scores to incorrect ones. We use unsupervised word vectors in the target language computed from a large text corpus (e.g. Wikipedia). Section 3.1 presents a simple baseline that improves upon MT+PWN via a cosine-similarity metric between w and each synset's lemmas. A more sophisticated synset representation method using sentence-embeddings is described in Section 3.2. Finally, in Section 3.3 we discuss WSI-based procedures for improving the score-threshold method for words with fine sense distinctions or poorly-annotated synsets.

In this section we assume a vocabulary V of target language words with associated d-dimensional unit word-vectors v_w in R^d for d much smaller than |V| (e.g. d = 300 for vocabulary size 50000) trained on a large text corpus. Each word w in V also has a set of candidate synsets found by MT+PWN. We call synsets S, S' in PWN related, denoted S ~ S', if one is a hyponym, meronym, antonym, or attribute of the other, if they share a verb group, or if S = S'.

3.1 Baseline: Average Similarity Method

This method for scoring synsets can be seen as a simple baseline. Given a candidate synset S, we define T_S as the subset of V containing the translations of its lemmas from English to the target language. The score of S is then

  $\frac{1}{|T_S|} \sum_{w' \in T_S} v_w \cdot v_{w'}$,

the average cosine similarity between w and the translated lemmas of S. Although straightforward, this scoring method is quite noisy, as averaging word-vectors dilutes similarity performance, and it does not use all synset information provided by PWN.

3.2 Method 1: Synset Representation

To improve upon this baseline we need a better vector representation of S to score S via cosine similarity with v_w. Previous efforts in synset and sense embeddings (Iacobacci et al., 2015; Rothe and Schütze, 2015) often use extra resources such as WordNet or BabelNet for the target language (Navigli and Ponzetto, 2012). As such databases are not always available, we propose a synset representation u_S that is unsupervised, needing no extra resources beyond MT and PWN, and leverages recent work on sentence embeddings.

This new representation combines embeddings of synset information given by PWN, e.g. synset relations, definitions, and example sentences. To create these embeddings we first consider the question of how to represent a list of words L as a vector in R^d. One way is to simply take the normalized sum $\hat{v}_L^{(SUM)}$ of their word-vectors, where

  $v_L^{(SUM)} = \sum_{w' \in L} v_{w'}$

Potentially more useful is to compute a vector $\hat{v}_L^{(SIF)}$ via the sentence embedding formula of Arora et al. (2017), based on smooth inverse frequency (SIF) weighting, which (for a = 10^-4 and before normalization) is expressed as

  $v_L^{(SIF)} = \sum_{w' \in L} \frac{a}{a + p(w')}\, v_{w'}$

where p(w') is the unigram probability of w'. SIF is similar in spirit to TF-IDF (Salton and Buckley, 1988) and builds on work of Wieting et al. (2016); it has been found to perform well on other similarity tasks (Arora et al., 2017).
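As an illustration, both list-embedding formulas can be written in a few lines of NumPy. The word-vector lookup vec and the unigram-probability table p are assumed to be given (e.g. estimated from the same Wikipedia corpus); this is a sketch of the SUM and SIF weightings only, not the authors' code, and it omits the principal-component removal step of the full SIF method of Arora et al. (2017).

```python
import numpy as np

def embed_sum(words, vec):
    """Unweighted (SUM) embedding: normalized sum of word-vectors."""
    v = np.sum([vec[w] for w in words], axis=0)
    return v / np.linalg.norm(v)

def embed_sif(words, vec, p, a=1e-4):
    """SIF embedding: down-weight frequent words by a / (a + p(w))."""
    v = np.sum([(a / (a + p[w])) * vec[w] for w in words], axis=0)
    return v / np.linalg.norm(v)
```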

We find that SIF improves performance on sentences but not on translated lemma lists (Figure 2), likely because sentences contain many distractor words that SIF will weight lower, while the presence of distractors among lemmas is independent of word frequency. Thus to compute the synset score u_S · v_w we make the vector representation u_S of S the element-wise average of:

• $\hat{v}_{T_S}^{(SUM)}$, where T_S is the set of translations of the lemmas of S (as in Section 3.1).

• $\hat{v}_{R_S}^{(SUM)}$, where $R_S = \bigcup_{S' \sim S} T_{S'} \setminus T_S$ is the set of lemma translations of synsets S' related to S.

• $\hat{v}_{D_S}^{(SIF)}$, where D_S is the list of tokens in the translated definition of S.

• $\frac{1}{|\mathcal{E}_S|} \sum_{E \in \mathcal{E}_S} \hat{v}_E^{(SIF)}$, where $\mathcal{E}_S$ contains the lists of tokens in the translated example sentences of S (excluded if S has no example sentences).

[Figure 2 bar chart: rows for Average Lemma Similarity (baseline), Synset Lemmas, Synset Lemmas + Related Lemmas, Synset Definition, Synset Definition + Example Sentences, and All Synset Information (the Method 1 synset representation), with F-scores between roughly 0.55 and 0.65.]

Figure 2: F-score comparison between using unweighted summation and SIF-weighted summation for embedding PWN synset information.

3.3 Method 2: Better Matching Using WSI

We found through examination that the score-threshold method we have used so far performs poorly in two main cases:

(a) the word w has no candidate synset with score high enough to clear the threshold;

(b) w has multiple closely related synsets that are all correct matches but some of which have a much lower score than others.

Here we address both issues by using sense information found by applying a word-sense induction method first introduced in Arora et al. (2016a). We summarize their WSI model — referred to henceforth as Linear-WSI — in Section 3.3.1. Then in Section 3.3.2 we devise a sense purification procedure for constructing a word-cluster for each induced sense of a word. Applying this procedure to construct word-cluster representations of candidate synsets provides an additional metric for the correctness of word-synset matches that can be used to devise a w-specific threshold α_w to ameliorate problem (a). Meanwhile, using Linear-WSI to associate similar candidate synsets of w to each other provides a way to address problem (b). We explain these methodologies in Section 3.3.3.

3.3.1 Summary of Linear-WSI Model

In Arora et al. (2016a) the authors posit that the vector of a polysemous word can be linearly decomposed into vectors associated to its senses. Thus for w = tie — which can be an article of clothing, a drawn match, and so on — we would have $v_w \approx a\,v_{w(\mathrm{clothing})} + b\,v_{w(\mathrm{match})} + \dots$ for a, b in R. It is unclear how to find such sense-vectors, but one expects different words to have closely related sense-vectors, e.g. for w' = bow the vector v_{w'(clothing)} would be close to the vector v_{w(clothing)} of tie. Thus the Linear-WSI model proposes using sparse coding, namely finding a set of unit basis vectors a_1, ..., a_k in R^d such that for all w in V,

  $v_w = \sum_{i=1}^{k} R_{wi} a_i + \eta_w$,   (1)

for k > d, η_w a noise vector, and at most s coefficients R_{wi} nonzero. The hope is that the sense-vector v_{w(clothing)} of tie is in the neighborhood of a vector a_i such that R_{wi} > 0. Indeed, for k = 2000 and s = 5, Arora et al. (2016a) report that solving (1) represents English word-senses as well as a competent non-native speaker and significantly better than older clustering methods for WSI.

3.3.2 Sense Purification

While (1) is a good WSI method, its use for building WordNets is hampered by its inability to produce more than a few thousand senses a_i, as setting a large k yields repetitive rather than different senses. As this is far fewer than the number of synsets in PWN, we use sense purification to address this by extracting a cluster of words related to a_i as well as w to represent each sense. In addition to w and a_i, the procedure takes as input a search-space V' contained in V and a set size n.
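One rough way to obtain a sparse decomposition of this form is dictionary learning with an OMP sparsity constraint, for example via scikit-learn. Arora et al. (2016a) use their own solver, so the snippet below is only an approximate stand-in that shows the shape of the computation; the word-vector matrix X (one row per vocabulary word) is assumed to be given.

```python
from sklearn.decomposition import MiniBatchDictionaryLearning

def linear_wsi(X, k=2000, s=4):
    """Approximate the decomposition of Eq. (1): learn k 'sense' directions
    a_i and at most s nonzero coefficients R_wi per word vector."""
    learner = MiniBatchDictionaryLearning(n_components=k,
                                          transform_algorithm='omp',
                                          transform_n_nonzero_coefs=s)
    R = learner.fit_transform(X)   # |V| x k sparse coefficient matrix
    A = learner.components_        # k x d matrix of sense directions
    return R, A
```

The induced senses of a word are then simply the directions a_i whose coefficient in that word's row of R is nonzero.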

[Figure 3 plots: cluster-word glosses such as "species", "family", "kind", "plant", "to shoot", "crossbow", "pistol", "shooting", "cilantro", "coriander", "celery", and "garlic" mark the word clusters recovered for the different senses of brilliant, fox, and лук.]
Figure 3: Isometric mapping of sense-cluster vectors for w = brilliant, fox, and лук (bow, onion). w is marked by a star and each sense ai of w, shown by a large marker, has an associated cluster of words with the same marker shape. Contours are densities of vectors close to w and at least one sense ai.

Then we find C, a subset of V' of size n containing w, that maximizes

  $f = \min_{x = a_i \text{ or } x = v_{w'},\, w' \in C} \mathrm{Median}\{x \cdot v_{w'} : w' \in C\}$   (2)

A cluster C maximizing this must ensure that neither a_i nor any word w' in C (including w' = w) has low average cosine similarity with cluster-words, resulting in a dense cluster close to both w and a_i. We explain this further in Appendix A.

We illustrate this method in Figure 3 by purifying the senses of the English words brilliant and fox and the Russian word лук, which has the senses "bow" (weapon), "onion" (vegetable), and "onion" (plant). Note how correct senses are recovered across POS and language and for proper and common noun senses.

3.3.3 Applying WSI to Synset Matching

The problem addressed by sense purification is that senses a_i induced by Linear-WSI have too many related words; purification solves this by extracting a cluster of words related to w from the words close to a_i. When translating WordNet synsets, we have a similar problem in that translations of a synset's lemmas may not be relevant to the synset itself. Thus we can try to create a purified representation of each candidate synset S of w by extracting a cluster of translated lemmas close to w and one of its induced senses. We run purification on every sense a_i in the sparse representation (1) of w, using as a search-space V' = V_S the set of translations of all lemmas of synsets S' related to S in PWN (i.e. S' ~ S as defined in Section 3). To each synset S we associate the sense a_S and corresponding cluster C_S that are optimal in the objective (2) among all senses a_i of w.

Although we find a sense a_S and purified representation C_S for each candidate synset of w, we note that an incorrect synset is likely to have a lower objective value (2) than a correct synset, as it likely has fewer words related to w in its search-space V_S. However, using f_S = f(w, a_S, C_S) as a synset score is difficult, as some synsets have very small search-spaces, leading to inconsistent scoring even for correct synsets.

Instead we use f_S as part of a w-specific threshold α_w = min{α, u_{S*} · v_w}, where u_S is the vector representation of S from Section 3.2 and S* = argmax_S (f_S + u_S · v_w). This attempts to address problem (a) of the score-threshold method — that some words have no synsets above the cutoff α, and returning only the highest-scoring synset in such cases will not retrieve multiple senses for polysemous words. By the construction of α_w, if w is polysemous, a synset other than the one with the highest score u_S · v_w may have a better value of f_S and thus be found to be S*; then all synsets with higher scores will be matched to w, allowing for multiple matches even if no synset clears the cutoff α. Otherwise, if w is monosemous, we expect the correct synset S to have the highest value for both u_S · v_w and f_S, making S* = S. Then if no synset has score greater than α the threshold will be set to u_S · v_w, the highest synset-score, so only the highest-scoring synset will be matched to w.

To address problem (b) of the threshold method — that w might have multiple correct and closely related candidate synsets of which only some clear the cutoff — we observe that closely related synsets of w will have similar search-spaces V_S and so are likely to be associated to the same sense a_i. For example, in Figure 4 the candidates of dalle related to its correct meanings as a flagstone or slab are associated to the same sense a_892, while distractor synsets related to the incorrect sense as a flag are mostly matched to other senses.
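The greedy construction of such a cluster (detailed in Appendix A) can be sketched as follows: starting from C = {w}, repeatedly add whichever candidate word keeps the objective (2) largest. Word vectors are assumed to be unit-length and stored in a dict vec keyed by word, with the sense direction a_i given as a vector; this is an illustrative re-implementation under those assumptions, not the released code.

```python
import numpy as np

def objective(cluster, a_i, vec):
    """f(w, a_i, C) of Eq. (2): the smallest median similarity between a_i
    (or any cluster word) and the other words of the cluster."""
    vals = [np.median([np.dot(a_i, vec[c]) for c in cluster])]
    for x in cluster:
        others = [c for c in cluster if c != x]
        if others:
            vals.append(np.median([np.dot(vec[x], vec[c]) for c in others]))
    return min(vals)

def purify(w, a_i, search_space, vec, n=5):
    """Greedily grow a cluster of n words around w and sense direction a_i."""
    cluster, candidates = [w], set(search_space) - {w}
    while len(cluster) < n and candidates:
        best = max(candidates, key=lambda c: objective(cluster + [c], a_i, vec))
        cluster.append(best)
        candidates.remove(best)
    return cluster, objective(cluster, a_i, vec)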

[Figure 4 diagram: for the target word 'dalle', candidate synsets from MT+PWN are scored and matched against the word-specific threshold, each candidate is associated to an induced sense by purification, and unmatched candidates that share a sense with a matched synset are compared to the lower threshold β; both correct synsets flag.n.06 and slab.n.01 are retrieved.]

Figure 4: The score-threshold and sense-agglomeration procedure for French word w = dalle (flagstone, slab). Candidate synsets are given a score and matched to w if they clear a high threshold αw (as in Section 3.2). If an unmatched synset shares a sense ai with a matched synset, it is compared to a low threshold β (the sense-agglomeration procedure in Section 3.3.3).

This motivates the sense agglomeration procedure, which uses threshold-clearing synsets to match synsets with the same sense below the score-threshold. For β < α, the procedure is roughly as follows:

1. Run the score-threshold method with cutoff α_w.

2. For each synset S with score u_S · v_w below the threshold, check for a synset S' with score u_{S'} · v_w above the threshold and a_{S'} = a_S.

3. If u_S · v_w ≥ β and the clusters C_S, C_{S'} satisfy a cluster similarity condition, match S to w.

4 Evaluation of Methods

We evaluate our method's accuracy and coverage by constructing and testing WordNets for French and Russian. For both we train 300-dimensional SN word embeddings (Arora et al., 2016b) on co-occurrences of words occurring at least 1000 times, or having candidate PWN synsets and occurring at least 100 times, in the lemmatized Wikipedia corpus. This yields |V| of roughly 50000. For Linear-WSI we run sparse coding with sparsity s = 4 and basis-size k = 2000 and use set-size n = 5 for purification. To get candidate synsets we use the Google and Microsoft Translators and the dictionary of the translation company ECTACO, while for sentence-length MT we use Microsoft.
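Putting the pieces of Section 3.3.3 together, one possible rendering of the full matching step is sketched below. Here score(S) stands for u_S · v_w, purified(S) returns the (a_S, C_S, f_S) triple from sense purification, and clusters_similar implements the cluster similarity condition of Appendix B.2; all three names are placeholders for the components described above rather than a fixed API, and the loop simplifies the procedure given in Appendix B.2.

```python
def match_with_wsi(candidates, score, purified, clusters_similar, alpha, beta):
    """Score-threshold matching with a word-specific cutoff, followed by
    sense agglomeration of sub-threshold synsets. `candidates` is a set."""
    # Word-specific threshold alpha_w = min(alpha, score of the synset
    # maximizing f_S + score(S)).
    s_star = max(candidates, key=lambda S: purified(S)[2] + score(S))
    alpha_w = min(alpha, score(s_star))
    matched = {S for S in candidates if score(S) >= alpha_w}

    # Sense agglomeration: recover unmatched synsets that share an induced
    # sense with a matched synset, clear the low threshold beta, and have
    # a purified cluster similar to those of the matched synsets.
    for S in sorted(candidates - matched, key=score, reverse=True):
        sense, cluster, _ = purified(S)
        peers = [purified(M) for M in matched if purified(M)[0] == sense]
        if peers and score(S) >= beta and all(
                clusters_similar(cluster, c) for _, c, _ in peers):
            matched.add(S)
    return matched
```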

We include the lower cutoff β because even similar synsets may not both be senses of the same word. The cluster similarity condition, available in Appendix B.2, ensures the relatedness of synsets sharing a sense a_i, as an erroneous synset S' may be associated to the same sense as a correct one.

In summary, the improved synset matching approach has two steps: (1) conduct score-threshold matching using the modified threshold α_w; (2) run the sense-agglomeration procedure on all senses a_i of w having at least one threshold-clearing synset S. Although for fixed α both steps focus on improving the recall of the threshold method, in practice they allow α to be higher, so that both precision and recall are improved. For example, note the recovery of a correct synset left unmatched by the score-threshold method in the simplified depiction of sense-agglomeration shown in Figure 4.

4.1 Testsets

A common way to evaluate the accuracy of an automated WordNet is to compare its synsets or word matchings to a manually-constructed one. However, the existing ELRA French Wordnet2 is not public and half the size of ours, while Russian WordNets are either even smaller and not linked to PWN3 or obtained via direct translation4.

Instead we construct test sets for each language that allow for evaluation of our methods and others. We randomly chose 200 each of adjectives, nouns, and verbs from the set of target language words whose English translations appear in the synsets of the Core WordNet. Their "ground truth"

2 http://catalog.elra.info/product_info.php?products_id=550
3 http://project.phil.spbu.ru/RussNet/
4 http://wordnet.ru/

17 Method POS F.5-Score∗ F1-Score∗ Precision∗ Recall∗ Coverage Synsets α β Adj. 50.3 59.3 46.1 100.0 99.9 11271 Direct Translation Noun 56.2 64.6 52.2 100.0 100.0 74477 (MT + PWN) Verb 41.4 51.3 37.0 100.0 100.0 13017 Total 49.3 58.4 45.1 100.0 100.0 100174† Wordnet Libre Adj. 66.3 58.6 78.1 53.4 84.8 6865 du Franc¸ais Noun 68.6 58.7 83.2 51.5 95.0 36667 (WOLF) Verb 60.8 48.4 81.0 39.6 88.2 7671 (Sagot and Fiser,ˇ 2008) Total 65.2 55.2 80.8 48.2 92.2 52757† Adj. 64.5 51.5 88.3 42.3 69.2 7407 Universal Wordnet Noun 67.5 52.2 94.1 40.8 75.9 24670 (de Melo and Weikum, 2009) Verb 55.4 39.5 88.0 28.5 76.2 5624 Total 62.5 47.7 90.1 37.2 75.0 39497† Adj. 58.4 40.8 90.9 28.4 54.7 2689 Extended Open Noun 61.3 43.8 96.5 31.7 66.6 14936 Multilingual Wordnet Verb 47.8 29.4 95.9 18.6 57.7 2331 (Bond and Foster, 2013) Total 55.9 38.0 94.5 26.2 63.2 20449† Adj. 62.8 0.0 62.6 0.0 65.3 0.0 68.5 0.0 88.7 9687 0.31 Baseline: ± ± ± ± Noun 67.3 0.0 65.4 0.1 71.6 0.1 69.0 0.1 92.2 37970 0.27 Average Similarity ± ± ± ± Verb 51.8 0.0 50.8 0.1 55.9 0.1 57.0 0.1 83.5 10037 0.26 (Section 3.1) ± ± ± ± Total 60.6 0.0 59.6 0.0 64.3 0.0 64.9 0.0 90.0 58962† ± ± ± ± Adj. 65.9 0.0 60.4 0.0 75.9 0.1 59.5 0.1 85.1 8512 0.47 Method 1: ± ± ± ± Noun 71.0 0.0 67.3 0.1 78.7 0.1 69.1 0.1 96.7 35663 0.41 Synset Representation ± ± ± ± Verb 61.6 0.0 53.0 0.0 78.7 0.1 49.8 0.1 89.9 8619 0.45 (Section 3.2) ± ± ± ± Total 66.2 0.0 60.2 0.0 77.8 0.0 59.5 0.1 93.7 53852† ± ± ± ± Method 2: Adj. 67.7 0.0 62.5 0.1 76.9 0.1 62.6 0.1 91.2 8912 0.56 0.42 ± ± ± ± Synset Representation Noun 73.0 0.0 66.0 0.1 83.7 0.1 62.0 0.2 90.9 34001 0.50 0.25 ± ± ± ± + Linear-WSI Verb 64.4 0.0 55.9 0.0 79.3 0.0 51.5 0.1 93.6 9262 0.46 0.28 ± ± ± ± (Section 3.3) Total 68.4 0.0 61.5 0.0 80.0 0.0 58.7 0.1 91.5 53208† ± ± ± ± ∗ Micro-averages over a randomly held-out half of the data; parameters tuned on the other half. 95% asymptotic confidence intervals found with 10000 randomized trials. † Includes adverb synsets. For the last three methods they are matched with the same parameter values (α and β) as for adjectives. Table 1: French WordNet Results word senses are picked by native speakers, who with French, Korean, and Persian WordNets cited were asked to perform the same matching task de- in Section 2 being evaluated on 183 pairs, 3260 scribed in Section 3, i.e. select correct synsets pairs, and 500 words, respectively. Accuracy mea- for a word given a set of candidates generated by sured with respect to this ground truth estimates MT + PWN. For example, the French word foie how well an algorithm does compared to humans. has one translation, liver with four PWN synsets: 1-“glandular organ”; 2-“liver used as meat”; 3- A significant characteristic of this test set is “person with a special life style”; 4-“someone liv- its dependence on the machine translation sys- ing in a place.” The first two align with senses tem used to get candidate synsets. While this can of foie while the others do not, so the expert leave out correct synset matches that the system marks the first two as good and the others as neg- did not propose, by providing both correct and in- ative. Two native speakers for each language were correct candidate synsets we allow future authors trained by an author with knowledge of WordNet to focus on the semantic challenge of selecting and at least 10 years of experience in each lan- correct senses without worrying about finding the guage. Inconsistencies in the matchings of the two best bilingual dictionary. 
This allows dictionary- speakers were resolved by the same author. independent evaluation of automated WordNets, an important feature in an area where the specific We get 600 words and about 12000 candidate translation systems used are rarely provided in word-synset pairs for each language, with adjec- full. When comparing the performance of our con- tives and nouns having on average about 15 can- struction to that of previous efforts on this test set, didates and verbs having about 30. These num- we do not penalize word-synset matches in which bers makes the test set larger than many others, the synset is not among the candidate synsets we

18 generate for that word, negating the loss of pre- sian nouns and much worse on Russian verbs. cision incurred by other methods due to the use The discrepancy in verbs can be explained by a of different dictionaries. We also do not penalize difference in treating the reflexive case and aspec- other WordNets for test words they do not contain. tual variants due to the grammatical complexity of In addition to precision and recall, we report the Russian verbs. In French, making a verb reflexive Coverage statistic as the proportion of the Core set requires adding a word while in Russian the verb of most-used synsets, a semi-automatically con- itself changes, e.g. to wash to wash oneself is → structed set of about 5000 PWN frequent senses, laver se laver in French but мыть мыться in → → that are matched to at least one word (Fellbaum, Russian. Thus we do not distinguish the reflex- 1972). While an imperfect metric given different ive case for French as the token found is the same sense usage by language, the synsets are universal- but for Russian we do, so both мыть and мыть- enough for it to be a good indicator of usability. ся may appear and have distinct synset matches. Thus matching Russian verbs is challenging as the 4.2 Evaluation Results reflexive usage of a verb is often contextually sim- ilar to the non-reflexive usage. Another compli- We report the evaluation of methods in Section 3 cation for Russian verbs is due to aspectual verb in Tables 1 & 2 alongside evaluations of UWN pairs; thus to do has aspects (делать, сделать) (de Melo and Weikum, 2009), OWM (Bond and in Russian that are treated as distinct verbs while Foster, 2013), and WOLF (Sagot and Fiser,ˇ 2008). in French these are just different tenses of the verb Parameters α and β were tuned to maximize the 1.25 Precision Recall faire. Both factors pose challenges for differentiat- micro-averaged F.5-score .25 Precision· +·Recall , used · ing Russian verb senses by a distributional model. instead of F1 to prioritize precision, which is of- Overall however the method is shown to be ro- ten more important for application purposes. bust to how close the target language is to English, Our synset representation method (Section 3.2) with nouns and adjectives performing well in both exceeds the similarity baseline by 6% in F -score .5 languages and the difference for verbs stemming for French and 10% for Russian. For French it is from an intrinsic quality rather than dissimilarity competitive with the best other WordNet (WOLF) with English. This can be further examined by and in both languages exceeds both multi-lingual testing the method on a non-European language. WordNets. Improving this method via Linear-WSI (Section 3.3) leads to 2% improvement in F - .5 5 Conclusion and Future Work score for French and 1% for Russian. Our methods also perform best in F1-score and Core coverage. We have shown how to leverage recent advances As expected from a Wiktionary-scraping in word embeddings for fully-automated WordNet method, OMW achieves the best precision construction. Our best approach combining sen- across languages, although it and UWN have tence embeddings, and recent methods for WSI low recall and Core coverage. The performance obtains performance 5-16% above the naive base- of our best method for French exceeds that of line in F.5-score as well as outperforming previ- WOLF in F.5-score across POS while achieving ous language-specific and multi-lingual methods. similar coverage. 
WOLF’s recall performance A notable feature of our work is that we require is markedly lower than the evaluation in Sagot only a large corpus in the target language and au- and Fiserˇ (2008, Table 4); we believe this stems tomated translation into/from English, both avail- from our use of words matched to Core synsets, able for many languages lacking good WordNets. not random words, leading to a more difficult We further contribute new 600-word human- test set as common words are more-polysemous annotated test sets split by POS for French and and have more synsets to retrieve. There is no Russian that can be used to evaluate future au- comparable automated Russian-only WordNet, tomated WordNets. These larger test sets give a with only semi-automated and incomplete efforts more accurate picture of a construction’s strengths (Yablonsky and Sukhonogov, 2006). and weaknesses, revealing some limitations of Comparing across POS, we do best on nouns past methods. With WordNets in French and Rus- and worst on verbs, likely due to the greater pol- sian largely automated or incomplete, the Word- ysemy of verbs. Between languages, performance Nets we build also add an important tool for multi- is similar for adjectives but slightly worse on Rus- lingual natural language processing.

19 Method POS F.5-Score∗ F1-Score∗ Precision∗ Recall∗ Coverage Synsets α β Adj. 50.2 59.6 45.9 100.0 99.6 11412 Direct Translation Noun 41.2 50.2 37.1 100.0 100.0 73328 (MT + PWN) Verb 32.5 41.7 28.6 100.0 100.0 13185 Total 41.3 50.5 37.2 100.0 99.9 99470† Adj. 52.4 38.8 80.3 29.6 51.0 11412 Universal Wordnet Noun 65.0 53.0 87.5 45.1 71.1 19564 (de Melo and Weikum, 2009) Verb 48.1 34.8 74.8 25.7 65.0 3981 Total 55.1 42.2 80.8 33.4 67.1 30015† Adj. 58.7 41.3 91.7 29.2 55.3 2419 Extended Open Noun 67.8 53.1 93.5 42.5 68.4 14968 Multilingual Wordnet Verb 51.1 34.8 84.5 23.9 56.6 2218 (Bond and Foster, 2013) Total 59.2 43.1 89.9 31.9 64.2 19983† Adj. 61.4 0.0 64.6 0.1 60.9 0.0 77.3 0.1 92.1 10293 0.24 Baseline: ± ± ± ± Noun 55.9 0.0 54.8 0.1 59.9 0.1 59.9 0.1 77.0 32919 0.29 Average Similarity ± ± ± ± Verb 46.3 0.0 46.5 0.1 49.0 0.1 55.1 0.1 84.1 9749 0.21 (Section 3.1) ± ± ± ± Total 54.5 0.0 55.3 0.0 56.6 0.0 64.1 0.1 80.5 54372† ± ± ± ± Adj. 69.5 0.0 64.1 0.0 78.1 0.0 61.7 0.1 84.2 8393 0.43 Method 1: ± ± ± ± Noun 69.8 0.0 65.5 0.0 77.6 0.1 66.0 0.1 85.2 29076 0.46 Synset Representation ± ± ± ± Verb 54.2 0.0 51.1 0.1 63.3 0.1 57.4 0.1 91.2 8303 0.39 (Section 3.2) ± ± ± ± Total 64.5 0.0 60.2 0.0 73.0 0.0 61.7 0.1 86.3 46911† ± ± ± ± Method 2: Adj. 69.7 0.0 64.9 0.1 77.3 0.0 63.6 0.1 93.3 9359 0.43 0.35 ± ± ± ± Synset Representation Noun 71.6 0.0 67.6 0.0 78.1 0.0 68.0 0.1 91.0 31699 0.46 0.33 ± ± ± ± + Linear-WSI Verb 54.4 0.0 49.7 0.1 64.9 0.1 52.6 0.2 91.9 8582 0.44 0.33 ± ± ± ± (Section 3.3) Total 65.2 0.0 60.7 0.0 73.4 0.0 61.4 0.1 91.5 50850† ± ± ± ± ∗ Micro-averages over a randomly held-out half of the data; parameters tuned on the other half. 95% asymptotic confidence intervals found with 10000 randomized trials. † Includes adverb synsets. For the last three methods they are matched with the same parameter values (α and β) as for adjectives. Table 2: Russian WordNet Results

Further improvement to our work may come References from other methods in word-embeddings, such Michal Aharon, Michael Elad, and Alfred Bruckstein. as multi-lingual word-vectors (Faruqui and Dyer, 2006. K-svd: An algorithm for designing overcom- 2014). Our techniques can also be combined with plete dictionaries for sparse representation. IEEE others, both language-specific and multi-lingual, Transactions on Signal Processing, 54(11). for automated WordNet construction. In addition, our method for associating multiple synsets to the Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2016a. Linear algebraic struc- same sense can contribute to efforts to improve ture of word sense, with applications to polysemy. PWN through sense clustering (Snow et al., 2007). Finally, our sense purification procedure, which Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, uses word-vectors to extract clusters representing and Andrej Risteski. 2016b. Rand-walk: A la- word-senses, likely has further WSI and WSD ap- tent variable model approach to word embeddings. Transactions of the Association for Computational plications; such exploration is left to future work. Linguistics, 4:385–399.

Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence em- Acknowledgments beddings. In International Conference on Learning Representations. To Appear.

We thank Angel Chang and Yingyu Liang for Steven Bird, Edward Loper, and Ewan Klein. helpful discussion at various stages of this pa- 2009. Natural Language Processing with Python. per. This work was supported in part by NSF O’Reilly Media Inc. grants CCF-1302518, CCF-1527371, Simons In- Francis Bond and Ryan Foster. 2013. Linking and ex- vestigator Award, Simons Collaboration Grant, tending an open multilingual wordnet. In Proceed- and ONR-N00014-16-1-2329. ings of the 51st Annual Meeting of the Association for Computational Linguistics.

20 Gerard de Melo and Gerhard Weikum. 2009. Towards Benoˆıt Sagot and Darja Fiser.ˇ 2008. Building a free a universal wordnet by learning from combined evi- french wordnet from multilingual resources. In Pro- dence. In Proceedings of the 18th ACM Conference ceedings of the Sixth International Language Re- on Information and Knowledge Management. sources and Evaluation Conference.

Manaal Faruqui and Chris Dyer. 2014. Improving Gerald Salton and Christopher Buckley. 1988. Term- vector space word representations using multilingual weighting approaches in automatic text retrieval. In- correlation. In Proceedings of the 14th Conference formation Processing & Management, 24(5). of the European Chapter of the Association for Com- putational Linguistics. Rion Snow, Sushant Prakash, Daniel Jurafsky, and An- drew Y. Ng. 2007. Learning to merge word senses. Christiane Fellbaum. 1972. WordNet: An Electronic In Proceedings of the 2007 Joint Conference on Em- Lexical Database. MIT Press, Cambridge, MA. pirical Methods in Natural Language Processing and Computational Natural Language Learning. Global WordNet Association. 2017. Wordnets in the Feras Al Tarouti and Jugal Kalita. 2016. Enhancing world. automatic wordnet construction using word embed- dings. In Proceedings of the Workshop on Multilin- Ignaci Iacobacci, Mohammad Taher Pilehvar, and gual and Cross-lingual Methods in NLP. Roberto Navigli. 2015. Sensembed: Learning sense embeddings for word and relational similarity. In John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Proceedings of the 53rd Annual Meeting of the As- Livescu. 2016. Towards universal paraphrastic sen- sociation for Computational Linguistics. tence embeddings. In International Conference on Learning Representations. Changki Lee, Geunbae Lee, and Seo JungYun. 2000. Automated wordnet mapping using word sense dis- Sergey Yablonsky and Andrej Sukhonogov. 2006. ambiguation. In Proceedings of the 2000 Joint SIG- Semi-automated english-russian wordnet construc- DAT Conference on Empirical Methods in Natural tion: Initial resources, software and methods of Language Processing and Very Large Corpora. translation. In Proceedings of the Global WordNet Conference. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word represen- A Purification Procedure tations in vector space. As discussed in Section 3.3.1, the Linear-WSI Mortaza Montazery and Heshaam Faili. 2010. Auto- model (Arora et al., 2016a) posits that there ex- matic persian wordnet construction. In Proceedings d ists an overcomplete basis a1, . . . , ak R of unit of the 23rd International Conference on Computa- ∈ tional Linguistics. vectors such that each word w V can be approx- ∈ imately represented by a linear combination of at Mohammad Nasiruddin, Didier Schwab, and Andon most s basis vectors (see Equation 1). Finding the Tchechmedjiev. 2014. Induction de sens pour en- richir des ressources lexicales. In 21eme´ Traitement basis vectors ai and their coefficients Rwi requires Automatique des Langues Naturelles. solving the optimization problem

Roberto Navigli and Simone Paolo Ponzetto. 2012. minimize Y RA 2 Babelnet: The automatic construction, evaluation k − k subject to R s w V and application of a wide-coverage multilingual se- k wk0 ≤ ∀ ∈ mantic network. Artificial Intelligence, 193. V n where Y R| |× has word-vectors as its rows, V ∈k Jeffrey Pennington, Richard Socher, and Christo- R R| |× is the matrix of coefficients Rwi, and pher D. Manning. 2014. Glove: Global vectors for ∈ k d A R × has the overcomplete basis a1, . . . , ak word representation. In Proceedings of Empirical ∈ Methods in Natural Language Processing. as its rows. This problem is non-convex and can be solved approximately via the K-SVD algorithm Quentin Pradet, Gael¨ de Chalendar, and Jeanne Bague- (Aharon et al., 2006). nier Desormeaux. 2013. Wonef, an improved, ex- Given a word w, the purification procedure is panded and evaluated automatic french translation of wordnet. In Proceedings of the Seventh Global a method for purifying the senses induced via Linear-WSI (the vectors a s.t. R = 0) by rep- Wordnet Conference. i wi 6 resenting them as word-clusters related to both the Sascha Rothe and Hinrich Schutze.¨ 2015. Autoex- sense itself and the word. This is done so as to tend: Extending word embeddings to embeddings for synsets and lexemes. In Proceedings of the 53rd create a more fine-grained collection of senses, as Annual Meeting of the Association for Computa- Linear-WSI does not perform well for more than tional Linguistics. a few thousand basis vectors, far fewer than the

21 number of word-senses (Arora et al., 2016a). The its candidate synsets S, and a fixed set size n, we procedure is inspired by the hope that words re- run the following procedure: lated to each sense of w will form clusters near 1. For each sense ai in the sparse representation both vw and one of its senses ai. Given a fixed set- (1) of w let Ci be the output cluster of the size n and a search-space V 0 V , we realize this ⊂ purification procedure run on sense a with hope via the optimization problem i search-space V 0 = VS. maximize γ C V ⊂ 0 2. Return the (sense, sense-cluster) pair subject to (aS,CS) with the highest purification a γ Median v v : w0 C x x C procedure objective value among senses i. ≤ { x · w0 ∈ \{ }} ∀ ∈ γ Median ai vw : w0 C In the following methods we will assume that each ≤ { · 0 ∈ } w C, C = n candidate synset S of w has a sense aS and sense- ∈ | | cluster CS associated to it in this way. Examples This problem is equivalent to maximizing (2) with of such clusters for the French word dalle (flag- constraints C = n and w C V . The ob- | | ∈ ⊂ 0 stone, slab) are provided in Table 3. jective value is constrained to be the lowest me- dian cosine similarity between any word x C B.1 Word-Specific Threshold ∈ or the sense ai and the rest of C, so optimizing it One application of Linear-WSI is the creation of ensures that the words in C are closely related to a word-specific threshold αw to use in the score- each other and to ai. Forcing the cluster to contain threshold method instead of a global cutoff α. We w leads to the words in C being close to w as well. do this by using the quality of the sense cluster CS For computational purposes we solve this prob- of each candidate synset S of w as an indicator of lem approximately using a greedy algorithm that the correctness of that synset. Recalling that uS starts with C = w and repeatedly adds the word S { } is the synset representation of as a vector (see in V 0 C that results in the best objective value un- f = f(w, a ,C ) \ Section 3.2) and letting S S S be the til C = n. Speed of computation is also a reason α | | objective value (2), we find w as follows: for using a search-space V V rather than the 0 ⊂ entire vocabulary as a source of words for the clus- 1. Find S∗ = arg max fS + uS vw. S is a candidate of w · ter; we found that restricting V 0 to be all words w0 s.t. min v v , v a .2 dramatically re- 2. Let αw = min α, uS∗ vw . { w0 · w w0 · i} ≥ { · } duces processing time with little performance loss. As this modified threshold may be lower than The purification procedure represents only one more than one candidate synset of w it allows for sense of w, so to perform WSI we generate clus- multiple synset matches even when no synset has ters for all senses a s.t. R > 0. If two senses i wi score high enough to clear the threshold α. have clusters that share a word other than w, only the cluster with the higher objective value is re- B.2 Sense-Agglomeration Procedure turned. To find the clusters displayed in Figure 3 We use Linear-WSI more explicitly through the we use this procedure with cluster size n = 5 on sense-agglomeration procedure, which attempts to English and Russian SN vectors decomposed with recover unmatched synsets using matched synsets basis size k = 2000 and sparsity s = 4. sharing the same sense ai of w. 
We define the clus- ter similarity ρ between word-clusters C ,C B Applying WSI to Synset Matching 1 2 ⊂ V as the median of all cosine-similarities of pairs We use the purification procedure to represent the of words in their set product, i.e. candidate synsets of a word w by clusters of words related to w and one if its senses a . First, for each ρ(C1,C2) = Median vx vy : x C1, y C2 . i { · ∈ ∈ } synset S we define the search-space Then we say that two clusters C1,C2 are similar if VS = TS 0 ρ(C ,C ) min ρ(C ,C ), ρ(C ,C ) , S S 1 2 1 1 2 2 [0∼ ≥ { } where TS0 is the set of translations of lemmas of i.e. if their cluster similarity with each other ex- S0 as in Section 3.1. Then given a word w, one of ceeds at least one’s cluster similarity with itself.

Synset | Associated Sense a_S | Purified Cluster Representation C_S | Objective Value f(w, a_S, C_S)
flag.n.01 | a_789 | poteau (goalpost), flèche (arrow), haut (high, top), mât (matte) | 0.23
flag.n.04 | a_892 | flamme (flame), fanion (pennant), guidon, signal | 0.06
flag.n.06 | a_892 | dallage (paving), carrelage (tiling), pavement, pavage (paving) | 0.36
flag.n.07 | a_1556 | pan (section), empennage, queue, tail | 0.14
iris.n.01 | a_1556 | bœuf (beef), usine (factory), plante (plant), puant (smelly) | 0.07
masthead.n.01 | a_1556 | inscription, lettre (letter), catalogue (catalog), cotation (quotation) | 0.10
pin.n.08 | a_1556 | trou (hole), tertre (mound), marais (marsh), pavillon (house, pavillion) | 0.17
slab.n.01 | a_892 | carrelage (tiling), carreau (tile), tuile (tile), bâtiment (building) | 0.27

Table 3: Purified Synset Representations of dalle (flagstone, slab). Note how the correct candidate synsets (bolded) have clusters of words closely related to the correct meaning while the other clusters have many unrelated words, leading to lower objective values.

Then given a global low threshold β ≤ α, for each sense a_i in the sparse representation (1), sense-agglomeration consists of the following algorithm:

1. Let M_i be the set of candidate synsets S of w that have a_S = a_i and score above the threshold α_w. Stop if M_i is empty.

2. Let U_i be the set of candidate synsets S of w that have a_S = a_i and score below the threshold α_w. Stop if U_i is empty.

3. For each synset S in U_i, ordered by synset-score, check that C_S is similar (in the above sense) to all clusters C_{S'} for S' in M_i and has score higher than β. If both are true, add S to M_i and remove S from U_i. Otherwise stop.

The sense-agglomeration procedure allows an unmatched synset S to be returned as a correct synset of w provided it shares a sense with a different matched synset S' and satisfies the cluster similarity and score-threshold constraints.
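For completeness, the cluster-similarity test used in step 3 follows directly from the definition of ρ in this appendix; a small sketch, assuming unit-length word vectors stored in a dict vec:

```python
import numpy as np

def rho(c1, c2, vec):
    """Median cosine similarity over all word pairs from the two clusters."""
    return np.median([np.dot(vec[x], vec[y]) for x in c1 for y in c2])

def clusters_similar(c1, c2, vec):
    """C1 and C2 are similar if their cross-cluster similarity is at least
    one of the clusters' similarity with itself."""
    return rho(c1, c2, vec) >= min(rho(c1, c1, vec), rho(c2, c2, vec))
```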

23 Improving Verb Metaphor Detection by Propagating Abstractness to Words, Phrases and Individual Senses

Maximilian Köper and Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart, Germany {maximilian.koeper,schulte}@ims.uni-stuttgart.de

Abstract

Abstract words refer to things that can not be seen, heard, felt, smelled, or tasted, as opposed to concrete words. Among other applications, the degree of abstractness has been shown to be useful information for metaphor detection. Our contributions to this topic are as follows: i) we compare supervised techniques to learn and extend abstractness ratings for huge vocabularies, ii) we learn and investigate norms for multi-word units by propagating abstractness to verb-noun pairs, which leads to better metaphor detection, iii) we overcome the limitation of learning a single rating per word and show that multi-sense abstractness ratings are potentially useful for metaphor detection. Finally, with this paper we publish automatically created abstractness norms for 3 million English words and multi-words as well as automatically created sense-specific abstractness ratings.

1 Introduction

The standard approach to studying abstractness is to place words on a scale ranging between abstractness and concreteness. Alternately, abstractness can also be given a taxonomic definition in which the abstractness of a word is determined by the number of subordinate words (Kammann and Streeter, 1971; Dunn, 2015).

In psycholinguistics abstractness is commonly used for concept classification (Barsalou and Wiemer-Hastings, 2005; Hill et al., 2014; Vigliocco et al., 2014). In computational work, abstractness has become established information for the task of automatic detection of metaphorical language. So far metaphor detection has been carried out using a variety of features including selectional preferences (Martin, 1996; Shutova and Teufel, 2010; Shutova et al., 2010; Haagsma and Bjerva, 2016), word-level semantic similarity (Li and Sporleder, 2009; Li and Sporleder, 2010), topic models (Heintz et al., 2013), word embeddings (Dinh and Gurevych, 2016) and visual information (Shutova et al., 2016).

The underlying motivation of using abstractness in metaphor detection goes back to Lakoff and Johnson (1980), who argue that metaphor is a method for transferring knowledge from a concrete domain to an abstract domain. Abstractness was already applied successfully for the detection of metaphors across a variety of languages (Turney et al., 2011; Dunn, 2013; Tsvetkov et al., 2014; Beigman Klebanov et al., 2015; Köper and Schulte im Walde, 2016b).

The abstractness information itself is typically taken from a dictionary, created either by manual annotation or by extending manually collected ratings with the help of supervised learning techniques that rely on word representations. While potentially less reliable, automatically created norm-based abstractness ratings can easily cover huge dictionaries. Although some methods have been used to learn abstractness, the literature lacks a comparison of these learning techniques.

We compare and evaluate different learning techniques. In addition we show and investigate the usefulness of extending abstractness ratings to phrases as well as individual word senses. We extrinsically evaluate these techniques on two verb metaphor detection tasks: (i) a type-based setting that makes use of phrase ratings, (ii) a token-based classification using multi-sense abstractness norms. Both settings benefit from our approach.

24 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 24–30, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics

2 Experiments plicitly encode attributes such as abstractness and directly feed the vector representation of a word 2.1 Propagating Abstractness: A into a classifier, either by using linear regression Comparison of Approaches & Ressources (L-Reg), a regression forest (Reg-F) or a fully 2.1.1 Comparison of Approaches connected feed forward neural network with up to Turney et al. (2011) first aproached to automati- two hidden layers (NN).2 cally create abstractness norms for 114 501 words, relying on manual ratings based on the MRC Psy- T&L 03 L-Reg. Reg-F. NN cholinguistic Database (Coltheart, 1981). The un- Glove50 .76 .76 .78 .79 derlying algorithm (Turney and Littman, 2003) re- Glove100 .80 .79 .79 .85 quires vector representation and annotated training Glove200 .78 .78 .76 .84 samples of words. The algorithm itself performs a Glove300 .76 .78 .74 .85 greedy forward search over the vocabulary to learn W2V300 .83 .84 .79 .90 so-called paradigm words. Once paradigm words for both classes (abstract & concrete) are learned, Table 1: Spearman’s ρ for the test ratings. Com- a rating can be assigned to every word by com- paring representations and regression methods. paring its vector representation against the vector representations of the paradigm words. Table 1 shows clearly that we can learn ab- Köper and Schulte im Walde (2016a) used the stractness ratings with a very high correlation on same algorithm for a large collection of German the test data using the word representations from lemmas, and in the same way additional cre- Google (W2V300) together with a neural net- ated ratings for multiple norms including valency, work for regression (ρ=.90). The NN method sig- arousal and imageability. nificantly outperforms all other methods, using A different method that has been used to ex- Steiger (1980)’s test (p < 0.001). tend abstractness norms based on low-dimensional 2.1.2 Comparison of Ressources word embeddings and a Linear Regression classi- fier (Tsvetkov et al., 2013; Tsvetkov et al., 2014). Based on the comparison of methods in the pre- vious section we propagated abstractness ratings We compare approaches across different pub- to the entire vocabulary of the W2V300 dataset licly available vector representations1, to study po- (3million words) and compare the correlation with tential differences across vector dimensionality we other existing norms of abstractness. For this com- compare vectors between 50 and 300 dimensions. parison we use the common subset of two manu- The Glove vectors (Pennington et al., 2014) have ally and one automatically created resource: MRC been trained on 6billion tokens of Wikipedia plus Psycholinguistic Database, ratings from Brysbaert Gigaword (V=400K), while the word2vec cbow et al. (2014) and the automatically created ratings model (Mikolov et al., 2013) was trained on a from Turney et al. (2011). We map all existing Google internal news corpus with 100billion to- ratings, as well as our newly created ratings, to kens (V=3million). For training and testing we the same interval using the method from Köper relied on the ratings from Brysbaert et al. (2014), and Schulte im Walde (2016a). The mapping is Dividing the ratings into 20% test (7 990) and 80% performed using a continuous function, that maps training (31 964) for tuning hyper parameters we the ratings to an interval ranging from very ab- took 1 000 ratings from the training data. We stract (0) to very concrete (10). The common sub- kept the ratio between word classes. 
Evaluation is set contains 3 665 ratings. Figure 1 shows the re- done by comparing the new created ratings against sulting pairwise correlation between all four re- the test (gold) ratings using Spearman’s rank-order sources. Despite being created automatically, we correlation. We first reimplemented the algorithm see that the newly created ratings provide a high from Turney and Littman (2003) (T&L 03). In- correlation with both manually created collections spired by recent findings of Gupta et al. (2015) we (ρ for MRS=.91, Brysbaert=.93). In addition, the apply the hypothesis that distributional vectors im- vocabulary of our ratings is much larger than any 1http://nlp.stanford.edu/projects/ existing database. Thus this new collection might glove/ https://code.google.com/archive/p/ 2NN Implementation based on https://github. word2vec/ com/amten/NeuralNetwork
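As an illustration of the regression setup compared above, the following sketch fits a simple regressor from pre-trained word vectors to concreteness ratings and reports Spearman's ρ on held-out words. The embedding lookup vec and the rating dictionary ratings (e.g. from Brysbaert et al., 2014) are assumed to be available; the neural-network variant evaluated in the paper relies on a different implementation.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def eval_propagation(vec, ratings, test_size=0.2, seed=0):
    """Train a linear regressor word-vector -> abstractness rating and
    evaluate it by rank correlation on held-out words."""
    words = [w for w in ratings if w in vec]
    X = np.array([vec[w] for w in words])
    y = np.array([ratings[w] for w in words])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    rho, _ = spearmanr(model.predict(X_te), y_te)
    return model, rho
```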

25 method (NN). The technique allows us to propa- MRC BRYS TURNEY NEW 1 gate abstractness to every vector, thus we learn 0.98 MRC 1 0.92 0.85 0.91 abstractness ratings for all three constituents: verb, noun and the entire phrase. 0.96

0.94 For the metaphor classification experiment we BRYS 1 0.84 0.93 use the rating score and apply the Area Under 0.92 Curve (AUC) metric. AUC is a metric for bi- 0.9 nary classification. We assume that literal in-

0.88 stances gain higher scores (= more concrete) than TURNEY 1 0.88 metaphorical word pairs. AUC considers all pos- 0.86 sible thresholds to divide the data into literal and 0.84 metaphorical. In addition to the rating score we NEW 1 also show results based on cosine similarity and 0.82 feature combinations (Table 2). 0.8 Figure 1: Pairwise Spearman’s ρ on commonly covered subset. Red = high correlation. Feat. Name Type AUC - Random baseline .50 be useful, especially for further research which re- 1 V-NN cosine .75 3 quires large vocabulary coverage. 2 V-Phrase cosine .70 3 NN-Phrase cosine .68 2.2 Abstractness for Phrases 4 V rating .53 A potential advantage of our method is that ab- 5 NN rating .78 stractness can be learned for multi-word units as 6 Phrase rating .71 long as the representation of these units live in the same distributional vector space as the words re- Comb 1+2+3 cosine .75 quired for the supervised training. Comb 4+5+6 rating .74 In this section we explore if ratings propa- Comb all(1-6) mixed .80 gated to verb-noun phrases provide useful infor- Comb 1+5+6 best .84 mation for metaphor detection. As dataset we re- lied on the collection from Saif M. Mohammad Table 2: AUC Score single features and com- and Turney (2016), who annotated different senses binations. Classifying literal and metaphorical of WordNet verbs for metaphoricity (Fellbaum, phrases based on the Saif M. Mohammad and Tur- 1998). ney (2016) dataset. We used the same subset of verb–direct object and verb–subject relations as used in Shutova et al. (2016). As preprocessing step we concatenated As shown in Table 2, the rating of the verb alone verb-noun phrases by relying on dependency in- (AUC=.53) provides almost no useful information. formation based on a web corpus, the ENCOW14 The best performance based on a single feature is corpus (Schäfer and Bildhauer, 2012; Schäfer, the abstractness value of the noun (.78) followed 2015). We removed words and phrases that ap- by the cosine between verb and noun vector repre- peared less than 50 times in our corpus, thus our sentation (.75). The phrase rating alone performs selection covers 535 pairs, 238 of which were moderate (.71). However when combining fea- metaphorical and 297 literal. tures we found that the best combinations are ob- Given a verb-noun phrase, such as tained by integrating the phrase rating. In more de- stamp person, we obtained vector representations tail, combining noun and phrase rating (5+6) ob- using word2vec and the same hyper-parameters tains a AUC of (.80). When adding the cosine (1) that were used for the W2V300 embeddings we obtain the best score of (.84). For comparison, (Section 2.1.1) together with the best learning the verb plus noun ratings (4+5) obtains a lower 3Ratings available at http://www.ims. score (.72), this shows that the phrase rating pro- uni-stuttgart.de/data/en_abst_norms.html vides complementary and useful information.
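The AUC evaluation described above can be reproduced with a few lines of scikit-learn, treating the propagated concreteness score as a ranking signal for literalness (the assumption being that more concrete phrases are more likely literal). Here phrase_ratings is a hypothetical dict mapping verb-noun phrases to ratings and labels is a dict marking literal (1) versus metaphorical (0) instances.

```python
from sklearn.metrics import roc_auc_score

def auc_from_ratings(phrase_ratings, labels):
    """Threshold-free evaluation: higher (more concrete) ratings should rank
    literal phrases above metaphorical ones."""
    phrases = sorted(labels)                        # fixed iteration order
    y_true = [labels[p] for p in phrases]           # 1 = literal, 0 = metaphorical
    y_score = [phrase_ratings[p] for p in phrases]  # concreteness score
    return roc_auc_score(y_true, y_score)
```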

2.3 Sense-specific Abstractness Ratings

In this section we investigate whether automatically learned multi-sense abstractness ratings, that is, having different ratings per word sense, are potentially useful for the task of metaphor detection. Recent advances in word representation learning led to the development of algorithms for non-parametric and unsupervised multi-sense representation learning (Neelakantan et al., 2014; Liu et al., 2015; Li and Jurafsky, 2015; Bartunov et al., 2016). Using these techniques one can learn a different vector representation per word sense. Such representations can be combined with our abstractness learning method from Section 2.1.1. While in theory any multi-sense learning technique can be applied, we decided for the one introduced by Pelevina et al. (2016), as it performs sense learning after single senses have been learned. Starting from the public W2V300 representations we apply the multi-sense learning technique using the default settings and learn sense-specific word representations. Finally we propagate abstractness to every newly created sense representation by using the exact same model and training data as in Section 1. For a given word in a sentence we can now disambiguate the word sense by comparing its sense-specific vector representations to all context words. The context words are represented using the (single-sense) global representation. We always pick the sense representation that obtains the largest similarity, measured by cosine. The potential advantage of this method is that in a metaphor detection system we are now able to look up word-sense-specific abstractness ratings instead of globally obtained ratings.

For this experiment we use the VU Amsterdam Metaphor Corpus (VUA; Steen, 2010), focusing on verb metaphors. The collection contains 23,113 verb tokens in running text, annotated as being used literally or metaphorically. In addition we present results for the TroFi metaphor dataset (Birke and Sarkar, 2006) containing 50 verbs and 3,737 labeled sentences. We pre-processed both resources using Stanford CoreNLP (Manning et al., 2014) for lemmatization, part-of-speech tagging and dependency parsing.

We present results by applying ten-fold cross-validation over the entire data. For the VUA we additionally present results for the test data using the same training/test split as in Beigman Klebanov et al. (2016). Abstractness norms are implemented using the same five feature dimensions as used by Turney et al. (2011) plus dimensions for the subject and the object respectively; thus we rely on seven features, namely:

1. Rating of the verb's subject
2. Rating of the verb's object
3. Average rating of all nouns (excluding proper names)
4. Average rating of all proper names
5. Average rating of all verbs, excluding the target verb
6. Average rating of all adjectives
7. Average rating of all adverbs

For classification we used a balanced Logistic Regression classifier following the findings from Beigman Klebanov et al. (2015). While this default setup tries to generalize over unseen verbs by only looking at a verb's context, we further present results for a second setup that uses an eighth feature, namely the lemma of the target verb itself (+L). The purpose of the second system is to describe performance with respect to the state of the art (Beigman Klebanov et al., 2016), which among other features also uses the verb lemma.

    Feat.     TroFi (10F)   VUA (10F)   VUA (Test)
    1S        .72           .42         .44
    MS        .74           .44*        .46
    1S (+L)   .74           .61         .62
    MS (+L)   .75           .61         .62

Table 3: F-score (Metaphor), classifying literal and metaphorical verbs on the VUA and TroFi datasets. MS = multi-sense, 1S = single-sense.

As shown in Table 3, the multi-sense ratings constantly outperform the single-sense ratings in a direct comparison on all three sets. The difference in performance of single- and multi-sense ratings is statistically significant on the full VUA dataset, using the χ2 test and p < 0.05 (*). However, we also notice that the effect vanishes as soon as we combine the ratings with the lemma of the verb, which is especially the case for the VUA dataset, where the lemma increases the performance by a large margin.
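As an illustration of the sense-lookup step described above, the following is a minimal sketch (assumptions, not the authors' implementation): it picks the sense vector closest to the averaged context representation and returns that sense's abstractness rating. All vectors, sense identifiers and ratings are hypothetical toy values.

    # Minimal sketch: cosine-based sense disambiguation + rating lookup.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def sense_specific_rating(sense_vecs, sense_ratings, context_vecs):
        """sense_vecs: {sense_id: vector}, sense_ratings: {sense_id: float},
        context_vecs: (single-sense) global vectors of the context words."""
        context = np.mean(context_vecs, axis=0)   # global context representation
        best = max(sense_vecs, key=lambda s: cosine(sense_vecs[s], context))
        return best, sense_ratings[best]

    # Hypothetical 3-dimensional toy vectors for two senses of one verb.
    senses = {"grasp#1": np.array([0.9, 0.1, 0.0]), "grasp#2": np.array([0.0, 0.2, 0.9])}
    ratings = {"grasp#1": 0.81, "grasp#2": 0.35}
    context = [np.array([0.8, 0.2, 0.1]), np.array([0.7, 0.0, 0.2])]
    print(sense_specific_rating(senses, ratings, context))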

In contrast to related work, the system with the verb lemma feature (+L) can be considered state-of-the-art. When applying the same evaluation as Beigman Klebanov et al. (2016), namely a macro-average over the four genres of VUA, we obtain an average f-score of .60 by using only eight feature dimensions and abstractness ratings as external resource.4

4 SOA results from Klebanov also obtain a score of .60.

3 Conclusion

In this paper we compared supervised methods to propagate abstractness norms to words. We showed that a neural network outperforms other methods. In addition we showed that norms for multi-word phrases can be beneficial for type-based metaphor detection. Finally we showed how norms can be learned for sense representations and that sense-specific norms show a clear tendency to improve token-based verb metaphor detection.

Acknowledgments

The research was supported by the DFG Collaborative Research Centre SFB 732 (Maximilian Köper) and the DFG Heisenberg Fellowship SCHU-2580/1 (Sabine Schulte im Walde).

References

Lawrence W. Barsalou and Katja Wiemer-Hastings. 2005. Situating Abstract Concepts. In Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thought, pages 129–163.
Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin, and Dmitry Vetrov. 2016. Breaking Sticks and Ambiguities with Adaptive Skip-gram. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 130–138, Cadiz, Spain.
Beata Beigman Klebanov, Chee Wee Leong, and Michael Flor. 2015. Supervised Word-level Metaphor Detection: Experiments with Concreteness and Reweighting of Examples. In Proceedings of the Third Workshop on Metaphor in NLP, pages 11–20, Denver, Colorado.
Beata Beigman Klebanov, Chee Wee Leong, E. Dario Gutierrez, Ekaterina Shutova, and Michael Flor. 2016. Semantic Classifications for Detection of Verb Metaphors. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 101–106, Berlin, Germany.
Julia Birke and Anoop Sarkar. 2006. A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language. In Proceedings of the 11th Conference of the European Chapter of the ACL, pages 329–336, Trento, Italy.
Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness Ratings for 40 Thousand Generally Known English Word Lemmas. Behavior Research Methods, pages 904–911.
Max Coltheart. 1981. The MRC Psycholinguistic Database. The Quarterly Journal of Experimental Psychology Section A, 33(4):497–505.
Erik-Lân Do Dinh and Iryna Gurevych. 2016. Token-Level Metaphor Detection using Neural Networks. In Proceedings of the Fourth Workshop on Metaphor in NLP, pages 28–33, San Diego, CA, USA.
Jonathan Dunn. 2013. What Metaphor Identification Systems Can Tell Us About Metaphor-in-language. In Proceedings of the First Workshop on Metaphor in NLP, pages 1–10, Atlanta, Georgia.
Jonathan Dunn. 2015. Modeling Abstractness and Metaphoricity. Metaphor and Symbol, pages 259–289.
Christiane Fellbaum. 1998. A Semantic Network of English Verbs. In Christiane Fellbaum, editor, WordNet – An Electronic Lexical Database, Language, Speech, and Communication. MIT Press, Cambridge, MA.
Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional Vectors Encode Referential Attributes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 12–21, Lisbon, Portugal.
Hessel Haagsma and Johannes Bjerva. 2016. Detecting Novel Metaphor using Selectional Preference Information. In Proceedings of the Fourth Workshop on Metaphor in NLP, pages 10–17, San Diego, CA, USA.
Ilana Heintz, Ryan Gabbard, Mahesh Srivastava, Dave Barner, Donald Black, Majorie Friedman, and Ralph Weischedel. 2013. Automatic Extraction of Linguistic Metaphors with LDA Topic Modeling. In Proceedings of the First Workshop on Metaphor in NLP, pages 58–66, Atlanta, Georgia.
Felix Hill, Anna Korhonen, and Christian Bentz. 2014. A Quantitative Empirical Analysis of the Abstract/Concrete Distinction. Cognitive Science, 38:162–177.
Richard Kammann and Lynn Streeter. 1971. Two Meanings of Word Abstractness. Journal of Verbal Learning and Verbal Behavior, 10(3):303–306.
Maximilian Köper and Sabine Schulte im Walde. 2016a. Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350,000 German Lemmas. In Proceedings of the 10th International Conference on Language Resources and Evaluation, pages 2595–2598, Portoroz, Slovenia.
Maximilian Köper and Sabine Schulte im Walde. 2016b. Distinguishing Literal and Non-Literal Usage of German Particle Verbs. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 353–362, San Diego, California.
George Lakoff and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press.
Jiwei Li and Dan Jurafsky. 2015. Do Multi-Sense Embeddings Improve Natural Language Understanding? In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1722–1732, Lisbon, Portugal.
Linlin Li and Caroline Sporleder. 2009. Classifier Combination for Contextual Idiom Detection Without Labelled Data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 315–323.
Linlin Li and Caroline Sporleder. 2010. Using Gaussian Mixture Models to Detect Figurative Language in Context. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 297–300.
Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2015. Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, pages 1284–1290, Buenos Aires, Argentina.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.
James H. Martin. 1996. Computational Approaches to Figurative Language. Metaphor and Symbolic Activity.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119.
Saif M. Mohammad, Ekaterina Shutova, and Peter D. Turney. 2016. Metaphor as a Medium for Emotion: An Empirical Study. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (*SEM), pages 23–33, Berlin, Germany.
Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1059–1069, Doha, Qatar.
Maria Pelevina, Nikolay Arefiev, Chris Biemann, and Alexander Panchenko. 2016. Making Sense of Word Embeddings. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 174–183, Berlin, Germany.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar.
Roland Schäfer and Felix Bildhauer. 2012. Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the 8th International Conference on Language Resources and Evaluation, pages 486–493, Istanbul, Turkey.
Roland Schäfer. 2015. Processing and Querying Large Web Corpora with the COW14 Architecture. In Piotr Bański, Hanno Biber, Evelyn Breiteneder, Marc Kupietz, Harald Lüngen, and Andreas Witt, editors, Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora, pages 28–24, Lancaster.
Ekaterina Shutova and Simone Teufel. 2010. Metaphor Corpus Annotated for Source - Target Domain Mappings. In Proceedings of the International Conference on Language Resources and Evaluation, pages 3225–3261, Valletta, Malta.
Ekaterina Shutova, Lin Sun, and Anna Korhonen. 2010. Metaphor Identification Using Verb and Noun Clustering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1002–1010.
Ekaterina Shutova, Douwe Kiela, and Jean Maillard. 2016. Black Holes and White Rabbits: Metaphor Identification with Visual Features. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 160–170.
G. Steen. 2010. A Method for Linguistic Metaphor Identification: From MIP to MIPVU. Converging Evidence in Language and Communication Research. John Benjamins Publishing Company.
James H. Steiger. 1980. Tests for Comparing Elements of a Correlation Matrix. Psychological Bulletin.
Yulia Tsvetkov, Elena Mukomel, and Anatole Gershman. 2013. Cross-lingual Metaphor Detection Using Common Semantic Features. In Proceedings of the Workshop on Metaphor in NLP, pages 45–51, Atlanta, USA.
Yulia Tsvetkov, Leonid Boytsov, Anatole Gershman, Eric Nyberg, and Chris Dyer. 2014. Metaphor Detection with Cross-Lingual Model Transfer. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 248–258.
Peter D. Turney and Michael L. Littman. 2003. Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems, pages 315–346.
Peter Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and Metaphorical Sense Identification through Concrete and Abstract Context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 680–690, Edinburgh, UK.
Gabriella Vigliocco, Stavroula-Thaleia Kousta, Pasquale Anthony Della Rosa, David P. Vinson, Marco Tettamanti, Joseph T. Devlin, and Stefano F. Cappa. 2014. The Neural Representation of Abstract Words: The Role of Emotion. Cerebral Cortex, pages 1767–1777.

Improving Clinical Diagnosis Inference through Integration of Structured and Unstructured Knowledge

Yuan Ling, Yuan An
College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
{yl638,ya45}@drexel.edu

Sadid A. Hasan
Artificial Intelligence Laboratory, Philips Research North America, Cambridge, MA, USA
[email protected]

Abstract

This paper presents a novel approach to the task of automatically inferring the most probable diagnosis from a given clinical narrative. Structured Knowledge Bases (KBs) can be useful for such complex tasks but are not sufficient. Hence, we leverage a vast amount of unstructured free text to integrate with structured KBs. The key innovative ideas include building a concept graph from both structured and unstructured knowledge sources and ranking the diagnosis concepts using the enhanced word embedding vectors learned from integrated sources. Experiments on the TREC CDS and HumanDx datasets showed that our methods improved the results of clinical diagnosis inference.

1 Introduction and Related Work

Clinical diagnosis inference is the problem of automatically inferring the most probable diagnosis from a given clinical narrative. Many health-related information retrieval tasks can greatly benefit from the accurate results of clinical diagnosis inference. For example, in the recent Text REtrieval Conference (TREC) Clinical Decision Support track (CDS1), diagnosis inference from medical narratives has improved the accuracy of retrieving relevant biomedical articles (Roberts et al., 2015; Hasan et al., 2015; Goodwin and Harabagiu, 2016).

Solutions to the clinical diagnostic inferencing problem require a significant amount of input from domain experts and a variety of sources (Ferrucci et al., 2013; Lally et al., 2014). To address such complex inference tasks, researchers (Yao and Van Durme, 2014; Bao et al., 2014; Dong et al., 2015) have utilized structured KBs that store relevant information about various entity types and relation triples. Many large-scale KBs have been constructed over the years, such as WordNet (Miller, 1995), Yago (Suchanek et al., 2007), Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007), NELL (Carlson et al., 2010), the UMLS Metathesaurus (Bodenreider, 2004), etc. However, using KBs alone for inference tasks (Bordes et al., 2014) has certain limitations such as incompleteness of knowledge, sparsity, and a fixed schema (Socher et al., 2013; West et al., 2014).

On the other hand, unstructured textual resources such as free texts from Wikipedia generally contain more information than structured KBs. As supplementary knowledge to mitigate the limitations of structured KBs, unstructured text combined with structured KBs provides improved results for related tasks, for example, clinical question answering (Miller et al., 2016). For processing text, word embedding models (e.g. the skip-gram model (Mikolov et al., 2013b; Mikolov et al., 2013a)) can efficiently discover and represent the underlying patterns of unstructured text. Word embedding models represent words and their relationships as continuous vectors. To improve word embedding models, previous works have also successfully leveraged structured KBs (Bordes et al., 2011; Weston et al., 2013; Wang et al., 2014; Zhou et al., 2015; Liu et al., 2015).

Motivated by the superior power of the integration of structured KBs and unstructured free text, we propose a novel approach to clinical diagnosis inference. The novelty lies in the ways of integrating structured KBs with unstructured text. Experiments showed that our methods improved clinical diagnosis inference from different aspects (Section 5.4). Previous work on diagnosis inference from clinical narratives either formulates the problem as a medical literature retrieval task (Zheng and Wan, 2016; Balaneshin-kordan and Kotov, 2016) or as a multiclass multilabel classification problem in a supervised setting (Hasan et al., 2016; Prakash et al., 2016). To the best of our knowledge, there is no work on diagnosis inference from clinical narratives conducted in an unsupervised way. Thus, we build such baselines for this task.

1 http://www.trec-cds.org/


2 Overview of the Approach

Our approach includes four steps in general: 1) extracting source concepts, q, from clinical narratives, 2) iteratively identifying corresponding evidence concepts, a, from KBs and unstructured text, 3) representing both source and evidence concepts in a weighted graph via a regularizer-enhanced skip-gram model, and 4) ranking the relevant evidence concepts (i.e. diagnoses) based on their association with the source concepts, S(q, a) (computed by a weighted dot product of two vectors), to generate the final output. Figure 1 shows the overview using an illustrative example.

Given source concepts as input, we build an edge-weighted graph representing the connections among all the concepts by iteratively retrieving evidence concepts from both KBs and unstructured text. The weights of the edges represent the strengths of the relationships between concepts. Each concept is represented as a word embedding vector. We combine all the source concept vectors into a single vector representing a clinical scenario. Source concepts are differentiated according to the weighting scheme in Section 4.2. Evidence concepts are also represented as vectors and ranked according to their relevance to the source concepts. For each clinical case, we find the most probable diagnoses from the top-ranked evidence concepts.

3 Knowledge Sources of Evidence Concepts

In this study, we use the UMLS Metathesaurus (Bodenreider, 2004) and Freebase (Bollacker et al., 2008) as the structured KBs. Both KBs provide semantic relation triples in the format <concept, relation, concept>. We select UMLS relation types that are relevant to the problem of clinical diagnosis inference. These types include disease-treatment, disease-prevention, disease-finding, sign or symptom, causes, etc. Freebase contains a large number of triples from multiple domains. We select 61,243 triples from Freebase that are classified as medicine relation types. There are 19 such relation types in total. Most of them fall under the "medicine.disease" category.

For unstructured text, we use articles from the Wikipedia and MayoClinic corpora as the supplementary knowledge source. Important clinical concepts mentioned in a Wikipedia/MayoClinic page can serve as a critical clue to a clinical diagnosis. For example, in Figure 1, we see that "dyspnea", "shortness of breath", "tachypnea", etc. are the related signs and symptoms of the "Pulmonary Embolism" diagnosis. We select 37,245 Wikipedia pages under the clinical diseases and medicine category in this study. Most of the page titles represent disease names. In addition, the MayoClinic2 disease corpus contains 1,117 pages, which include sections on Symptoms, Causes, Risk Factors, Treatments and Drugs, Prevention, etc.

2 http://www.mayoclinic.org/diseases-conditions

4 Methodology

4.1 Building Weighted Concept Graph

Both the source and the evidence concepts are represented as nodes in a graph. A clinical case is represented as a set of source concept nodes: q = {q1, q2, ...}. We build a weighted concept graph from the source concepts using Algorithm 1.

    Algorithm 1: Build Concept Graph
    Input:  source concept nodes q
    Output: graph G = (V, E)
    1   S = q and V = q;
    2   while S ≠ ∅ do
    3     for each qi in S do
    4       if distance(qi, q) > 2 then
    5         continue;
    6       end
    7       if triple <qi, r, aj> in KBs then
    8         wij = 1;
    9         e = (qi, aj) and e.value = wij;
    10        insert aj to V and S;
    11        insert e to E;
    12      end
    13      use qi as query, search in the unstructured text corpora, get result R;
    14      for each page-similarity pair (p, sij) in R do
    15        e = (qi, title(p)) and e.value = sij;
    16        insert title(p) to V and S;
    17        insert e to E;
    18      end
    19      remove qi from S;
    20    end
    21  end

Two kinds of evidence concept nodes are added to the graph: 1) the entities from the KBs (UMLS and Freebase) (steps 7-12 in Algorithm 1), and 2) the entities from unstructured text pages (steps 13-18).
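The following is a minimal sketch of Algorithm 1 in Python, with simplified stand-ins for the KB lookup and the unstructured-text retrieval; kb_triples and search_pages are hypothetical helpers, not part of the authors' system.

    # Minimal sketch of the concept-graph construction (Algorithm 1).
    import networkx as nx

    def build_concept_graph(source_concepts, kb_triples, search_pages, max_dist=2):
        g = nx.Graph()
        g.add_nodes_from(source_concepts)
        frontier = list(source_concepts)
        dist = {c: 0 for c in source_concepts}
        while frontier:
            qi = frontier.pop()
            if dist[qi] >= max_dist:                 # only expand nodes within distance 2
                continue
            # 1) evidence concepts from KB triples <qi, r, aj>: weight 1
            for aj in kb_triples.get(qi, []):
                g.add_edge(qi, aj, weight=1.0)
                if aj not in dist:
                    dist[aj] = dist[qi] + 1
                    frontier.append(aj)
            # 2) evidence concepts from text pages: weight = query-document similarity
            for title, sim in search_pages(qi):
                g.add_edge(qi, title, weight=sim)
                if title not in dist:
                    dist[title] = dist[qi] + 1
                    frontier.append(title)
        return g

    # Hypothetical toy inputs.
    triples = {"dyspnea": ["pulmonary embolism"]}
    pages = lambda q: [("Pulmonary embolism", 0.42)] if q == "dyspnea" else []
    print(build_concept_graph({"dyspnea", "tachypnea"}, triples, pages).edges(data=True))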

Figure 1: Overview of our system. (The figure itself is not reproduced; only the caption is kept.)

If there exists a triple <qi, r, aj> in the KBs, where r refers to a relation, an edge is used to connect node qi and node aj. wij represents the weight for that edge, and we let wij = 1 if the corresponding triple occurs at least once. Due to the incompleteness of the KBs, there may exist multiple missing connections between a potential evidence concept aj and a source concept qi. Unstructured knowledge from Wikipedia and MayoClinic can replenish these missing connections. For each page p, the page title represents an evidence concept aj. We use each source concept qi as a query and page p as a document, and then calculate a query-document similarity to measure the edge weight wij between node aj and node qi. We only take as evidence concepts all nodes connected to source concepts in a distance of at most 2 (steps 4-6).

4.2 Representing a Clinical Case

We combine the source concepts q and get a single vector v_q to represent the clinical case narrative. The source concepts from narratives for clinical diagnosis inference should be differentiated. Some source concepts are major symptoms for a diagnosis, while others are less critical. These major source concepts should be identified and given higher weight values. We develop two kinds of weighting schemes for the differential expression of the source concepts. The clinical case is represented as

    v_q = \frac{1}{N} \sum_{q_i \in q} \gamma_i v_{q_i},

where N is the total number of source concepts and v_{q_i} is the vector representation of one source concept q_i.

(1) A longer concept usually conveys more information (e.g. malar rash vs. rash), so it should be given more weight. We define this weight value as \gamma_1 = \#Words\ in\ Concept.

(2) For some commonly seen concepts (e.g. fever), there are usually more edges connected to them. Sometimes a common concept is less important for diagnosis inference, while some unique concepts are critical to infer a specific diagnosis. We define this weight value for each concept as \gamma_2 = 1 / \#Connected\ Edges. A higher weight value means the source concept is more unique.

4.3 Inferring Concepts for Diagnosis

Extracting Potential Evidence Concepts: From the source concept nodes q, we find their connected concepts in the graph as evidence concepts. Traversing all edges in a graph is computationally expensive and often unnecessary for finding potential diagnoses. The solution is to use a subgraph. We follow the idea proposed in Bordes et al. (2014). The evidence concepts are defined as all nodes connected to source concepts in a distance of at most 2.

Ranking Evidence Concepts: We rank each evidence concept a' according to its matching score S(q, a') to the source concepts. The matching score S(q, a') is a dot product of the embedding representations of the evidence concept a' and the source concepts q, taking the edge weights w_{ij} into consideration: S(q, a') = w_{ij} \, v_{a'} \cdot v_q, where v_{a'} and v_q are the embedding representations for a' and q. The embedding matrix E \in \mathbb{R}^{k \times N} for concepts is trained using embedding models (Section 4.4). N is the total number of concepts and k is the predefined dimension of the embedding vectors. Each concept in the graph can find a k-dimensional vector representation in E. For a set of source concepts q and evidence concepts A(q), the top-ranked evidence concept can be computed as:

    a' = \arg\max_{a' \in A(q)} S(q, a')    (1)
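The following is a minimal sketch (assumed shapes and toy values, not the authors' code) of the weighted clinical-case vector v_q and the ranking score S(q, a') described above.

    # Minimal sketch: weighted case vector and evidence-concept ranking.
    import numpy as np

    def case_vector(source_vecs, gammas):
        """v_q = (1/N) * sum_i gamma_i * v_{q_i}; gammas follow either weighting scheme."""
        vecs = np.array(source_vecs)
        g = np.array(gammas, dtype=float).reshape(-1, 1)
        return (g * vecs).sum(axis=0) / len(vecs)

    def rank_evidence(v_q, evidence, edge_weights):
        """evidence: {concept: vector}; edge_weights: {concept: w_ij from the graph}.
        S(q, a') = w_ij * (v_a' . v_q), highest score first."""
        scores = {a: edge_weights[a] * float(np.dot(v, v_q)) for a, v in evidence.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Hypothetical toy example with 3-dimensional embeddings.
    sources = [np.array([0.2, 0.5, 0.1]), np.array([0.4, 0.1, 0.3])]
    gamma1 = [2, 1]                          # e.g. gamma_1 = number of words in the concept
    vq = case_vector(sources, gamma1)
    cands = {"pulmonary embolism": np.array([0.3, 0.4, 0.2]), "asthma": np.array([0.1, 0.1, 0.0])}
    print(rank_evidence(vq, cands, {"pulmonary embolism": 0.42, "asthma": 0.10}))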

4.4 Word Embedding Models

We use the skip-gram model as the basic model. The skip-gram model predicts the surrounding words w_{t-c}, ..., w_{t-1}, w_{t+1}, ..., w_{t+c} given the current center word w_t. We further enhance the skip-gram model by adding a graph regularizer. Given a sequence of training words w_1, w_2, ..., w_T, the objective function is:

    J = \max \frac{1}{T} \sum_{t=1}^{T} \Big[ (1-\lambda) \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t) \;-\; \lambda \sum_{r=1}^{R} D(v_t, v_r) \Big]    (2)

where v_t and v_r are the representation vectors for word w_t and word w_r, and λ is a parameter to balance the graph regularizer and the original objective. Suppose word w_t is mentioned as having relations with a set of other words w_r, r ∈ {1, ..., R}, in the KBs. The graph regularizer λ Σ_{r=1}^{R} D(v_t, v_r) integrates extra knowledge about semantic relationships among words within the graph structure. D(v_t, v_r) represents the distance between v_t and v_r. In our experiments, the distance between two concepts is measured using the KL-Divergence; D(v_t, v_r) can be calculated using any other type of distance metric. By minimizing D(v_t, v_r), we expect that if two concepts have a close relation in the KBs, their vector representations will also be close to each other.

5 Experiments

5.1 Datasets for Clinical Diagnosis Inference

Our first dataset is from the 2015 TREC CDS track (Roberts et al., 2015). It contains 30 topics, where each topic is a medical case narrative that describes a patient scenario. Each case is associated with the ground truth diagnosis. We use MetaMap3 to extract the source concepts from a narrative and then manually refine them to remove redundancy.

Our second dataset is curated from HumanDx4, a project to foster integrating efforts to map health problems to their possible diagnoses. We curate diagnosis-findings relationships from HumanDx and create a dataset with 459 diagnosis-findings entries. Note that the findings from this dataset are used as the given source concepts for a clinical scenario.

3 https://metamap.nlm.nih.gov/
4 https://www.humandx.org/

5.2 Training Data for Word Embeddings

We curate a biomedical corpus of around 5M sentences from two data sources: PubMed Central5 from the 2015 TREC CDS snapshot6 and Wikipedia articles under the "Clinical Medicine" category7. After sentence splitting, word tokenization, and stop word removal, we train our word embedding models on this corpus. The UMLS Metathesaurus and Freebase are used as KBs to train the graph regularizer. We use stochastic gradient descent (SGD) to maximize the objective function and set the parameters empirically.

5 https://www.ncbi.nlm.nih.gov/pmc/
6 http://www.trec-cds.org/2015.html#documents
7 https://en.wikipedia.org/wiki/Category:Clinical_medicine

5.3 Results

We use Mean Reciprocal Rank (MRR) and Average Precision at 5 (P@5) to evaluate our models. MRR is a statistical measure to evaluate a process that generates a list of possible responses to a sample of queries, ordered by probability of correctness. Average P@5 is calculated as the precision at the top 5 predicted results divided by the total number of topics. Since our dataset only has one correct diagnosis for each topic, all results have low Average P@5 scores.

Table 1 presents the results for our experiments. We report two baselines: Skip-gram refers to the basic word embedding model, and Skip-gram* refers to the graph-regularized model using KBs. We also show the results for using different unstructured knowledge sources and different weighting schemes. We can see that the best scores are obtained by the graph-regularized models with both unstructured knowledge sources and the variable weighting schemes (Section 4.2).
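The following is an illustrative sketch only (not the authors' training code) of how the two terms of the objective in Eq. (2) can be combined for a single center word; softmax-normalizing the vectors before the KL distance is an assumption made here to keep them valid distributions, and the two-entry logits vector is a toy stand-in for a full vocabulary.

    # Illustrative sketch: one evaluation of the graph-regularized objective (Eq. 2).
    import numpy as np

    def softmax(v):
        e = np.exp(v - v.max())
        return e / e.sum()

    def kl(p, q):
        return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

    def regularized_loss(v_center, v_context, v_related, lam=0.3):
        # skip-gram term: negative log-probability of one context word (toy 2-entry logits)
        logits = np.array([np.dot(v_center, v_context), 0.0])
        skipgram_term = -np.log(softmax(logits)[0])
        # graph regularizer: summed KL distances to KB-related word vectors
        reg_term = sum(kl(softmax(v_center), softmax(vr)) for vr in v_related)
        return (1 - lam) * skipgram_term + lam * reg_term

    v_t = np.array([0.5, 0.1, 0.2])
    print(regularized_loss(v_t, np.array([0.4, 0.2, 0.1]), [np.array([0.45, 0.1, 0.25])]))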

                                               TREC CDS               HumanDx
    Method                                   MRR     Avg. P@5     MRR     Avg. P@5
    Baselines
      Skip-gram                              21.66     8.88       18.56     5.08
      Skip-gram*                             22.60     8.88       18.63     5.15
    Skip-gram* + different unstructured text datasets
      Wikipedia                              26.01     8.96       19.42     5.76
      MayoClinic                             32.64     9.52       19.46     5.80
      Both                                   32.29     9.60       19.12     5.76
    Skip-gram* + both text datasets + different weights
      γ1                                     32.22    10.40       21.09     5.88
      γ2                                     32.77    12.00       20.86     5.93

Table 1: Evaluation results.

5.4 Discussion

Unstructured text is a critical supplement: We analyze the source concepts and the corresponding evidence concepts for the CDS topics, and investigate the origin of the correct diagnoses. 70% of the correct diagnoses can be inferred from Wikipedia, 60% of the correct diagnoses from MayoClinic, 56% of the correct diagnoses from Freebase, and only 7% from UMLS. Hence, Wikipedia and MayoClinic are very important sources for finding the correct diagnoses.

Source concepts should be differentiated: In clinical narratives, some concepts are more critical than others for clinical diagnosis inference. We developed two weighting schemes to assign higher weight values to more important concepts. The results in Table 1 show that differentiating the source concepts with different weight values has a large impact on the model performance.

The enhanced skip-gram is better: We propose the enhanced skip-gram model by using a graph regularizer to integrate the semantic relationships among concepts from the KBs. Experimental results show that diagnosis inference is improved by using word embedding representations from the enhanced skip-gram model.

6 Conclusion

We proposed a novel approach to the task of clinical diagnosis inference from clinical narratives. Our method overcomes the limitations of structured KBs by making use of integrated structured and unstructured knowledge. Experimental results showed that the enhanced skip-gram model with differential expression of source concepts improved the performance on two benchmark datasets.

References

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer.
Saeid Balaneshin-kordan and Alexander Kotov. 2016. Optimization method for weighting explicit and latent concepts in clinical decision support queries. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pages 241–250. ACM.
Junwei Bao, Nan Duan, Ming Zhou, and Tiejun Zhao. 2014. Knowledge-based question answering as machine translation. Cell, 2(6).
Olivier Bodenreider. 2004. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl 1):D267–D270.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.
Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Conference on Artificial Intelligence.
Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr, and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In AAAI, volume 5, page 3.
Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of the Association for Computational Linguistics, pages 260–269.
David Ferrucci, Anthony Levas, Sugato Bagchi, David Gondek, and Erik T. Mueller. 2013. Watson: beyond Jeopardy! Artificial Intelligence, 199:93–105.
Travis R. Goodwin and Sanda M. Harabagiu. 2016. Medical question answering for clinical decision support. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 297–306. ACM.
Sadid A. Hasan, Yuan Ling, Joey Liu, and Oladimeji Farri. 2015. Using neural embeddings for diagnostic inferencing in clinical question answering. In TREC.
Sadid A. Hasan, Siyuan Zhao, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, and Oladimeji Farri. 2016. Clinical question answering using key-value memory networks and knowledge graph. In TREC.
Adam Lally, Sugato Bachi, Michael A. Barborak, David W. Buchanan, Jennifer Chu-Carroll, David A. Ferrucci, Michael R. Glass, Aditya Kalyanpur, Erik T. Mueller, J. William Murdock, et al. 2014. WatsonPaths: scenario-based question answering and inference over unstructured information. Yorktown Heights: IBM Research.
Quan Liu, Hui Jiang, Si Wei, Zhen-Hua Ling, and Yu Hu. 2015. Learning semantic word embeddings based on ordinal knowledge constraints. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 1501–1511.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.
Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. CoRR, abs/1606.03126.
George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
Aaditya Prakash, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Condensed memory networks for clinical diagnostic inferencing. arXiv preprint arXiv:1612.01848.
Kirk Roberts, Matthew S. Simpson, Ellen Voorhees, and William R. Hersh. 2015. Overview of the TREC 2015 clinical decision support track. In TREC.
Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, pages 926–934.
Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706. ACM.
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph and text jointly embedding. In EMNLP, pages 1591–1601. Citeseer.
Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web, pages 515–526. ACM.
Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. arXiv preprint arXiv:1307.7973.
Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In ACL (1), pages 956–966. Citeseer.
Ziwei Zheng and Xiaojun Wan. 2016. Graph-based multi-modality learning for clinical decision support. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 1945–1948. ACM.
Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu. 2015. Learning continuous word embedding with metadata for question retrieval in community question answering. In Proceedings of ACL, pages 250–259.

Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations

Kentaro Kanada, Tetsunori Kobayashi and Yoshihiko Hayashi
Waseda University, Japan
[email protected]  [email protected]  [email protected]

Abstract

This paper proposes a method for classifying the type of lexical-semantic relation between a given pair of words. Given an inventory of target relationships, this task can be seen as a multi-class classification problem. We train a supervised classifier by assuming that a specific type of lexical-semantic relation between a pair of words would be signaled by a carefully designed set of relation-specific similarities between the words. These similarities are computed by exploiting "sense representations" (sense/concept embeddings). The experimental results show that the proposed method clearly outperforms an existing state-of-the-art method that does not utilize sense/concept embeddings, thereby demonstrating the effectiveness of the sense representations.

1 Introduction

Given a pair of words, classifying the type of lexical-semantic relation that could hold between them may have a range of applications. In particular, discovering typed lexical-semantic relation instances is vital in building a new lexical-semantic resource, as well as for populating an existing lexical-semantic resource. As argued in (Boyd-Graber et al., 2006), even Princeton WordNet (henceforth PWN) (Miller, 1995) is noted for its sparsity of useful internal lexical-semantic relations. A distributional thesaurus (Weeds et al., 2014), usually built with an automatic method such as that described in (Rychlý and Kilgarriff, 2007), often comprises a disorganized semantic network internally, where a variety of lexical-semantic relations are incorporated without proper relation labels attached. These issues could be addressed if an accurate method for classifying the type of lexical-semantic relation is available.

A number of research studies on the classification of lexical-semantic relationships have been conducted. Among them, Necşulescu et al. (2015) recently presented two classification methods that utilize word-level feature representations including word embedding vectors. Although the reported results are superior to the compared systems, neither of the proposed methods exploited "sense representations," which are described as the fine-grained representations of word senses, concepts, and entities in the description of this workshop1.

Motivated by the above-described issues and previous work, this paper proposes a supervised classification method that exploits sense representations, and discusses their utilities in the lexical relation classification task. The major rationales behind the proposed method are: (1) a specific type of lexical-semantic relation between a pair of words would be indicated by a carefully designed set of relation-specific similarities associated with the words; and (2) the similarities could be effectively computed by exploiting sense representations.

More specifically, for each word in the pair, we first collect relevant sets of sense/concept nodes (node sets) from an existing lexical-semantic resource (PWN), and then compute similarities for some designated pairs of node sets, where each node is represented by an embedding vector depending on its type (sense/concept). In terms of its design, each node set pair is constructed such that it is associated with a specific type of lexical-semantic relation. The resulting array of similarities, along with the underlying word/sense/concept embedding vectors, is finally fed into the classifier as features.

1 https://sites.google.com/site/senseworkshop2017/background

The empirical results that use the BLESS dataset (Baroni and Lenci, 2011) demonstrate that our method clearly outperformed existing state-of-the-art methods (Necşulescu et al., 2015) that did not employ sense/concept embeddings, confirming that properly combining the similarity features with the underlying semantic/conceptual-level embeddings is indeed effective. These results in turn highlight the utility of "the sense representations" (the sense/concept embeddings) created by the existing system referred to as AutoExtend (Rothe and Schütze, 2015).

The remainder of the paper first reviews related work (section 2), and then presents our approach (section 3). As our experiments (section 4) utilize the BLESS dataset, the experimental results are directly compared with those of (Necşulescu et al., 2015) (section 5). Although our methods were proved to be superior through the experiments, our operational requirement (sense/concept embeddings should be created from the underlying lexical-semantic resource) could be problematic especially when having to process unknown words. We conclude the present paper by discussing future work to address this issue (section 6).

2 Related work

A lexical-semantic relationship is a fundamental relationship that plays an important role in many NLP applications. A number of research efforts have been devoted to developing an automated and accurate method to type the relationship between an arbitrary pair of words. Most of these studies (Fu et al., 2014; Kiela et al., 2015; Shwartz et al., 2016), however, concentrated on the hypernymy relation, since it is the most fundamental relationship that forms the core taxonomic structure in a lexical-semantic resource. In comparison, fewer studies considered a broader range of lexical-semantic relations, e.g., (Necşulescu et al., 2015) and our present work.

Lenci and Benotto (2012), among the hypernymy-centered researches, compared existing directional similarity measures (Kotlerman et al., 2010) to identify hypernyms, and proposed a new measure that slightly modified an existing measure. The rationale behind their work is: as hypernymy is a prominent asymmetric semantic relation, it might be detected by the higher similarity score yielded by an asymmetric similarity measure. Their idea of exploiting a specific type of similarity to detect a specific type of lexical-semantic relationship is highly feasible.

Recently, distributional and distributed word representations (word embeddings) have been widely utilized, partly because the offset vector simply brought about by vector subtraction over word embeddings can capture some relational aspects including a lexical-semantic relationship. Given these useful resources, Weeds et al. (2014) presented a supervised classification approach that employs a pair of distributional vectors for a given word pair as the feature, arguing that concatenation and subtraction were almost equally effective vector operations. Similar lines of work were presented by (Necşulescu et al., 2015) and (Vylomova et al., 2016): the former suggested concatenation might be slightly superior to subtraction, whereas the latter especially highlighted the subtraction. Here it should be noted that Necşulescu et al. (2015) employed two kinds of vectors: one is a CBOW-based vector (Mikolov et al., 2013b), and the other involves word embeddings with a dependency-based skip-gram model (Levy and Goldberg, 2014).

The present work exploits semantic/conceptual-level embeddings, which were actually derived by applying the AutoExtend (Rothe and Schütze, 2015) system. Among the recent proposals for deriving semantic/conceptual-level embeddings (Huang et al., 2012; Pennington et al., 2014; Neelakantan et al., 2014; Iacobacci et al., 2015), we adopt the AutoExtend system, since it elegantly exploits the network structure provided by an underlying semantic resource, and naturally consumes existing word embeddings. More importantly, the underlying word embeddings are directly comparable with the derived sense representations. In the present work, we applied the AutoExtend system to the Word2Vec CBOW embeddings (Mikolov et al., 2013b) by referring to PWN version 3.0 as the underlying lexical-semantic resource. As far as the authors know, AutoExtend-derived embeddings have so far been evaluated in the tasks of similarity measurement and word sense disambiguation: they are yet to be applied to a semantic relation classification task.

There are a few datasets (Baroni and Lenci, 2011; Santus et al., 2015) available that were prepared for the evaluation of lexical-semantic relation classification tasks. We utilized the BLESS dataset (Baroni and Lenci, 2011) in order to directly compare our proposed method with the related methods given in (Necşulescu et al., 2015).

3 Proposed method

We adopt a supervised learning approach for classifying the type of a semantic relationship between a pair of words (w1, w2), expecting that the plausibility of a specific lexical-semantic relationship can be measured by the similarities between the senses/concepts associated with each of the words.

Figure 1 exemplifies the fundamental rationale behind the proposed method. We assume that the plausibility of the hypernymy relation between "alligator" (w1) and "reptile" (w2) can be mainly measured by the similarity between the set of hypernym concepts of "alligator" and the set of concepts of "reptile". Based on this assumption, we calculate the similarities by the following steps. Recall that these similarities are assumed to measure the plausibilities of relationships that could hold between a given word pair.

1. Collect pre-defined types of node sets for each word (five types; detailed in section 3.1).
2. Build some useful pairs of node sets by considering the possible relationships assumed to be held between the words (7 pair types; detailed in section 3.2).
3. Calculate the similarities for each node set pair by three types of calculation methods (detailed in section 3.3).

In total we calculate 21 (7 pairs × 3 methods) similarities per word pair along with the cosine similarity between the word embeddings. We use these similarities and the vector pairs that yielded the similarities as features.

Figure 1: Creating node sets and node set pairs (in the hypernymy relation). (The figure, which shows the sense and concept nodes of "alligator" and "reptile" such as alligator.n.01, alligator.n.02, crocodilian_reptile.n.01 and reptile.n.01, is not reproduced here.)

3.1 Collecting node sets for each word

By consulting PWN, we collect the following five types of node sets for each word w. These node set types are selected so as to characterize the relationships in the target inventory, which is detailed in section 4.1.

- L_w: a set of senses that a word w has
- S_w: a set of concepts each denoted by a member of L_w
- S_w^hyper: a set of concepts whose member is directly linked from a member of S_w by the hypernymy relation
- S_w^attri: a set of concepts whose member is directly linked from a member of S_w by the attribute relation
- S_w^mero: a set of concepts whose member is directly linked from a member of S_w by the meronymy relation

3.2 Building node set pairs

Given a pair of words (w1, w2), we build seven types of node set pairs as shown in Table 1. Each row in the table defines the combination of node sets and presents the associated mnemonic.

    Node set pair                  Mnemonic
    L_w1 - L_w2                    sense
    S_w1 - S_w2                    concept
    S_w1^hyper - S_w2              hyper
    S_w1^hyper - S_w2^hyper        coord
    S_w1 - S_w2^attri              attri_1
    S_w1^attri - S_w2              attri_2
    S_w1^mero - S_w2               mero

Table 1: Node set pairs built for (w1, w2).

These types of node set pairs are defined expecting that:

- sense, concept: capture semantic similarity/relatedness between the words;
- hyper: captures the hypernymy relation between the words;
- coord: dictates if the words share a common hypernym;
- attri_1, attri_2: dictate if w1 describes some aspect of w2 (attri_1) or vice versa (attri_2);
- mero: captures the meronymy relation between the words.

Note that the italicized words indicate lexical relationships often used in the linguistic literature.

3.3 Similarity calculation

In a pair of node sets, each node set could have a different number of elements, meaning that we cannot apply element-wise computation (e.g., cosine) for measuring the similarity between the node sets. We thus propose the following three similarity calculation methods and compare them in the experiments. In the following formulations: c indicates a certain node set pair type defined in Table 1; (X_w1, X_w2) is the node set pair for (w1, w2) specified by c; and sim(x1, x2) is the cosine similarity between x1 and x2.

sim_max method:

    sim_max^c(w1, w2) = \max_{x_1 \in X_{w_1},\, x_2 \in X_{w_2}} sim(\vec{x_1}, \vec{x_2})    (1)

As the formula defines, this method selects a combination of the node sets that yields the maximum similarity, implying that it achieves a disambiguation functionality. The vector pair from the most similar node sets (\vec{x_1}, \vec{x_2}) is also used as a feature. The actual usage of this pair in the experiments is detailed in section 4.2.

sim_sum method:

    sim_sum^c(w1, w2) = sim\Big(\sum_{x_1 \in X_{w_1}} \vec{x_1},\ \sum_{x_2 \in X_{w_2}} \vec{x_2}\Big)    (2)

As defined by the formula, this method first makes a holistic meaning representation by summing all embeddings of the nodes contained in each node set. We devised this method with the expectation that it could dictate semantic relatedness rather than semantic similarity (Budanitsky and Hirst, 2006). The pair of the summed embeddings is also used as a feature.

sim_med method:

    sim_med^c(w1, w2) = \mathrm{median}_{x_1 \in X_{w_1},\, x_2 \in X_{w_2}} sim(\vec{x_1}, \vec{x_2})    (3)

The method is expected to express the similarity between mediated representations of each node set. Instead of the arithmetic average, we employ the median to select a representative node in each node set, allowing us to use the associated vector pair as a feature.

4 Experiments

We evaluated the effectiveness of the proposed supervised approach by conducting a series of classification experiments using the BLESS dataset (Baroni and Lenci, 2011). Among the possible learning algorithms, we adopted the Random Forest algorithm as it maintains a balance between performance and efficiency. The results are assessed by using standard measures such as Precision (P), Recall (R), and F1. We employed the pre-trained Word2Vec embeddings2. We trained sense/concept embeddings by applying the AutoExtend system3 (Rothe and Schütze, 2015) while using the Word2Vec embeddings as the input and consulting PWN 3.0 as the underlying lexical-semantic resource.

2 300-dimensional vectors by CBoW, available at https://code.google.com/archive/p/word2vec/
3 The default hyperparameters were used.

4.1 Dataset

We utilized the BLESS dataset, which was developed for the evaluation of distributional semantic models. It provides 14,400 tetrads of (w1, w2, lexical-semantic relation type, topical domain type), where the topical domain type designates a semantic class from a coarse semantic classification system consisting of 17 English concrete noun categories (e.g., tools, clothing, vehicles, and animals). The lexical-semantic relation types defined in BLESS and their counts are described as follows:

- COORD (3565 word pairs): the words are co-hyponyms (e.g., alligator-lizard).
- HYPER (1337 word pairs): w2 is a hypernym of w1 (e.g., alligator-animal).
- MERO (2943 word pairs): w2 is a component/organ/member of w1 (e.g., alligator-mouth).
- ATTRI (2731 word pairs): w2 is an adjective expressing an attribute of w1 (e.g., alligator-aquatic).
- EVENT (3824 word pairs): w2 is a verb referring to an action/activity/happening/event associated with w1 (e.g., alligator-swim).

Note here that these lexical-semantic relation types do not completely concord with the PWN relations described in section 3.1.

Data division: In order to compare the performance on the present task we divided the data in three ways: In-domain, Out-of-domain (as employed in (Necşulescu et al., 2015)), and Collapsed-domain. For the In-domain setting, the data in the same domain were used both for training and testing; we thus conducted a five-fold cross validation for each domain. For the Out-of-domain setting, one domain is used for testing and the remaining data is used for training. In addition, we prepared the Collapsed-domain setting, where we conducted a 10-fold cross validation over the entire dataset irrespective of the domain.

4.2 Comparing methods

A supervised relation classification system referred to as WECE (Word Embeddings Classification systEm) in (Necşulescu et al., 2015) was especially chosen for comparisons, since this method combines and uses the word embeddings of a given word pair (w1, w2) as features. They compare two types of approaches described as WECE_offset and WECE_concat: WECE_offset uses the offset of the word embeddings (w2 - w1) as the feature vector, whereas WECE_concat uses the concatenation of the word embeddings. Moreover, they use two types of word embeddings: a bag-of-words model (BoW) (Mikolov et al., 2013a) and a dependency-based skip-gram model (Dep) (Levy and Goldberg, 2014). In summary, the WECE system has the following variations: WECE_BoW^offset, WECE_Dep^offset, WECE_BoW^concat and WECE_Dep^concat.

As described in Section 3, we utilize 22 kinds of similarities and the underlying vectors as features. In order to make reasonable comparisons, we compare two vector composition methods. In addition to the already described array of similarities, the Proposal_concat method uses the concatenated vectors of the underlying vector pairs, whereas the Proposal_offset method employs the difference vectors. As a result, the dimensionalities of the resulting feature vectors employed in these methods are 13,222 (22 similarities + 22 × 600 dimensions for concatenated vectors) and 6,622 (22 similarities + 22 × 300 dimensions for difference vectors), respectively.

Baseline: As detailed in section 3, our methods utilize PWN neighboring concepts linked by particular lexical-semantic relationships, such as hypernymy, attribute, and meronymy. We thus set the baseline as follows, respecting the direct relational links defined in PWN:

- Given a word pair (w1, w2), if any concept in S_w1 and any concept in S_w2 are directly linked by a certain relationship in PWN, let w1 and w2 be in that relation.

Note that the baseline method cannot find any word pair that is annotated to have the EVENT relation in the BLESS dataset, because there are no links in PWN that share the same or a similar definition. Likewise, it is not capable of finding any word pair with the ATTRI relation in BLESS, because the definition of the attribute relationship in PWN differs from the definition of the ATTRI relation in the BLESS dataset. Thus, we can only compare the results for the relationships COORD, HYPER, and MERO with the common measures P, R, and F1.

5 Results

5.1 Major results

Table 2 compares our results with those of the WECE systems in the three data set divisions (In/Out/Collapsed domains).

                         In-domain            Out-of-domain        Collapsed-domain
    Method               P     R     F1       P     R     F1       P     R     F1
    WECE_BoW^offset      0.900 0.909 0.904    0.680 0.669 0.675    -     -     -
    WECE_Dep^offset      0.853 0.865 0.859    0.687 0.623 0.654    -     -     -
    Proposal_offset      0.913 0.907 0.906    0.766 0.762 0.753    0.867 0.867 0.865
    WECE_BoW^concat      0.899 0.910 0.904    0.838 0.570 0.678    -     -     -
    WECE_Dep^concat      0.859 0.870 0.865    0.782 0.638 0.703    -     -     -
    Proposal_concat      0.973 0.971 0.971    0.839 0.819 0.812    0.970 0.970 0.970

Table 2: Comparison of the overall classification results.

The results show that Proposal_concat performed best in all measures of each division. We observe two common trends across the approaches including WECE: (1) every score in the Out-of-domain setting was lower than that in the In-domain setting; and (2) the methods using vector concatenation achieved higher scores than those using vector offsets. The former trend is reasonable, since information that is more relevant to the test data is contained in the training data in the In-domain settings. The latter trend suggests that concatenated vectors may be more informative than offset vectors, supporting the conclusion presented in (Necşulescu et al., 2015).

Nevertheless, the results in the table clearly show that Proposal_offset outperformed both WECE_BoW^concat and WECE_Dep^concat not only in Precision but also in Recall in the Out-of-domain setting. This may confirm that sense representations, acquired by exploiting the richer structure encoded in PWN, are richer in semantic content than word embeddings learned from textual corpora, and hence even the offset vectors are capable of effectively abstracting some characteristics of potential lexical-semantic relations between a word pair.

    Relationship      P      R      F1
    COORD             0.761  0.559  0.645
      by Baseline     0.550  0.108  0.180
    HYPER             0.767  0.654  0.706
      by Baseline     0.746  0.199  0.314
    MERO              0.625  0.809  0.705
      by Baseline     0.934  0.034  0.065
    ATTRI             0.913  0.995  0.952
    EVENT             0.974  0.983  0.979

Table 3: Breakdown of the results obtained by Proposal_concat^OoD, with the Baseline results shown in the indented rows.

Table 3 breaks down the results obtained by Proposal_concat in the Out-of-domain setting (Proposal_concat^OoD), and compares them with the Baseline results, showing that Proposal_concat clearly outperformed the Baseline in Recall and F1. This clearly confirms that the direct relational links defined in PWN are insufficient for classifying the BLESS relationships. With respect to the internal comparison of the Proposal_concat^OoD results, a prominent fact is the high-performance classification of the ATTRI and EVENT relationships. By definition, these relationships link a noun to an adjective (ATTRI) or to a verb (EVENT), whereas the COORD, HYPER, and MERO relationships connect a noun to another noun. This may suggest that the information carried by part of speech plays a role in this classification task.
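To make the similarity features of Section 3.3 concrete, the following is a minimal sketch (not the authors' code) of the three calculation methods over one node set pair (X_w1, X_w2); the embeddings are hypothetical toy vectors.

    # Minimal sketch of sim_max, sim_sum and sim_med over one node set pair.
    import numpy as np

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def sim_max(X1, X2):      # most similar node combination (disambiguation effect)
        return max(cos(x1, x2) for x1 in X1 for x2 in X2)

    def sim_sum(X1, X2):      # similarity of the summed ("holistic") representations
        return cos(np.sum(X1, axis=0), np.sum(X2, axis=0))

    def sim_med(X1, X2):      # median over all pairwise similarities
        return float(np.median([cos(x1, x2) for x1 in X1 for x2 in X2]))

    # Hypothetical toy node sets, e.g. S_w1^hyper for "alligator" and S_w2 for "reptile".
    X1 = [np.array([0.8, 0.1, 0.0]), np.array([0.2, 0.7, 0.1])]
    X2 = [np.array([0.7, 0.2, 0.1])]
    print(sim_max(X1, X2), sim_sum(X1, X2), sim_med(X1, X2))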

42 HYPER COORD ATTRI MERO EVENT HYPER 875 157 10 282 13 COORD 189 1994 220 1118 44 ATTRI 1 13 2716 1 0 MERO 74 423 24 2380 42 EVENT 2 32 5 27 3758

OoD Table 4: Confusion matrix for the results by P roposalconcat in the Out-of-domain setting.
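The offset and concatenation schemes contrasted above are simple operations over a pair of embedding vectors. The following minimal sketch illustrates only those two pair representations and is not the authors' pipeline: the toy data, dimensionality, and logistic-regression classifier are placeholder assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def offset_features(v1, v2):
        # Vector-offset representation of the pair (w1, w2): the difference v1 - v2.
        return v1 - v2

    def concat_features(v1, v2):
        # Concatenation keeps both vectors, doubling the dimensionality.
        return np.concatenate([v1, v2])

    # Toy data: 100 word pairs with 300-dimensional vectors and binary relation labels.
    rng = np.random.default_rng(0)
    pairs = rng.normal(size=(100, 2, 300))
    labels = rng.integers(0, 2, size=100)

    X_offset = np.array([offset_features(v1, v2) for v1, v2 in pairs])
    X_concat = np.array([concat_features(v1, v2) for v1, v2 in pairs])

    clf = LogisticRegression(max_iter=1000)
    print(clf.fit(X_offset, labels).score(X_offset, labels))   # offset features
    print(clf.fit(X_concat, labels).score(X_concat, labels))   # concatenation features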

5.2 Ablation tests

This section considers the results in the Out-of-domain setting more closely. Table 5 shows the results of the ablation tests for the Proposal_concat^OoD setting, comparing the effectiveness of the sources of the similarities. Each row other than Proposal_concat^OoD displays the results when the designated feature is ablated. The -word row shows the result when ablating the 601-dimensional features created from the pair of word embeddings, and the other rows show the results when ablating the corresponding 1803-dimensional features (three similarities and the three vector pairs that yielded the similarities) generated from each node set pair.

                        P      R      F1     F1 diff
Proposal_concat^OoD     0.839  0.819  0.812    -
-word                   0.845  0.827  0.819   0.008
-sense                  0.833  0.815  0.806  -0.006
-concept                0.826  0.809  0.802  -0.010
-coord                  0.834  0.811  0.803  -0.009
-hyper                  0.826  0.803  0.800  -0.012
-attri1                 0.826  0.806  0.798  -0.014
-attri2                 0.842  0.820  0.814   0.002
-mero                   0.835  0.813  0.806  -0.006

Table 5: Ablation tests comparing the effectiveness of each node set.

This table suggests that the concept, hyper, and attri1 node set pairs are effective, as indicated by the relatively large decreases in F1. Surprisingly, however, ablating the features generated from the word embeddings slightly improved the performance. This implies that the abstract-level semantics encoded in sense/concept embeddings are more robust in the classification of the target lexical-semantic relationships. However, the utility of the sense embeddings was modest. This may result from the learning method of AutoExtend: it tries to split a word embedding into the senses' embeddings without considering the distribution of senses in the Word2Vec training corpus. Addressing this issue is left as potential future work.

Table 6 compares the effectiveness of the two types of features. The "only similarities" row shows the results when ablating the vectorial features and using only the 22-dimensional similarity features (21 semantic/conceptual-level similarities along with a word-level similarity). The "only vector pairs" row shows the results from the opposite setting, using only the 22 vector pairs (13,200-dimensional features).

                        P      R      F1     F1 diff
Proposal_concat^OoD     0.839  0.819  0.812    -
only similarities       0.704  0.687  0.683  -0.129
only vector pairs       0.834  0.812  0.804  -0.007

Table 6: Effectiveness comparison of the types of features.

These results show that the vectorial features produce more accurate results than the similarity features alone, confirming the general assumption that more features yield more accurate results. However, we emphasize that, even with only the similarity features, our approach outperformed the comparable method in Recall (shown in the Out-of-domain columns of Table 2).

Table 7 shows the results of further ablation tests, comparing the effectiveness of the similarity calculation methods. Each row in the table displays the result when ablating the corresponding 4207-dimensional features (seven similarities plus the seven vector pairs that yielded these similarities).

As the results in the table show, the F1 scores did not change significantly in any ablated condition, indicating that the effect provided by the ablated method is compensated for by the remaining methods; there is some redundancy among the three calculation methods.
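An ablation of the kind reported in Tables 5-7 amounts to dropping one group of feature columns at a time and re-running the evaluation. The sketch below uses placeholder data, placeholder column groups, and a generic classifier; the real feature groups and learner are those described in the paper, not the ones named here.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression

    def ablate(X, y, groups):
        # groups maps a feature-group name to the column indices it contributes.
        full = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
        print(f"full model: {full:.3f}")
        for name, cols in groups.items():
            keep = np.setdiff1d(np.arange(X.shape[1]), cols)
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, keep], y, cv=5).mean()
            print(f"-{name}: {score:.3f} (diff {score - full:+.3f})")

    # Toy example: 200 instances, 20 features split into two groups.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = rng.integers(0, 2, size=200)
    ablate(X, y, {"word": np.arange(0, 10), "sense": np.arange(10, 20)})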

                        P      R      F1     F1 diff
Proposal_concat^OoD     0.839  0.819  0.812    -
-sim^c_max              0.835  0.812  0.805  -0.007
-sim^c_sum              0.843  0.822  0.816   0.004
-sim^c_med              0.838  0.811  0.805  -0.007

Table 7: Additional ablation tests comparing the similarity calculation methods.

6 Discussion

This section discusses two issues: the first is associated with the usage of PWN in the experiments using BLESS, and the other is concerned with the "lexical memorization" problem.

Usage of PWN: As detailed in Section 3, our methods utilize neighboring concepts linked by the particular lexical-semantic relations hypernymy, attribute, and meronymy defined in PWN. Some may consider that the HYPER, ATTRI, and MERO relationships can therefore be estimated simply by consulting these PWN relationships. However, this is definitely not the case, since almost all of the semantic relation instances in the BLESS dataset are not directly defined in PWN: among the 14,400 BLESS instances, only 951 are defined in PWN. For an obvious example, there are no links in PWN labeled event, which is a type of semantic relation defined in BLESS. The low Recall results of the Baseline presented in Table 3 endorse this fact, and clearly show the sparsity of useful semantic links in PWN. However, for some of the lexical-semantic relation types that exhibit transitivity, such as hypernymy, consulting the indirect links in PWN could be effectively utilized to improve the results.

Lexical memorization: Levy et al. (2015) recently argued that the results achieved by many supervised methods are inflated because of the "lexical memorization" effect, by which a classifier only learns "prototypical hypernymy" relations. For example, if a classifier encounters many positive examples such as (X, animal), where X is a hyponym of animal (e.g., dog, cat, ...), it only learns to classify any word pair as a hypernym as long as the second word is "animal." We argue that our method can be expected to be relatively free from this issue. The similarity features are not affected by this effect, since any similarity calculation is a symmetric operation, independent of word order. Moreover, the sim^c_max and sim^c_med methods select a pair of sense/concept embeddings, where the selected combination usually differs depending on the combination of node sets. On the other hand, the sim^c_sum method could be affected by the memorization effect, since the vectorial feature for the prototypical hypernym is invariable.

7 Conclusion

This paper proposed a method for classifying the type of lexical-semantic relation that could hold between a given pair of words. The empirical results clearly outperformed previously reported state-of-the-art results, demonstrating that the rationales behind the proposal are valid. In particular, it would be reasonable to assume that the plausibility of a specific type of lexical-semantic relation between the words can be chiefly recovered by a carefully designed set of relation-specific similarities. These results also highlight the utility of the sense representations, since our similarity calculation methods rely on the sense/concept embeddings created by the AutoExtend system.

Future work could follow two directions. First, we need to improve the classification of inter-noun semantic relations. We may particularly need to distinguish the hypernym relationship, which is asymmetric, from the symmetric coordinate relationship. In this regard, we would need to improve the creation of node sets and their combinations to capture the innate differences between the relationships. Second, we need to address a potential drawback of our proposal, which comes from its operational requirement: a lack of sense/concept embeddings is crucial, as we cannot collect relevant node sets in this case. Therefore, we need to develop a method to assign some of the existing concepts to an unknown word that is not contained in PWN, by seeking the nearest concept in the resource. A possible method would first seek the nearest concept in the underlying lexical-semantic resource for an unknown word, and then induce a revised set of sense/concept embeddings by iteratively applying the AutoExtend system.

Acknowledgments

The present research was supported by JSPS KAKENHI Grant Numbers JP26540144 and JP25280117.

References

Marco Baroni and Alessandro Lenci. 2011. How we blessed distributional semantic evaluation. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 1-10. Association for Computational Linguistics.

Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. 2006. Adding dense, weighted connections to WordNet. In Proceedings of the Third International WordNet Conference, pages 29-36.

Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13-47.

Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. Learning semantic hierarchies via word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1199-1209, Baltimore, Maryland, June. Association for Computational Linguistics.

Eric Huang, Richard Socher, Christopher Manning, and Andrew Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 873-882, Jeju Island, Korea, July. Association for Computational Linguistics.

Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning sense embeddings for word and relational similarity. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 95-105, Beijing, China, July. Association for Computational Linguistics.

Douwe Kiela, Felix Hill, and Stephen Clark. 2015. Specializing word embeddings for similarity or relatedness. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2044-2048, Lisbon, Portugal, September. Association for Computational Linguistics.

Lili Kotlerman, Ido Dagan, Idan Szpektor, and Maayan Zhitomirsky-Geffet. 2010. Directional distributional similarity for lexical inference. Natural Language Engineering, 16(04):359-389.

Alessandro Lenci and Giulia Benotto. 2012. Identifying hypernyms in distributional semantic spaces. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, pages 75-79, Montreal, Canada, 7-8 June. Association for Computational Linguistics.

Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 302-308, Baltimore, Maryland, June. Association for Computational Linguistics.

Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. 2015. Do supervised distributional methods really learn lexical inference relations? In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 970-976, Denver, Colorado, May-June. Association for Computational Linguistics.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41.

Silvia Necsulescu, Sara Mendes, David Jurgens, Nuria Bel, and Roberto Navigli. 2015. Reading between the lines: Overcoming data sparsity for accurate classification of lexical relationships. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, pages 182-192, Denver, Colorado, June. Association for Computational Linguistics.

Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2014. Efficient non-parametric estimation of multiple embeddings per word in vector space. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1059-1069, Doha, Qatar, October. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, Doha, Qatar, October. Association for Computational Linguistics.

Sascha Rothe and Hinrich Schütze. 2015. AutoExtend: Extending word embeddings to embeddings for synsets and lexemes. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1793-1803, Beijing, China, July. Association for Computational Linguistics.

Pavel Rychlý and Adam Kilgarriff. 2007. An efficient algorithm for building a distributional thesaurus (and other developments). In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume: Proceedings of the Demo and Poster Sessions, pages 41-44, Prague, Czech Republic, June. Association for Computational Linguistics.

Enrico Santus, Frances Yung, Alessandro Lenci, and Chu-Ren Huang. 2015. EVALution 1.0: an evolving semantic dataset for training and evaluation of distributional semantic models. In Proceedings of the 4th Workshop on Linked Data in Linguistics (LDL-2015), pages 64-69.

Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving hypernymy detection with an integrated path-based and distributional method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2389-2398, Berlin, Germany, August. Association for Computational Linguistics.

Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Timothy Baldwin. 2016. Take and took, gaggle and goose, book and read: Evaluating the utility of vector differences for lexical relation learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1671-1682, Berlin, Germany, August. Association for Computational Linguistics.

Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2249-2259, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.

Supervised and unsupervised approaches to measuring usage similarity

Milton King and Paul Cook
Faculty of Computer Science, University of New Brunswick
Fredericton, NB E3B 5A3, Canada
[email protected], [email protected]

Abstract

Usage similarity (USim) is an approach to determining word meaning in context that does not rely on a sense inventory. Instead, pairs of usages of a target lemma are rated on a scale. In this paper we propose unsupervised approaches to USim based on embeddings for words, contexts, and sentences, and achieve state-of-the-art results over two USim datasets. We further consider supervised approaches to USim, and find that although they outperform unsupervised approaches, they are unable to generalize to lemmas that are unseen in the training data.

1 Usage similarity

Word senses are not discrete. In many cases, for a given instance of a word, multiple senses from a sense inventory are applicable, and to varying degrees (Erk et al., 2009). For example, consider the usage of wait in the following sentence taken from Jurgens and Klapaftis (2013):

1. And is now the time to say I can hardly wait for your impending new novel about the Alamo?

Annotators judged the WordNet (Fellbaum, 1998) senses glossed as 'stay in one place and anticipate or expect something' and 'look forward to the probable occurrence of' to have applicability ratings of 4 out of 5 and 2 out of 5, respectively, for this usage of wait. Moreover, Erk et al. (2009) also showed that this issue cannot be addressed simply by choosing a coarser-grained sense inventory. That a clear line cannot be drawn between the various senses of a word has been observed as far back as Johnson (1755). Some have gone so far as to doubt the existence of word senses (Kilgarriff, 1997).

Sense inventories also suffer from a lack of coverage. New words regularly come into usage, as do new senses for established words. Furthermore, domain-specific senses are often not included in general-purpose sense inventories. This issue of coverage is particularly relevant for social media text, which contains a higher rate of out-of-vocabulary words than more-conventional text types (Baldwin et al., 2013).

These issues pose problems for natural language processing tasks such as word sense disambiguation and induction, which rely on, and seek to induce, respectively, sense inventories, and have traditionally assumed that each instance of a word can be assigned one sense.[1] In response to this, alternative approaches to word meaning have been proposed that do not rely on sense inventories. Erk et al. (2009) carried out an annotation task on "usage similarity" (USim), in which the similarity of the meanings of two usages of a given word is rated on a five-point scale.

[1] Recent word sense induction systems and evaluations have, however, considered graded senses and multi-sense applicability (Jurgens and Klapaftis, 2013).

Lui et al. (2012) proposed the first computational approach to USim. They considered approaches based on topic modelling (Blei et al., 2003), under a wide range of parameter settings, and found that a single topic model for all target lemmas (as opposed to one topic model per target lemma) performed best on the dataset of Erk et al. (2009). Gella et al. (2013) considered USim on Twitter text, noting that this model of word meaning seems particularly well suited to this text type because of the prevalence of out-of-vocabulary words. Gella et al. (2013) also considered topic modelling-based approaches, achieving their best results using one topic model per

target word, and a document expansion strategy based on medium-frequency hashtags to combat the data sparsity of tweets due to their relatively short length. The methods of Lui et al. (2012) and Gella et al. (2013) are unsupervised; they do not rely on any gold standard USim annotations.

In this paper we propose unsupervised approaches to USim based on embeddings for words (Mikolov et al., 2013; Pennington et al., 2014), contexts (Melamud et al., 2016), and sentences (Kiros et al., 2015), and achieve state-of-the-art results over the USim datasets of both Erk et al. (2009) and Gella et al. (2013). We then consider supervised approaches to USim based on these same methods for forming embeddings, which outperform the unsupervised approaches but perform poorly on lemmas that are unseen in the training data.

2 USim models

In this section we describe how we represent a target word usage in context, and then how we use these representations in unsupervised and supervised approaches to USim.

2.1 Usage representation

We consider four ways of representing an instance of a target word based on embeddings for words, contexts, and sentences. For word embeddings, we consider word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014). In each case we represent a token instance of the target word in a sentence as the average of the word embeddings for the other words occurring in the sentence, excluding stopwords.

Context2vec (Melamud et al., 2016) can be viewed as an extension of word2vec's continuous bag-of-words (CBOW) model. In CBOW, the context of a target word token is represented as the average of the embeddings for words within a fixed window. In contrast, context2vec uses a richer representation based on a bidirectional LSTM capturing the full sentential context of a target word token. During training, context2vec embeds the context of word token instances in the same vector space as word types. As this model explicitly embeds word contexts, it seems particularly well suited to USim.

Kiros et al. (2015) proposed skip-thoughts, a sentence encoder that can be viewed as a sentence-level version of word2vec's skipgram model, i.e., during training, the encoding of a sentence is used to predict surrounding sentences. Kiros et al. (2015) showed that skip-thoughts outperforms previous approaches to measuring sentence-level relatedness. Although our goal is to determine the meaning of a word in context, the meaning of a sentence could be a useful proxy for this.[2]

[2] Inference requires only a single sentence, so the model can infer skip-thought vectors for sentences taken out of context, as in the USim datasets.

2.2 Unsupervised approach

In the unsupervised setup, we measure the similarity between two usages of a target word as the cosine similarity between their vector representations, obtained by one of the methods described in Section 2.1. This method does not require gold standard training data.

2.3 Supervised approach

We also consider a supervised approach. For a given pair of token instances of a target word, t1 and t2, we first form vectors v1 and v2 representing each of the two instances of the target, using one of the approaches in Section 2.1. To represent each pair of instances, we follow the approach of Kiros et al. (2015): we compute the componentwise product, and the absolute difference, of v1 and v2, and concatenate them. This gives a vector of length 2d, where d is the dimensionality of the embeddings used, representing each pair of instances. We then train ridge regression to learn a model to predict the similarity of unseen usage pairs.
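A minimal sketch of the word-embedding variants of these two setups, assuming pre-trained vectors are available in a dict keyed by word; the whitespace tokenisation and the tiny stopword list are simplifying assumptions, while the averaging, cosine scoring, and pair-feature construction follow Sections 2.1-2.3.

    import numpy as np
    from sklearn.linear_model import Ridge

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "is"}   # placeholder list

    def usage_vector(sentence, target, vectors, dim=300):
        # Average the embeddings of the other words in the sentence (Section 2.1).
        words = [w for w in sentence.lower().split()
                 if w != target and w not in STOPWORDS and w in vectors]
        return np.mean([vectors[w] for w in words], axis=0) if words else np.zeros(dim)

    def cosine(u, v):
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / denom) if denom else 0.0

    def pair_features(v1, v2):
        # Componentwise product and absolute difference, concatenated (Section 2.3).
        return np.concatenate([v1 * v2, np.abs(v1 - v2)])

    # Unsupervised USim score for one sentence pair:
    #   cosine(usage_vector(s1, lemma, vecs), usage_vector(s2, lemma, vecs))
    # Supervised USim: fit ridge regression on pair features against gold ratings:
    #   X = np.array([pair_features(v1, v2) for v1, v2 in instance_vector_pairs])
    #   model = Ridge().fit(X, gold_ratings)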

3 Materials and methods

3.1 USim Datasets

We evaluate our methods on two USim datasets representing two different text types: ORIGINAL, the USim dataset of Erk et al. (2009), and TWITTER, from Gella et al. (2013). Both USim datasets contain pairs of sentences; each sentence in each pair includes a usage of a particular target lemma. Each sentence pair is rated on a scale of 1-5 for how similar in meaning the usages of the target words are in the two sentences.

ORIGINAL consists of sentences from McCarthy and Navigli (2007), which were drawn from a web corpus (Sharoff, 2006). This dataset contains 34 lemmas, including nouns, verbs, adjectives, and adverbs. Each lemma is the target word in 10 sentences. For each lemma, sentence pairs (SPairs) are formed based on all pairwise comparisons, giving 45 SPairs per lemma. Annotations were provided by three native English speakers, with the average taken as the final gold standard similarity. In a small number of cases the annotators were unable to judge similarity. Erk et al. (2009) removed these SPairs from the dataset, resulting in a total of 1512 SPairs.

TWITTER contains SPairs for ten nouns from ORIGINAL. In this case the "sentences" are in fact tweets. 55 SPairs are provided for each noun. Unlike ORIGINAL, the SPairs are not formed on the basis of all pairwise comparisons amongst a smaller set of sentences. This dataset was annotated via crowdsourcing and carefully cleaned to remove outlier annotations.

D    W   ORIGINAL  TWITTER
50   2   0.251     0.246
50   5   0.262     0.272
50   8   0.286     0.282
100  2   0.267     0.248
100  5   0.273     0.253
100  8   0.273     0.298
300  2   0.275     0.266
300  5   0.279     0.295
300  8   0.281     0.300

Table 1: Spearman's ρ on each dataset using the unsupervised approach with word2vec embeddings trained using several settings for the number of dimensions (D) and window size (W). The best ρ for each dataset is shown in boldface.
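As a quick sanity check of the dataset construction just described, forming all pairwise comparisons over the 10 sentences of a lemma indeed yields 45 SPairs; the snippet below is purely illustrative and the sentence placeholders are not real data.

    from itertools import combinations

    sentences = [f"sentence_{k}" for k in range(10)]   # 10 usages of one lemma
    spairs = list(combinations(sentences, 2))          # all pairwise comparisons
    print(len(spairs))                                 # 45 SPairs per lemma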

3.2 Evaluation

Following Lui et al. (2012) and Gella et al. (2013) we evaluate our systems by calculating Spearman's rank correlation coefficient between the gold standard similarities and the predicted similarities. This enables direct comparison of our results with those reported in these previous studies.

We evaluate our supervised approaches using two cross-validation methodologies. In the first case we apply 10-fold cross-validation, randomly partitioning all SPairs for all lemmas in a given dataset. Using this approach, the test data for a given fold consists of SPairs for target lemmas that were seen in the training data. To determine how well our methods generalize to unseen lemmas, we consider a second cross-validation setup in which we partition the SPairs in a given dataset by lemma. Here the test data for a given fold consists of SPairs for one lemma, and the training data consists of SPairs for all other lemmas.

3.3 Embeddings

We train word2vec's skipgram model on two corpora:[3] (1) a corpus of English tweets collected from the Twitter Streaming APIs[4] from November 2014 to March 2015 containing 1.3 billion tokens; and (2) an English Wikipedia dump from 1 September 2015 containing 2.6 billion tokens. Because of the relatively low cost of training word2vec, we consider several settings of window size (W = 2, 5, 8) and number of dimensions (D = 50, 100, 300). Embeddings trained on Wikipedia and Twitter are used for experiments on ORIGINAL and TWITTER, respectively.

For the other embeddings we use pre-trained models. We use GloVe vectors from Wikipedia and Twitter, with 300 and 200 dimensions, for experiments on ORIGINAL and TWITTER, respectively.[5] For context2vec we use a 600-dimensional model trained on the ukWaC (Ferraresi et al., 2008), a web corpus of approximately 2 billion tokens.[6] We use a skip-thoughts model with 4800 dimensions, trained on a corpus of books.[7] We use these context2vec and skip-thoughts models for experiments on both ORIGINAL and TWITTER.

[3] In preliminary experiments the alternative word2vec CBOW model achieved substantially lower correlations than skipgram, and so CBOW was not considered further.
[4] https://dev.twitter.com/
[5] http://nlp.stanford.edu/projects/glove/
[6] https://github.com/orenmel/context2vec
[7] https://github.com/ryankiros/skip-thoughts
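The two cross-validation protocols of Section 3.2 can be sketched as follows; the ridge regressor and the scikit-learn splitters are assumptions for illustration (the paper specifies ridge regression but not a particular toolkit), and X, gold, and lemmas stand for the pair features, gold ratings, and target lemmas of one dataset.

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.model_selection import KFold, LeaveOneGroupOut
    from sklearn.linear_model import Ridge

    def evaluate(X, gold, lemmas):
        # "All" setting: random 10-fold cross-validation over all SPairs.
        preds = np.zeros(len(gold))
        for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
            preds[test] = Ridge().fit(X[train], gold[train]).predict(X[test])
        rho_all = spearmanr(gold, preds).correlation

        # "Lemma" setting: each fold holds out every SPair of one lemma.
        preds = np.zeros(len(gold))
        for train, test in LeaveOneGroupOut().split(X, gold, groups=lemmas):
            preds[test] = Ridge().fit(X[train], gold[train]).predict(X[test])
        rho_lemma = spearmanr(gold, preds).correlation
        return rho_all, rho_lemma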

4 Experimental results

We first consider the unsupervised approach using word2vec for a variety of window sizes and numbers of dimensions. Results are shown in Table 1. All correlations are significant (p < 0.05). On both ORIGINAL and TWITTER, for a given number of dimensions, as the window size is increased, ρ increases. Embeddings for larger window sizes tend to better capture semantics, whereas embeddings for smaller window sizes tend to better reflect syntax (Levy and Goldberg, 2014); the more-semantic embeddings given by larger window sizes appear to be better suited to the task of predicting USim. For a given window size, a higher number of dimensions also tends to achieve higher ρ. For example, for a given window size, D = 300 gives a higher ρ than D = 50 in each case, except for W = 8 on ORIGINAL.

The best correlations reported by Lui et al. (2012) on ORIGINAL, and Gella et al. (2013) on TWITTER, were 0.202 and 0.29, respectively. The best parameter settings for our unsupervised approach using word2vec embeddings achieve higher correlations, 0.286 and 0.300, on ORIGINAL and TWITTER, respectively. Lui et al. (2012) and Gella et al. (2013) both report drastic variation in performance for different settings of the number of topics in their models. We also observe some variation with respect to parameter settings; however, any of the parameter settings considered achieves a higher correlation than Lui et al. (2012) on ORIGINAL. For TWITTER, parameter settings with W >= 5 and D >= 100 achieve a correlation comparable to, or greater than, the best reported by Gella et al. (2013).

                                            Supervised
Dataset    Embeddings      Unsupervised    All       Lemma
ORIGINAL   Word2vec        0.281*          0.435*    0.220*
           GloVe           0.218*          0.410*    0.230*
           Skip-thoughts   0.177*          0.436*    0.099*
           Context2vec     0.302*          0.417*    0.172*
TWITTER    Word2vec        0.300*          0.384*    0.196*
           GloVe           0.122*          0.314*    0.134*
           Skip-thoughts   0.095*          0.360*    0.058
           Context2vec     0.122*          0.193*    0.067

Table 2: Spearman's ρ on each dataset using the unsupervised method, and supervised methods with cross-validation folds based on random sampling across all lemmas (All) and holding out individual lemmas (Lemma), for each embedding approach. The best ρ for each experimental setup, on each dataset, is shown in boldface. Significant correlations (p < 0.05) are indicated with *.

We now consider the unsupervised approach using the other embeddings. Based on the previous findings for word2vec, we only consider this model with W = 8 and D = 300 here. Results are shown in Table 2 in the column labeled "Unsupervised". For ORIGINAL, context2vec performs best (and indeed outperforms word2vec for all parameter settings considered). This result demonstrates that approaches to predicting USim that explicitly embed the context of a target word can outperform approaches based on averaging word embeddings (i.e., word2vec and GloVe) or embedding sentences (skip-thoughts). This result is particularly strong because we consider a range of parameter settings for word2vec, but only used the default settings for context2vec.[8] Word2vec does, however, perform best on TWITTER. The relatively poor performance of context2vec and skip-thoughts here could be due to differences between the text types these embedding models were trained on and the evaluation data. GloVe performs poorly, even though it was trained on tweets for these experiments, but that it performs less well than word2vec is consistent with the findings for ORIGINAL.

Turning to the supervised approach, we first consider results for cross-validation based on randomly partitioning all SPairs in a dataset (column "All" in Table 2). The best correlation on TWITTER (0.384) is again achieved using word2vec, while the best correlation on ORIGINAL (0.436) is obtained with skip-thoughts. The difference in performance amongst the various embedding approaches is, however, somewhat smaller here than in the unsupervised setting. For each embedding approach, and each dataset, the correlation in the supervised setting is better than that in the unsupervised setting, suggesting that if labeled training data is available, supervised approaches can give substantial improvements over unsupervised approaches to predicting USim.[9] However, this experimental setup does not show the extent to which the supervised approach is able to generalize to previously unseen lemmas.

The column labeled "Lemma" in Table 2 shows results for the supervised approach for cross-validation using lemma-based partitioning. In these experiments, the test data consists of usages of a target lemma that was not seen as a target lemma during training. For each dataset, the correlations achieved here for each type of embedding are lower than those of the corresponding unsupervised method, with the exception of GloVe.

[8] The context2vec model has 600 dimensions, and was trained on the ukWaC, whereas our word2vec model for ORIGINAL is trained on Wikipedia. To further compare these approaches we also trained word2vec on the ukWaC with 600 dimensions and a window size of 8. These word2vec settings also did not outperform context2vec.
[9] These results on ORIGINAL must be interpreted cautiously, however. The same sentences, albeit in different SPairs, occur in both the training and testing data for a given fold. This issue does not affect TWITTER.

In the case of ORIGINAL, the higher correlation for GloVe relative to the unsupervised setup appears to be largely due to improved performance on adverbs. Nevertheless, for each dataset, the correlations achieved by GloVe are still lower than those of the best unsupervised method on that dataset. These results demonstrate that the supervised approach generalizes poorly to new lemmas. This negative result indicates an important direction for future work: identifying strategies for training supervised approaches to predicting USim that generalize to unseen lemmas.

5 Conclusions

Word senses are not discrete, and multiple senses are often applicable for a given usage of a word. Moreover, for text types that have a relatively high rate of out-of-vocabulary words, such as social media text, many words will be missing from sense inventories. USim is an approach to determining word meaning in context that does not rely on a sense inventory, addressing these concerns. We proposed unsupervised approaches to USim based on embeddings for words, contexts, and sentences. We achieved state-of-the-art results over USim datasets based on Twitter text and more-conventional texts. We further considered supervised approaches to USim based on these same methods for forming embeddings, and found that although these methods outperformed the unsupervised approaches, they performed poorly on lemmas that were unseen in the training data.

The approaches to learning word embeddings that we considered (word2vec and GloVe) both learn a single vector representing each word type. There are, however, approaches that learn multiple embeddings for each type that have been applied to predict word similarity in context (Huang et al., 2012; Neelakantan et al., 2014, for example). In future work, we intend to also evaluate such approaches for the task of predicting usage similarity. We also intend to consider alternative strategies for training supervised approaches to USim in an effort to achieve better performance on unseen lemmas.

Acknowledgments

This research was financially supported by the Natural Sciences and Engineering Research Council of Canada, the New Brunswick Innovation Foundation, ACENET, and the University of New Brunswick.

References

Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. 2013. How noisy social media text, how diffrnt social media sources? In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 356-364, Nagoya, Japan.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.

Katrin Erk, Diana McCarthy, and Nicholas Gaylord. 2009. Investigations on word senses and word usages. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 10-18, Singapore.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

Adriano Ferraresi, Eros Zanchetta, Marco Baroni, and Silvia Bernardini. 2008. Introducing and evaluating ukWaC, a very large web-derived corpus of English. In Proceedings of the 4th Web as Corpus Workshop: Can we beat Google, pages 47-54, Marrakech, Morocco.

Spandana Gella, Paul Cook, and Bo Han. 2013. Unsupervised word usage similarity in social media texts. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 248-253, Atlanta, USA.

Eric Huang, Richard Socher, Christopher Manning, and Andrew Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 873-882, Jeju Island, Korea.

Samuel Johnson. 1755. A Dictionary of the English Language. London.

David Jurgens and Ioannis Klapaftis. 2013. SemEval-2013 Task 13: Word sense induction for graded and non-graded senses. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), pages 290-299, Atlanta, USA.

Adam Kilgarriff. 1997. "I Don't Believe in Word Senses". Computers and the Humanities, 31(2):91-113.

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3276-3284. Curran Associates, Inc.

Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 302-308, Maryland, USA.

Marco Lui, Timothy Baldwin, and Diana McCarthy. 2012. Unsupervised estimation of word usage similarity. In Proceedings of the Australasian Language Technology Association Workshop 2012, pages 33-41, Dunedin, New Zealand.

Diana McCarthy and Roberto Navigli. 2007. SemEval-2007 Task 10: English lexical substitution task. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 48-53, Prague, Czech Republic.

Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 51-61.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of Workshop at the International Conference on Learning Representations, Scottsdale, USA.

Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2014. Efficient non-parametric estimation of multiple embeddings per word in vector space. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1059-1069, Doha, Qatar.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543.

Serge Sharoff. 2006. Open-source corpora: Using the net to fish for linguistic data. Corpus Linguistics, 11(4):435-462.

Lexical Disambiguation of Igbo through Diacritic Restoration

Ignatius Ezeani, Mark Hepple, Ikechukwu Onyenwe
NLP Research Group, Department of Computer Science, University of Sheffield, UK
{ignatius.ezeani, m.r.hepple, i.onyenwe}@sheffield.ac.uk

Abstract

Properly written texts in Igbo, a low resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built some n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited for, and will be more generalisable with, machine learning. This paper, therefore, presents a more standard approach to the task, which involves the application of machine learning algorithms.

1 Introduction

Diacritics are marks placed over, under, or through a letter in some languages to indicate a sound value different from that of the unmarked letter. English does not have diacritics (apart from a few borrowed words), but many of the world's languages use a wide range of diacritized letters in their orthography. Automatic Diacritic Restoration Systems (ADRS) enable the restoration of missing diacritics in texts. Many forms of such tools have been proposed, designed and developed, but work on Igbo is still in its early stages.

1.1 Diacritics and Igbo language

Igbo, a major Nigerian language and the native language of the people of south-eastern Nigeria, is spoken by over 30 million people worldwide. It uses the Latin script and has many dialects. Most written works, however, use the official orthography produced by the Ọnwụ Committee (see http://www.columbia.edu/itc/mealac/pritchett/00fwp/igbo/txt onwu 1961.pdf).

The orthography has 8 vowels (a, e, i, o, u, ị, ọ, ụ) and 28 consonants (b, gb, ch, d, f, g, gw, gh, h, j, k, kw, kp, l, m, n, nw, ny, ṅ, p, r, s, sh, t, v, w, y, z).

Table 1 shows Igbo characters with their orthographic or tonal (or both) diacritics and the possible changes in meanings of the words they appear in (m and n, nasal consonants, are sometimes treated as tone-marked vowels).

Char   Orthographic   Tonal
a      -              à, á, ā
e      -              è, é, ē
i      ị              ì, í, ī, ị̀, ị́, ị̄
o      ọ              ò, ó, ō, ọ̀, ọ́, ọ̄
u      ụ              ù, ú, ū, ụ̀, ụ́, ụ̄
m      -              m̀, ḿ, m̄
n      ṅ              ǹ, ń, n̄

Table 1: Igbo diacritic complexity

Most Igbo electronic texts collected from social media platforms are riddled with flaws ranging from dialectal variations and spelling errors to lack of diacritics. For instance, consider this raw excerpt from a chat on a popular Nigerian online chat forum, www.nairaland.com (source: http://www.nairaland.com/189374/igbo-love-messages):

    otu ubochi ka'm no na amaghi ihe mu na uwa ga-eje. kam noo n'eche ihe a,otu mmadu wee kpoturum,m lee anya o buru nwoke mara mma puru iche,mma ya turu m n'obi.o gwam si nne kedu


k’idi.onu ya dika onu ndi m’ozi, ihu ya such tasks as word sense disambiguation with re- dika anyanwu ututu,ahu ya n’achakwa gards to resolving both syntactic and semantic am- bara bara ka mmiri si n’okwute. ka ihe biguities. Indeed it was referred to as an instance niile no n’agbam n’obi,o sim na ohuru of a closely related class of problems which in- m n’anya.na ochoro k’anyi buru enyi,a cludes word choice selection in machine transla- hukwuru m ya n’anya.anyi wee kweko- tion, homograph and homophone disambiguation rita wee buru enyi onye’a m n’ekwu and capitalisation restoration (Yarowsky, 1994b). maka ya bu odinobi m,onye ihe ya Diacritic restoration, like sense disambiguation, n’amasi m is not an end in itself but an “intermediate task” (Wilks and Stevenson, 1996) which supports bet- In the above example, you can observe that ter understanding and representation of meanings there is zero presence of diacritics - tonal or or- in human-machine interactions. In most non- thographic - in the entire text. As pointed out diacritic languages, sense disambiguation systems above, although there are other issues with regards can directly support such tasks as machine transla- to standard in the text, lack of diacritics seems to tion, information retrieval, text processing, speech be harder to control or avoid than the others. This processing etc. (Ide and Veronis,´ 1998). But it is partly because diacritics or lack of it does af- takes more for diacritic languages, where possible, fect human understanding a great deal; and also to produce standard texts. So for those languages, the rigours a writer will go through to insert them to achieve good results with such systems as listed may not worth the effort. The challenge, however, above, diacritic restoration is required as a boost is that NLP systems built and trained with such for the sense disambiguation task. poor quality non standard data will most likely be We note however, that although diacritic unreliable. restoration is related to word sense disambigua- 1.2 Diacritic restoration and other NLP tion (WSD), it does not eliminate the need for systems sense disambiguation. For example, if the word- key akwa is successfully restored to akw` a`, it could Diacritic restoration is important for other NLP still be referring to either bed or bridge. Another systems such as speech recognition, text gener- good example is the behaviour of Google Trans- ation and machine translations systems. For ex- late as the context around the word akw` a` changes. ample, although most translation systems are now very impressive, not a lot of them support Igbo Statement Google Translate Comment Akwa ya di n’elu akwa It was on the high confused language. However, for the few that do (e.g. Akwa ya di n’elu akwa ya It was on the bed in his room fair Google Translate), diacritic restoration still plays Akw´ a` ya di n’elu akw` a` His clothing was on the bridge okay Akw´ a` ya di n’elu akw` a` ya His clothing on his bed good a huge role in how well they perform. The exam- ple below shows the effect of diacritic marks on Table 3: Disambiguation challenge for Google the output of Google Translate’s Igbo-to-English Translate translation. The last two statements, with proper diacrit- Statement Google Translate Comment O ji egbe ya gbuo egbe He used his gun to kill gun wrong ics on the ambiguous wordkey akwa seem both O ji egb´ e` ya gbuo egb´ e´ He used his gun to kill kite correct correct. 
Some disambiguation system in Google Akwa ya di n’elu akwa ya It was on the bed in his room fair Akw´ a` ya di n’elu akw` a` ya his clothes on his bed correct Translate must have been used to select the right Oke riri oke ya Her addiction confused form. However, it highlights the fact that such a Ok` e´ riri ok` e` ya Mouse ate his share correct disambiguation system may perform better when O jiri ugbo ya bia He came with his farm wrong O jiri u. gbo. ya bia He came with his car correct diacritics are restored.

Table 2: Diacritic disambiguation for Google 2 Problem Definition Translate As explained above, lack of diacritics can often lead to some lexical ambiguities in written Igbo 1.3 Diacritic restoration and WSD sentences. Although a human reader can, in most Yarowsky (1994a) observed that, although dia- cases, infer the intended meaning from context, critic restoration is not a hugely popular task in the machine may not. Consider the sentences in NLP research, it shares similar properties with sections 2.1 and 2.2 and their literal translations:

Figure 1: Illustrative View of the Diacritic Restoration Process (Ezeani et al., 2016)

2.1 Missing orthographic diacritics

1. Nwanyi ahu banyere n'ugbo ya. (The woman entered her [farm | boat/craft])
2. O kwuru banyere olu ya. (He/she talked about his/her [neck/voice | work/job])

2.2 Missing tonal diacritics

1. Nwoke ahu nwere egbe n'ulo ya. (That man has a [gun | kite] in his house)
2. O dina n'elu akwa. (He/she is lying on the [cloth | bed, bridge | egg | cry])
3. Egwu ji ya aka. (He/she is held/gripped by [fear | song/dance/music])

Ambiguities arise when diacritics, orthographic or tonal, are omitted in Igbo texts. In the first set of examples, we can see that ugbo (farm) and ụgbọ (boat/craft), as well as olu (neck/voice) and ọlụ (work/job), were candidates in their sentences. The second set of examples shows that égbé (kite) and égbè (gun); ákwà (cloth), àkwà (bed or bridge), àkwá (egg), or even ákwá (cry) in a philosophical or artistic sense; as well as égwù (fear) and égwú (music) are all qualified to replace the ambiguous word in their respective sentences.

3 Related Literature

Diacritic restoration techniques for low resource languages adopt two main approaches: word based and character based.

3.1 Word level diacritic restoration

Different schemes of the word-based approach have been described. They generally involve preprocessing, candidate generation and disambiguation. Simard (1998) applied POS-tags and HMM language models for French. On the other hand, Šantić et al. (2009) used substitution schemes, a dictionary and language models in implementing a similar architecture. For Spanish, Yarowsky (1999) used dictionaries with decision lists, Bayesian classification and Viterbi decoding of the surrounding context. Crandall (2005), using a Bayesian approach, HMM and a hybrid of both, as well as a different evaluation method, attempted to improve on Yarowsky's work. Cocks and Keegan (2011) worked on Māori using naïve Bayes and word-based n-grams relating to the target word as instance features. Tufiş and Chiţu (1999) used POS tagging to restore Romanian texts but backed off to a character-based approach to deal with "unknown words". Generally, there seems to be a consensus on the superiority of the word-based approach for well resourced languages.

3.2 Grapheme or letter level diacritic restoration

For low-resource languages, there is often a lack of adequate data and resources (large corpora, dictionaries, POS-taggers, etc.). Mihalcea (2002) as well as Mihalcea and Nastase (2002) argued that a letter-based approach would help to resolve the issue of lack of resources. They implemented instance-based and decision tree classifiers which gave a high letter-level accuracy. However, their evaluation method implied a possibly much lower word-level accuracy.

Versions of Mihalcea's approach with improved evaluation methods have been implemented for other low resourced languages (Wagacha et al., 2006; De Pauw et al., 2011; Scannell, 2011). Wagacha et al. (2006), for example, reviewed the evaluation method in Mihalcea's work and introduced a word-level method for Gĩkũyũ. De Pauw et al. (2011) extended Wagacha's work by applying the method to multiple languages.

Our earlier work on Igbo diacritic restoration (Ezeani et al., 2016) was more of a proof of concept aimed at extending the initial work done by Scannell (2011). We built a number of n-gram models, basically unigrams, bigrams and trigrams, along with simple smoothing techniques. Although we got relatively high results, our evaluation method was based on a closed-world assumption where we trained and tested on the same set of data.

Obviously, that assumption does not model the real world, and so it is addressed in this paper.

3.3 Igbo Diacritic Restoration

Igbo is low-resourced and is generally neglected in NLP research. However, an attempt at restoring Igbo diacritics was reported by Scannell (2011), in which a combination of word- and character-level models was applied. Two lexicon lookup methods were used: LL, which replaces ambiguous words with the most frequent word, and LL2, which uses a bigram model to determine the right replacement.

They reported word-level accuracies of 88.6% and 89.5% for the two models respectively. But the size of the training corpus (31k tokens with 4.3k word types) was too small to be representative, and there was no language speaker in the team to validate the data used and the results produced. Therefore, we implemented a range of more complex n-gram models, using similar evaluation techniques, on a comparatively larger corpus (1.07m tokens with 14.4k unique tokens), and improved on their results (Ezeani et al., 2016).

In this work, we introduce machine learning approaches to further generalise the process and to better learn the intricate patterns in the data that support better restoration.

4 Experimental Setup

4.1 Experimental Data

The corpus used in these experiments was collected from the Igbo version of the Bible available from the Jehovah's Witnesses website (jw.org). The basic corpus statistics are presented in Table 4.

Item                     Number
Total tokens             1070429
Total words              902150
Numbers/punctuation      168279
Unique words             563383
Ambiguous words          348509
Wordkeys                 15696
Unique wordkeys          15167
Ambiguous wordkeys       529
  2 variants             502
  3 variants             15
  4 variants             10
  5 variants             2
  >5 variants            0
Approx. ambiguity        38.22%

Table 4: Corpus statistics

In Table 4, we refer to the "latinized" form of a word as its wordkey (expectedly, many Igbo words are identical to their wordkeys). Less than 10% (529/15696) of the wordkeys are ambiguous. However, these ambiguous wordkeys form 529 ambiguous sets that yield 348,509 of the corpus words (i.e. words that share the same wordkey with at least one other word). These ambiguous words constitute approximately 38.22% (348,509/911,892) of the entire corpus. Some of the most frequent, as well as the least frequent, ambiguous sets are shown in Table 5.

Top            Variants (count)
na (29272)     ná (1332), na (27940)
o (22418)      o (4757), ò (64), ó (5), ọ (17592)

Bottom         Variants (count)
Giteyim (2)    Giteyịm (1), Giteyim (1)
Galim (2)      Galim (1), Galịm (1)

Table 5: Most and least frequent wordkeys

4.2 Preprocessing

The preprocessing task relied substantially on the approaches used by Onyenwe et al. (2014). Observed language-based patterns were preserved. For example, ga–, na– and n' are retained as they are, due to the specific lexical functions the special characters "–" or " ' " confer on them. For instance, while na implies conjunction (e.g. ji na ede: yam and cocoa-yam), na– is a verb auxiliary (e.g. Obi na–agba ọsọ: Obi is running), and n' is a shortened form of the preposition na (e.g. Ọ dị n'elu àkwà: It is on the bed).

Also, for consistency, diacritic formats are normalized using Unicode's Normalization Form Canonical Composition (NFC). For example, the character é formed from the combined Unicode characters e (U+0065) and the combining acute accent (U+0301) will be decomposed and recombined as the single canonically equivalent character é (U+00E9). Also, the character ṅ, which is often wrongly replaced with ñ or n̄ in some texts, is generally restored to its standard form.
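The wordkey mapping and the NFC normalisation step can be illustrated with Python's unicodedata module; this is a sketch of the idea, not the authors' preprocessing code, and the helper names are made up.

    import unicodedata

    def nfc(text):
        # Recombine base characters and combining marks into canonical composed form,
        # e.g. 'e' (U+0065) + combining acute (U+0301) becomes 'é' (U+00E9).
        return unicodedata.normalize("NFC", text)

    def wordkey(word):
        # Strip all diacritics (tone marks and dot-belows) to obtain the "latinized"
        # form: decompose, drop combining marks, and lowercase.
        decomposed = unicodedata.normalize("NFD", word)
        stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return stripped.lower()

    print(wordkey(nfc("akwà")))   # -> "akwa"
    print(wordkey(nfc("ụgbọ")))   # -> "ugbo" (dot-below vowels map to plain vowels)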

The diacritic marking of the corpus used in this research is sufficient but not full or perfect. The orthographic diacritics (mostly dot-belows) have been included throughout. However, the tonal diacritics are fairly sparse, having been included only where they were needed for disambiguation, i.e. where the reader might not be able to decide the correct form from context.

Therefore, through manual inspection, some observed errors and anomalies were corrected by language speakers. For example, 3153 out of 3154 occurrences of the key mmadu were of the class mmadụ. The only one that was mmadu was corrected to mmadụ after inspection. By repeating this process, a lot of the generated ambiguous sets were resolved and removed from the list to reduce the noise. Examples are shown in the table below:

wordkey   Freq    var1 (Freq)    var2 (Freq)
akpu      106     akpụ́ (1)       akpụ (105)
agbu      112     agbu (111)     agbú (1)
aka       3690    aka (3689)     ákà (1)
iri       2036    iri (2035)     ịrị (1)

Table 6: Some examples of corrected and removed ambiguous sets

4.3 Feature extraction for training instances

The feature sets for the classification models were based on the work of Scannell (2011) on character-based restoration, which was extended by Cocks and Keegan (2011) to deal with word-based restoration for Māori. These features consist of a combination of n-grams, represented in the form (x, y), where x is the relative position to the target key and y is the token length, at different positions within the left and right context of the target word. The datasets are built as described below for each of the ambiguous keys:

FS1  [(-1,1), (1,1)]: unigrams on each side of the target key
FS2  [(-2,2), (2,2)]: bigrams on each side
FS3  [(-3,3), (3,3)]: trigrams on each side
FS4  [(-4,4), (4,4)]: 4-grams on each side
FS5  [(-5,5), (5,5)]: 5-grams on each side
FS6  [(-2,1), (-1,1), (1,1), (2,1)]: 2 unigrams on each side
FS7  [(-3,1), (-2,1), (-1,1), (1,1), (2,1), (3,1)]: 3 unigrams on each side
FS8  [(-4,1), (-3,1), (-2,1), (-1,1), (1,1), (2,1), (3,1), (4,1)]: 4 unigrams on each side
FS9  [(-5,1), (-4,1), (-3,1), (-2,1), (-1,1), (1,1), (2,1), (3,1), (4,1), (5,1)]: 5 unigrams on each side
FS10 [(-2,2), (-1,1), (1,1), (2,2)]: 1 unigram and 1 bigram on each side
FS11 [(-3,3), (-2,2), (2,2), (3,3)]: 1 bigram and 1 trigram on each side
FS12 [(-3,3), (-2,2), (-1,1), (1,1), (2,2), (3,3)]: 1 unigram, 1 bigram and 1 trigram on each side
FS13 [(-4,4), (-3,3), (-2,2), (-1,1), (1,1), (2,2), (3,3), (4,4)]: 1 unigram, 1 bigram, 1 trigram and a 4-gram on each side

4.3.1 Appearance threshold and stratification

We removed low-frequency wordkeys from our data by defining an appearance threshold as a percentage of the total tokens in our data. This is given by

    appThreshold = (C(wordkey) / C(tokens)) * 100

and wordkeys with an appThreshold below the stated value were removed from the experiment (in this work, we used an appThreshold of 0.005%).

As part of our data preparation for a standard cross-validation, we also passed each of our datasets through a simple stratification process. Instances of each label (labels are basically diacritic variants), where possible, are evenly distributed so that each appears at least once in every fold, or they are removed from the dataset. Our stratification algorithm picks only the labels from each dataset whose population p satisfies p >= nfolds, where nfolds is the number of folds, which in our case has a default value of 10.

In order to make the task a little more challenging, this process was augmented by the removal of some high-frequency but low-entropy datasets, where using the most common class (MCC) produces very high accuracies (datasets with more than 95% accuracy on the most common class, i.e. with entropy lower than 0.05, were removed). Entropy is loosely used here to refer to the degree of dominance of a particular class across the dataset, and it is simply defined as

    entropy = 1 - max_i[Count(label_i)] / len(dataset)

where i = 1..n and n is the number of distinct labels in the dataset.

57 Performance of all Algorithms on all Models where i = 1..n and n =number of distinct labels 100.00% in the dataset. Table 7 shows 10 of the 30 lowest 95.00% entropy datasets that were removed by this pro- 90.00% 85.00% cess. 80.00% LDA KNN 75.00% DTC SVM wordkey Counts MCCscore Label(count) Accuracies MNB e(2) 5476 99.78% e(12);` e(5464) 70.00% anyi(2) 5390 99.63% anyi(5370); anyi` (20) 65.00% . . Baseline ma(2) 6713 99.61% ma(6687); ma(26)` 60.00% ike(2) 3244 99.54% ike(15);` ike(3229) 55.00% unu(2) 8662 99.53% unu(41);` unu(8621) 50.00% FS1 FS2 FS3 FS4 FS5 FS6 FS7 FS8 FS9 FS10 FS11 FS12 FS13

Ha(2) 2266 99.29% Ha(16);` Ha(2250) Models a(2) 12275 99.10% a(12165); a(110)` onye(2) 8937 98.87% onye(8836); onye(101)` Figure 2: Evaluation of algorithm performance on ohu(2) 790 98.73% ohu(780); o.hu.(10) each feature set model eze(2) 2633 98.14% eze(2584); eze(49)´

Table 7: Low entropy datasets dataset on the overall performance depends on the weight of the dataset. Each dataset is assigned a At end of these pruning processes, our remain- weight corresponding to the number of instances ing datasets came to 110 with the distribution as it generates from the corpus which is determined follows: by its frequency of occurrence. So the actual performance of each learning al- datasets with only 2 variants: = 93 • gorithm, on a particular feature set model, is the overall weighted average of the its performances datasets with 3 variants: = 7 • across all the 110 datasets. The bottom line ac- datasets with 4 variants: = 8 curacy is the result of replacing each word with • its wordkey which gave an accuracy of 30.46%. datasets with 5 variants: = 2 • However, the actual baseline to beat is 52.79% which is achieved by always predicting the most Some datasets that originally had multiple vari- common class. ants lost some of their variants. For example, the dataset from akwa which originally had five vari- 4.5 Results and Discussions ants and 1067 instances comprising of akw´ a´ (355), The results of our experiments are as shown in Ta- akw´ a`(485), akwa(216), akw` a`(1) and akw` a´(10) re- ble 8 and Figure 2. tained only four variants (after dropping akw` a`) and 1066 instances. Models LDA KNN DTC SVM MNB Baseline: 52.79% 4.4 Classification algorithms FS1 77.65% 91.47% 94.49% 74.64% 74.64% FS2 77.65% 91.47% 94.49% 74.64% 74.64% This work applied versions of five of the com- FS3 74.48% 73.70% 84.60% 74.92% 74.64% monly used machine learning algorithms in NLP FS4 73.71% 67.18% 81.00% 74.64% 74.64% classification tasks namely: FS5 74.68% 62.48% 76.70% 74.64% 74.64% FS6 76.21% 85.98% 91.54% 71.39% 71.39% FS7 72.74% 79.20% 90.94% 71.39% 71.39% - Linear Discriminant Analysis(LDA) FS8 72.74% 79.20% 90.94% 71.39% 71.39% FS9 76.99% 73.88% 89.50% 75.46% 74.67% - K Nearest Neighbors(KNN) FS10 76.18% 85.41% 92.89% 75.11% 74.64% FS11 73.94% 74.83% 86.23% 75.29% 74.64% - Decision Trees(DTC) FS12 76.99% 73.88% 89.50% 75.46% 74.67% FS13 76.99% 73.88% 89.50% 75.46% 74.67% - Support Vector Machines(SVC) Table 8: Summary of results - Na¨ıve Bayes(MNB) The experiments indicate that on the average all Their default parameters on Scikit-learn toolkit the algorithms were able to beat the baseline on all were used with 10-fold cross-validation and the models. The decision tree algorithm (DTC) per- evaluation metrics used is mainly the accuracy of formed best across all models with an average ac- prediction of the correct diacritic form in the test curacy of 88.64% (Figure 3), and the highest accu- data. The effect of the accuracy obtained for a racy of 94.49% (Table 8) on both the FS1 and FS2

58 Standard Deviation of Algorithm Performance models. However, with an average standard devi- 0.08 ation of 0.076 (Figure 4) for its results, it appears 0.07 to be the least reliable. 0.06 As the next best performing algorithm, KNN 0.05 falls below DTC in average accuracy (91.47%) 0.04 but seems slightly more reliable. It did, however, Standard Deviation 0.03 struggle more than others as the dimension of fea- 0.02 ture n-grams increased (see its performance on 0.01

0 FS3, FS4 and FS5). This may be due to the in- LDA KNN DTC SVM MNB Algorithm crease in sparsity of features and the difficulty to find similar neighbours. The other algorithms – Figure 4: Average standard deviation for algo- LDA, SVM and MNB – just trailed behind and rithms although their results are a lot more reliable espe- Comparison of Average Performances of Models cially SVM and MNB (Figure 4). But this may 84.0% be an indication that their strategies are not explo- 82.0% rative enough. However, it could be observed that 80.0% they traced a similar path in the graph and also had 78.0% their highest results with the same set of models 76.0% 74.0% (i.e. FS9, FS12 and FS13) with wider context. Accuracy 72.0% On the models, we observed that the unigrams 70.0% and bigrams have better predictive capacity than 68.0% 66.0% the other n-grams. Most of the algorithms got FS1 FS2 FS3 FS4 FS5 FS6 FS7 FS8 FS9 FS10 FS11 FS12 FS13 comparatively good results with FS1, FS2, FS6 Models and FS10 (Figure 5) each of which has the uni- Figure 5: Average performance of models gram closest to the target word (i.e. in the 1 posi- ± tion) in the feature set. Also, models that excluded the closest unigrams on both sides (e.g. FS11) 4.6 Future Research Direction and those with fairly wider context did not per- Although our results show a substantial improve- form comparatively well across algorithms. ment from the baseline accuracy by all the algo- rithms on all the models, there is still a lot of room Comparison of Performances of Algorithms on Models for improvement. Our next experiments will in- 90.0% volve attempts to improve the results by focusing

85.0% on the following key aspects:

80.0% - Reviewing the feature set models: So far we have used instances with similar 75.0% features on both sides of the target words. In AverageAccuracy across models our next experiments, we may consider vary- 70.0% ing these features.

65.0% LDA KNN DTC SVM MNB Algorithms - Exploiting the algorithms: We were more explorative with the algo- Figure 3: Average performance of algorithms rithms and so only the default parameters of the algorithms on Scikit-learn were tested. Subsequent experiments will involve tuning Again, it appears that beyond the three closest the parameters of the algorithms and possibly unigrams (i.e. those in the 3 through +3 posi- − using more evaluation metrics. tions), the classifiers tend to be confused by ad- ditional context information. Generally, FS1 and - Expanding data size and genre: FS2 stood out across all algorithms as the best A major challenge for this research work is models while FS6 and FS7 also did well espe- lack of substantially marked corpora. So al- cially with DTC, KNN and LDA. though, we achieved a lot with the bible data,

59 it is inadequate and not very representative of Nancy Ide and Jean Veronis.´ 1998. Introduction the contemporary use of the language. Future to the special issue on word sense disambiguation: research efforts will apply more resources to The state of the art. Comput. Linguist., 24(1):2–40, March. increasing the data size across other genres. Rada F. Mihalcea and Vivi Nastase. 2002. Letter level - Predicting unknown words: learning for language independent diacritics restora- Our work is yet to properly address the prob- tion. In: Proceedings of CoNLL-2002, Taipei, Tai- lem of unknown words. We are considering a wan, pages 105–111. closer inspection of the structural patterns in Rada F. Mihalcea. 2002. Diacritics Restoration: the target word to see if they contain elements Learning from Letters versus Learning from Words. with predictive capacity. In: Gelbukh, A. (ed.) CICLing LNCS, pages 339– 348. - Broad based standardization: Ikechukwu Onyenwe, Chinedu Uchechukwu, and Beside lack of diacritics online Igbo texts Mark Hepple. 2014. Part-of-speech Tagset and Cor- are riddled with spelling errors, lack of stan- pus Development for igbo, an African Language. dard orthographic and dialectal forms, poor LAW VIII - The 8th Linguistic Annotation Work- shop., pages 93–98. writing styles, foreign words and so on. It may therefore be good to consider a broader Kevin P. Scannell. 2011. Statistical unicodification of based process that includes, not just diacritic african languages. Language Resource Evaluation, 45(3):375–386, September. restoration but other aspects of standardiza- tion. Michel Simard. 1998. Automatic Insertion of Ac- cents in French texts. Proceedings of the Third Con- - Interfacing with other NLP systems: ference on Empirical Methods in Natural Language Although it seems obvious, it will be inter- Processing, pages 27–35. esting to investigate, in empirical terms, the Dan Tufis¸and Adrian Chit¸u. 1999. Automatic Diacrit- relationship between diacritic restoration and ics Insertion in Romanian texts. In Proceedings of others NLP tasks and systems such as POS- COMPLEX99 International Conference on Compu- tagging, morphological analysis and even the tational Lexicography, pages 185–194, Pecs, Hun- gary. broader field of word sense disambiguation. Nikola Santiˇ c,´ Jan Snajder,ˇ and Bojana Dalbelo Basiˇ c,´ Acknowledgements 2009. Automatic Diacritics Restoration in Croatian Texts, pages 126–130. Dept of Info Sci, Faculty of The IgboNLP Project @ The University of Humanities and Social Sciences, University of Za- Sheffield, UK is funded by TETFund & Nnamdi greb , 2009. ISBN: 978-953-175-355-5. Azikiwe University, Nigeria. Peter W. Wagacha, Guy De Pauw, and Pauline W. Githinji. 2006. A Grapheme-based Approach to Accent Restoration in G˜ikuy˜ u.˜ In Proceedings of References the Fifth International Conference on Language Re- John Cocks and Te-Taka Keegan. 2011. A Word- sources and Evaluation, pages 1937–1940, Genoa, based Approach for Diacritic Restoration in Maori.¯ Italy, May. In Proceedings of the Australasian Language Tech- Yorick Wilks and Mark Stevenson. 1996. The gram- nology Association Workshop 2011, pages 126–130, mar of sense: Is word-sense tagging much more than Canberra, Australia, December. part-of-speech tagging? CoRR, cmp-lg/9607028. David Crandall. 2005. Automatic David Yarowsky. 1994a. A Comparison of Corpus- Accent Restoration in Spanish text. based Techniques for Restoring Accents in Spanish http://www.cs.indiana.edu/ djcran/projects/ ∼ and French Text. 
In Proceedings, 2nd Annual Work- 674 final.pdf. [Online: accessed 7-January-2016]. shop on Very Large Corpora, pages 19–32, Kyoto. Guy De Pauw, Gilles maurice De Schryver, Laurette David Yarowsky. 1994b. Decision Lists for Lexical Pretorius, and Lori Levin. 2011. Introduction to Ambiguity Resolution: Application to Accent Res- the Special Issue on African Language Technology. olution in Spanish and French Text. In Proceedings Language Resources and Evaluation, 45:263–269. of ACL-94, Las Cruces, New Mexico. Ignatius Ezeani, Mark Hepple, and Ikechukwu David Yarowsky, 1999. Corpus-based Techniques for Onyenwe, 2016. Automatic Restoration of Diacrit- Restoring Accents in Spanish and French Text, pages ics for Igbo Language, pages 198–205. Springer In- 99–120. Kluwer Academic Publishers. ternational Publishing, Cham.

60 Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds

Mahmoud El-Haj, Paul Rayson, Scott Piao and Stephen Wattam School of Computing and Communications, Lancaster University, Lancaster, UK [email protected]

Abstract The creation of large-scale semantic lexical re- sources is a time-consuming and difficult task. For Creating high-quality wide-coverage mul- new languages, regional varieties, dialects, or do- tilingual semantic lexicons to support mains the task will need to be repeated and then knowledge-based approaches is a chal- revised over time as word meanings evolve. In lenging time-consuming manual task. this paper, we report on work in which we adapt This has traditionally been performed by crowdsourcing techniques to speed up the creation linguistic experts: a slow and expensive of new semantic lexical resources. We evaluate process. We present an experiment in how efficient the approach is, and how robust the which we adapt and evaluate crowdsourc- semantic representation is across six languages. ing methods employing native speakers to The task that we focus on here is a particularly generate a list of coarse-grained senses un- challenging one. Given a word, each annotator der a common multilingual semantic tax- must decide on its meaning[s] and assign the word onomy for sets of words in six languages. to single or multiple tags in a pre-existing seman- 451 non-experts (including 427 Mechan- tic taxonomy. This task is similar to that under- ical Turk workers) and 15 expert partic- taken by trained lexicographers during the process ipants semantically annotated 250 words of writing or updating dictionary entries. Even for manually for Arabic, Chinese, English, experts, this is a complex task. Kilgarriff (1997) Italian, Portuguese and Urdu lexicons. In highlighted a number of issues related to lexicog- order to avoid erroneous (spam) crowd- raphers ‘lumping’ or ‘splitting’ senses of a word sourced results, we used a novel task- and cautioned that even lexicographers do not be- specific two-phase filtering process where lieve in words having a “discrete, non-overlapping users were asked to identify synonyms in set of senses”. Veronis´ (2001) showed that inter- the target language, and remove erroneous annotator agreement is very low in sense tagging senses. using a traditional dictionary. For our purpose, we use the USAS taxonomy.1 If a linguist were un- 1 Introduction dertaking this task, as they have done in the past Machine understanding of the meaning of words, with Finnish (Lofberg¨ et al., 2005) and Russian phrases, sentences and documents has challenged (Mudraya et al., 2006) USAS taxonomies, they computational linguists since the 1950s, and much would first spend some time learning the seman- progress has been made at multiple levels. Differ- tic taxonomy. In this experimental scenario, we ent types of semantic annotation have been devel- aim to investigate whether or not non-expert na- oped, such as word sense disambiguation, seman- tive speakers can succeed on the word-to-senses tic role labelling, named entity recognition, senti- classification task without being trained on the tax- ment analysis and content analysis. Common to onomy in advance, therefore mitigating a signifi- all of these tasks, in the supervised setting, is the cant overhead for the work. In addition, further requirement for a wide coverage semantic lexicon motivation for our experiments is to validate the acting as a knowledge base from which to select applicability of the USAS taxonomy (Rayson et or derive potential word or phrase level sense an- 1The UCREL Semantic Analysis System (USAS), see notations. http://ucrel.lancaster.ac.uk/usas/

61 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 61–71, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics al., 2004), with a non-expert crowd, as a frame- a word sense from an existing list that matches work for multilingual sense representation. The provided contextual examples, such as a concor- USAS taxonomy was selected for this experiment dance list. In our work, we require the partici- since it offers a manageable coarse-grained set pants to define the list of all possible senses that a of categories that have already been applied to a word could take in different contexts. We also see number of languages. This taxonomy is distinct that our two-stage filtering process tailored for this from other potential choices, such as WordNet. task helps to improve results. We compare inter- The USAS tagset is originally loosely based on rater scores for two groups of experts and non- the Longman Lexicon of Contemporary English experts to examine the feasibility of extracting (McArthur, 1981) and has a hierarchical structure high-quality semantic lexicons via the untrained with 21 major domains (see table 1) subdividing crowd. Non-experts achieved results between 45- into three levels. Versions of the USAS tagger 97% for accuracy, between 48-92% for complete- or tagset exist in 15 languages in total and for ness, with an average of 18% of tasks having er- each language, native speakers have re-evaluated roneous senses being left in. Experts scored 64- the applicability of the tagset with some specific 96% for accuracy, 72-95% for completeness, but extensions for Chinese (Qian and Piao, 2009) but achieve better results in terms of only 1% of erro- otherwise the tagset is stable across all languages. neous senses left behind. Our experimental results For each language tagger, separate linguistic re- show that the non-expert crowdsourced annotation sources (lexicons) have been created, but they all process is of a good quality and comparable to that use the same taxonomy. of expert linguists in some cases, although there are variations across different languages. Crowd- Domain Description sourcing provides a promising approach for the A General and abstract terms speedy generation and expansion of semantic lex- B The body and the individual icons on a large scale. It also allows us to validate C Arts and crafts the semantic representations embedded in our tax- E Emotion onomy in the multilingual setting. F Food and farming G Government and public 2 Related Work H Architecture, housing and the home I Money and commerce in industry The crowdsourcing approach, in particular Me- K Entertainment, sports and games chanical Turk (MTurk), has been successfully ap- L Life and living things plied for a number of different Natural Language M Movement, location, travel and Processing (NLP) tasks. Alonso and Mizzaro transport (2009) adopted MTurk for five types of NLP tasks, N Numbers and measurement resulting in high agreement between expert gold O Substances, materials, objects and standard labels and non-expert annotations, where equipment a small number of workers can emulate an ex- P Education pert. 
With the possibility of achieving good re- Q Language and communication sults quickly and cheaply, MTurk has been tested S Social actions, states and processes for a variety of tasks, such as image annotation T Time (Sorokin and Forsyth, 2008), Wikipedia article W World and environment quality assessment (Kittur et al., 2008), machine X Psychological actions, states and translation (Callison-Burch, 2009), extracting key processes phrases from documents (Yang et al., 2009), and Y Science and technology summarization (El-Haj et al., 2010). Practical is- Z Names and grammar sues such as payment and task design play an im- portant part in ensuring the quality of the resulting Table 1: USAS top level semantic fields work. Many designers pay between $0.01 to $0.10 for a task taking a few minutes. Quality control In terms of main contributions, our research and evaluation are usually achieved through con- goes beyond the previous work on crowdsourc- fidence scores and gold-standards (Donmez et al., ing word meanings which requires workers to pick 2009; Bhardwaj et al., 2010). Past research has

62 shown (Aker et al., 2012) that the use of radio but- duce the error rate, some manual cleaning was car- ton design seems to lead to better results compared ried out, particularly on the English-Italian bilin- to the free text design. Particularly important in gual lexicons. Because of the substantial amount our case is the language demographics of MTurk of time needed for such manual work, the rest of (Pavlick et al., 2014), since we need to find enough the lexical resources were used with only minor native speakers in a number of languages. sporadic manual checking. Due to the noise intro- There is a growing body of crowdsourcing work duced from the bilingual lexicons, as well as the related to semantic annotation. Snow et al. (2008) ambiguous nature of the translation, the automati- applied MTurk to the Word Sense Disambiguation cally generated semantic lexicons for the three lan- (WSD) task and achieved 100% precision with guages contain errors, including erroneous seman- simple majority voting for the correct sense of tic tags caused by incorrect translations, and inac- the word ‘president’ in 177 example sentences. curate semantic tags caused by ambiguous trans- Rumshisky et al. (2012) derived a sense inven- lations. When these automatically generated lexi- tory and sense-annotated corpus from MTurkers cons were integrated and applied in the USAS se- comparison of senses in pairs of example sen- mantic tagger, the tagger suffered from error rates tences. They used clustering methods to iden- of 23.51%, 12.31% and 10.28% for Italian, Chi- tify the strength of coders’ tags, something that nese and Portuguese respectively. is poorly suited to rejecting work from spammers The improvement of the semantic lexicons is (participants who try to cheat the system with therefore an urgent and challenging task, and we scripts or random answers) and would likely not hypothesise that the crowdsourcing approach can transfer well to our experiment. potentially provide an effective means for address- Akkaya et al. (2010) also performed WSD us- ing this issue on a large scale, while at the same ing MTurk workers. They discuss a number of time allowing us to further validate the representa- methods for ensuring quality, accountability, and tion of word senses in the USAS sense inventory consistency using 9 tasks per word and simple ma- (i.e. the semantic tagset) for these languages. jority voting. Kapelner et al. (2012) increased the scale to 1,000 words for the WSD task and found 3 Semantic Labeling Experiment that workers repeating the task do not learn with- We test the wisdom of the crowd in building out feedback. A set-based agreement metric was lexicons and applying the same multilingual se- used by Passonneau et al. (2006) to assess the mantic representation in six languages: Arabic, validity of polysemous selections of word senses Chinese, English, Italian, Portuguese and Urdu. from WordNet categories. Their objective was to These languages were selected to provide a range take into account similarity between items within a of language families, inflectional and derivational set, however, this may not be desirable in our case morphology, while covering significant number of due to the limited depth of the USAS taxonomy. speakers worldwide. For each language, we ran- Directly related to our research here are the domly selected 250 words. All experiments pre- experiments reported in Piao et al. (2015). 
A sented here use the USAS taxonomy to describe set of prototype semantic lexicons were auto- semantic categories3 (Rayson et al., 2004). matically generated by transferring semantic tags from the existing USAS English semantic lex- 3.1 Gold Standard Semantic Tags icon entries to their translation equivalents in To prepare gold standard data we asked a group Italian, Chinese and Portuguese via dictionaries of linguists (up to two per language) to manu- and bilingual lexicons. While some dictionar- ally check 250 randomly selected words for each ies involved, including Chinese/English and Por- of the six languages, starting from the data pro- tuguese/English dictionaries, provided high qual- vided by Piao et al. (2015). For additional lan- ity lexical translations for core vocabularies of guages (those not in Piao et al. (2015)), the Arabic these languages, the bilingual lexicons, including and Urdu gold standards were completed manu- FreeLang English/Italian, English/Portuguese lex- ally by native speaker linguists who translated the icons2 and LDC English/Chinese word list, con- 250 words, and we instructed the translators to opt tain erroneous and inaccurate translations. To re- for the most familiar Arabic or Urdu equivalents

2http://www.freelang.net/dictionary/ 3http://ucrel.lancaster.ac.uk/usas/semtags.txt

63 of the English words. This was further confirmed the non-experts. All expert and non-expert partici- by checking the list of words by two other Arabic pants (whether MTurk workers or direct-contact) and Urdu native speakers. Here the base form of used the same user interface (Figure 1) as de- verbs in Arabic is taken to be the present simple scribed in section 3.5. form of the verb in the interest of convenience and because there is no part of speech tag for ‘present 3.4 Experimental Design tense of a lexical verb’. Hence, the three-letter Obtaining reliable results from the crowd remains past tense verbs are tagged as ‘past form of lex- a challenging task (Kazai et al., 2009), which ical verb’, rather than as base forms. Also, while requires a careful experimental design and pre- present and past participles (e.g., ‘interesting’, ‘in- selection of crowdsourcers. In our experiments, terested’) are tagged as adjectives in English, these we worked on minimising the effort required by are labeled in Arabic as nouns in stand-alone po- participants through designing a user-friendly in- sitions but they can also function as adjective pre- terface.7 Aside from copying the final output code modifying nouns. Linguists then used the USAS to a text-box everything else is done using mouse semantic tagset to semantically label each word clicks. Poorly designed experiments can nega- with the most suitable senses. tively affect the quality of the results conducted by MTurk workers (Downs et al., 2010; Welinder 3.2 Non-expert Participants et al., 2010; Kazai, 2011). Non-expert participants are defined as those who Feedback from a short sample run with local are not familiar with the USAS taxonomy in ad- testers helped us update the interface and provide vance of the experiment. We selected Amazon’s more information to make the task efficient. Fig- Mechanical Turk4 – an online marketplace for ure 1 shows a sample task for the word ‘car’. The work that requires human intelligence – and pub- majority of the testers were able to complete the lished “Human Intelligence Tasks” (HITs) for En- task within five minutes. In response to feedback glish, Chinese and Portuguese only. For Ara- by some of the testers we provided the “Instruc- bic, Italian, and Urdu initial experiments using tions” section in the six languages under consider- MTurk showed that not enough native speakers ation. are available to complete the tasks. Therefore, 用户指南 Information Contact ﺍﻟﺗﻌﻠﻳﻣﺎﺕ Instructions Istruzioni we employed 12 non-expert participants directly Please label the word below with a number of tags that represent its meaning. Attach as many (no more than 10), or as few, as you deem appropriate for all senses of the word, placing them in with four native speakers for each of the three lan- descending order of importance. Tags can be assigned in a (positive or negative manner (for example, "occasionally" is tagged negatively for frequency guages. All participants used the same interface To assign a tag, click on the add tag button, and select a box from the category selection. This will add (Figure 1). an entry in the list, that .can then be sorted so that the most commonly used tag is at the top Please remove any unrelated tags and make sure you do not exceed 10 tags in total To help you with identifying common senses of a word, we have provided a number of links to On MTurk, we paid workers an average of 7 US dictionaries, thesauri, and corpora (where you can see real­world usage). 
The References button will dollars per hour. We paid Portuguese workers 50% bring these up over the top of other things, so you can still browse the tags Example: "Jordan" could be : 'Geographical name' or 'Personal name' more to try and attract more participants due to Please submit your selections by clicking in the Submit button at the end of the page and wait until you receive a confirmation message where you need to copy the output­code and provide it to us, as in the lack of Portuguese native speakers on MTurk. the following example: f962bed5616612c8c7053f6e97e72b129edb4990-b5f5-47ab-8e7e- bfc61c6515ec We paid the other directly contacted participants an average of 8 British pounds per hour. Those Term to Classify car payments were made using Amazon5 and Apple 6 Tags iTunes vouchers. Please add tags to describe the term, arranging them from the most commonly used sense to the least. Movement/transportation: land 3.3 Expert Participants American football Add Tag References Submit Expert participants are defined as those who were already familiar with the USAS taxonomy before Figure 1: Sample Task for the word “Car” the experiments took place. For four languages (Arabic, English, Chinese and Urdu) we asked a total of 15 participants (3 for English and 4 for 3.5 Online Semantic Labeling the other languages) to carry out the same task as As shown in Figure 1, we asked the participants to label each word presented to them with a number 4http://www.mturk.com 5https://www.amazon.co.uk 7Our underlying code is freely available on GitHub at 6http://www.apple.com https://github.com/UCREL/BLC-WSD-Frontend

64 Anatomy and Health and disease References physiology Terms relating to the (state of Terms relating to the (human) the) physical condition Click on any of the following links for more body and bodily processes information about the word: happy

+ Select ‒ Select + Select ‒ Select  dictionary.com  thesaurus.com Clothes and personal Cleaning and personal  British National Corpus belongings care

Terms relating to clothes Terms relating to and other personal belongings domestic/personal hygiene hide + Select ‒ Select + Select ‒ Select

Figure 2: Dictionary and Thesauri References Figure 4: Subcategories of tags that represent the word’s possible mean- the page and wait until they receive a confirmation ings. The participants were asked to attach as message where they need to copy the output-code many, or as few, as they deemed appropriate for and provide it to us. all senses of the word, placing them in descending For each word we targeted a total of four non- order of likelihood. expert participants and four expert participants to To assign a tag, the participants click on the allow measurement and comparison of the agree- Add Tag button, and navigate to a box from the ment within each group to investigate the variabil- category selection where they can select a subcat- ity of task results and participants, rather than to egory (Figures 3 and 4). By following these steps take a simple weighted combination to produce an the participants add an entry in the list, that can agreed list. then be sorted by dragging and dropping the se- lected tags so that the most commonly used tag is 3.6 Filtering at the top. We asked the users to remove any un- Even though crowdsourcing has been shown to related tags and make sure they do not exceed 10 be effective in achieving expert quality for a va- tags in total. riety of NLP tasks (Snow et al., 2008; Callison- Burch, 2009), we still needed to filter out work- General & Abstract The Body & Arts & Crafts ers who were not taking the task seriously or were Terms Individual 1 subcategories attempting to manipulate the system for personal 15 subcategories 5 subcategories gain (spamming).

Numbers & In order to avoid these spamming crowdsourced Life & Living Things Education Measurement results, we designed a novel task-specific two 3 subcategories 1 subcategories 6 subcategories stage filtering process that we considered more suitable for this type of task than previous filtering Science & Money & Time Technology Commerce approaches. Our two stage process encompasses 4 subcategories 2 subcategories 4 subcategories filters that are appropriate for experts and non- experts, and is applicable whether participants are using MTurk or not. Figure 3: Categories In stage one filtering, we asked the MTurk workers to select the correct synonym of the pre- To help them when identifying common senses sented word from a list of noisy candidates in or- of a given word, we provided a number of links to der to avoid rejection of their HITs. The list con- dictionaries, thesauri, and corpora (where they can tained four words where only one word correctly see real-world usage) for each language. The Ref- fitted as a synonym. In order to set up the first erences are displayed alongside the interface, so filtering task for MTurk workers (on English, Chi- they can still browse the tags (Figure 2). Partici- nese and Portuguese tasks), we used Multilingual pants are free to use other resources as they see fit. WordNet to obtain the most common synonym for The participants then needed to submit their selec- each word. The stage one filtering was not needed tions by clicking the Submit button at the end of for Arabic, Italian and Urdu, since these non-

65 expert participants were directly contacted and we whether the gold standard tags (GT ags) are com- knew that they were native speakers and would not pletely or partially contained within the worker’s submit random results. The synonyms were vali- selection. To compute Completeness we divide the dated by linguists in each of the three languages number of matching tags by the number of gold and choices were randomly shuffled before be- standard tags. ing presented to the workers. Stage one filtering W G removed 12% of English HITS, 2% of Chinese, Completeness = T ags ∩ T ags and 6% of the Portuguese submissions. In our re- GT ags sults presented below, we only considered tasks by Correlation: To test the similarity of tag selec- workers who chose the correct synonyms and re- tion between workers and gold standards we used jected the others. Spearman’s rank correlation coefficient. For the stage two filter, we injected random er- roneous senses for each of the 250 words into the In addition to the three metrics mentioned above initial list of tags and the participants were ex- we used three factors that work as indicators of the pected to remove these in order to pass. We de- quality of the tagging process: liberately injected wrong and unrelated semantic Strict: Whether worker’s tags are identical to tags in between ‘potentially’ correct ones before • the gold standard (same tags in the same or- shuffling the order of the tags. For example, ex- der); amining the pre-selected tags for the word ‘car’ in Figure 1 we can see that the semantic tag ‘Ameri- First Tag Correct: Whether the first tag se- • can Football’ is unrelated to the word ‘car’ and in lected by the worker matches the first tag in fact does not exist in the USAS semantic taxon- the gold standard; omy. The potentially correct tag such as ‘Move- ment/transportation: land’ does exist in the se- Fuzzy: Whether tags selected by a worker are • mantic lexicon. Results where participants fail to contained within the gold standard tags (in pass stage two are still retained in the experiment any order). and we report on the usefulness of this filter in section 4. All participants (MTurk workers and For each language we asked for up to four an- directly-contacted; experts and non-experts) un- notators per word (1,000 HITs per language). For dertook stage 2 filtering. Our experimental design Portuguese, where the participants were all from did not reveal to the participants any details of the MTurk, we only received 694 HITs even though two stage filtering process. we paid participants working on Portuguese 50% more than we paid for the English tasks. 4 Results and Discussion Table 29 shows the aggregate averages of the To evaluate the results8 we adopted three main non-expert HITs. In total, direct-contact partic- metrics (inspired by those used in the SemEval ipants and MTurk workers performed well and procedures): Accuracy, Completeness, and Corre- achieved comparable results to the gold standard lation. in places. Around 50% of the English, Chinese, Accuracy: measure the accuracy of the partic- Urdu, and Portuguese HITs had the correct tags se- lected with around 15% being identical to the gold ipant’s selection of tags (WT ags) by counting the matching tags between the worker’s selection and standards. In nearly all of the cases, Portuguese workers chose the correct tags although they were the gold standards (GT ags). To compute Accu- racy we divide the number of matching tags by the in a different order than the gold standard. 
Arabic number of tags selected by the participants. participants achieved a high completeness score relative to the gold standard tags, but a close anal- W G Accuracy = T ags ∩ T ags ysis of the results show the participants have sug- WT ags gested more tags than the gold standards. Completeness: measure the completeness of the The results for English suggest that the non- participant’s selection of tags (WT ags) by finding expert workers are consistent as can be observed 8All expert and non-expert results are available as 9For this and the following tables, we use: Acc: Accuracy, CSV files on https://github.com/UCREL/Multilingual- Com: Completeness, Corr: Spearman’s, Err: Erroneous tags, USAS/tree/master/papers/eacl2017 sense workshop Str: Strict, 1st: first tag correct, Fuz: Fuzzy.

66 Lang Acc Com Cor Err Str 1st Fuz in knowing the exact sense of an out of context En 61.4 69.3 0.38 29% 16% 65% 38% Arabic word could result in disagreement when it Ar 55.5 87.1 0.35 8% 8% 55% 19% comes to ordering the senses (see Section 3.1). Zh 45.2 56.1 0.22 2% 15% 46% 27% For the Chinese language, the result table shows It 45.7 47.9 0.06 31% 7% 38% 22% that there is a slightly lower correlation between Pt 58.5 56.3 0.21 18% 19% 50% 94% the non-expert workers’ tags and the gold stan- Ur 97.6 91.9 0.51 1% 53% 78% 95% dards. Observing the erroneous results column we can see that the workers have made very few mis- Table 2: Summary of performance [Non experts] takes and deleted the random unrelated tags. The Strict and Fuzzy scores suggest the results to be of high quality. The participants managed to get the by looking at the Accuracy (Acc) and Complete- first tag correct in many cases. ness (Com) results. Spearman’s correlation (Cor) suggests that the workers correlate with the expert For the Italian language, we sourced four non- gold standard tags, which is consistent with previ- expert undergrad student participants who are all ous findings that MTurk is effective for a variety of native Italian speakers but not familiar with the NLP tasks through achieving expert quality (Snow tagset. The participants’ results do not correlate et al., 2008; Callison-Burch, 2009). The majority well with the gold standards. As the tags descrip- of the workers matched the first tag correctly (1st) tion are all in English the annotators found it diffi- by ordering the tags so the most important (core cult to correctly select the senses and had to trans- sense) tag appeared at the top of their selection. late some tags into Italian which could have re- The erroneous tags (Err) column shows that many sulted in shifting the meaning of those tags, to workers did not remove some of the deliberately- communicate with them we had an Italian/English wrong tags (see Section 3.5). This reflects the lack bilingual linguist as a mediator. of training of the workers, but our checking of the As with Arabic and Italian, for the Urdu lan- results showed that the erroneous tags were not se- guage we sourced four non-expert participants lected as first choice. Strict (Str) and Fuzzy (Fuz) who are all native Urdu speakers but not familiar show that many workers were consistent with the with the tagset. Urdu results show that participants gold standard tags in terms of both tag selection correlate well with the gold standards. We also and order. It is worth mentioning that languages notice a lower percentage of erroneous tags than differ in terms of ambiguity (e.g. Urdu is less am- other languages. The First Tag and Fuzzy scores biguous than Arabic) which can be observed in the suggest the results to be of high quality. The par- differences between language results. ticipants also managed to get the first tag correct in As mentioned earlier, we did not use MTurk for many cases. The participants all agreed it was easy the Arabic lexicon, due to the lack of Arabic na- to define word senses with the words being less tives speakers on MTurk. Instead, we found four ambiguous compared to other languages. This is student volunteers who offered to help in seman- shown in the high results achieved when compared tically tagging the words, again without any train- to non-experts of the other languages. ing on the tagset. The results show consistent ac- We received only 694 HITs for Portuguese tasks curacy and completeness. 
It is worth noting that on MTurk, which suggests there are fewer Por- the Arabic participants obtained higher accuracy tuguese speakers compared to English and Chi- and completeness scores by having higher agree- nese speakers among the MTurk workers. The ment with the gold standard tags. The Arabic lan- results for Portuguese in some cases are of very guage participants selected fewer erroneous tags high quality. It should be noted that the gold stan- than the English ones. The majority of the par- dard tags were selected and manually checked by a ticipants got the first tag correct. Arabic partici- Brazilian Portuguese native speaker expert. There pants failed to match the order of tags in the gold is a difference between European and Brazil- standards as reflected by lower correlation. This ian Portuguese which could result in ambiguous is expected due to the fact that Arabic is highly words for speakers from the two regions (Frota inflectional and derivational, which increases am- and Vigario,´ 2001). biguity and presents a challenge to the interpreta- Table 3 shows the results obtained by using the tion of the words (El-Haj et al., 2014). Difficulties second filtering mechanism to discard HITs where

67 Lang Acc Com Cor Lang Acc Com Cor Err Str 1st Fuz En 70.4 69.0 0.36 En 66.1 83 0.61 1% 31% 75% 40% Ar 56.6 87.6 0.34 Ar 78.8 72.4 0.22 1% 39% 51% 73% Zh 45.6 55.9 0.22 Zh 50.4 60.2 0.21 1% 15% 44% 31% It 54.4 53.5 0.09 Ur 96.2 94.8 0.69 1% 63% 89% 93% Pt 61.3 54.0 0.20 Table 4: Summary of performance [Experts] Ur 97.6 91.9 0.51 Language Measure OA Fleiss K–alpha Table 3: Summary of performance with Second English First Tag 0.82 0.46 0.46 Filter [Non experts] Fuzzy 0.64 0.27 0.27 Strict 0.69 0.32 0.32 random erroneous tags were not completely re- Arabic First Tag 0.77 0.55 0.55 moved. This enables us to increase accuracy for Fuzzy 0.84 0.59 0.59 English by 9.0%, Italian by 8.7% and Portuguese Strict 0.21 0.55 0.55 by 2.8% without negatively affecting complete- Chinese First Tag 0.62 0.23 0.24 ness or correlation. Fuzzy 0.75 0.41 0.41 Strict 0.83 0.31 0.32 In order to allow better interpretation of the non- Urdu First Tag 0.83 0.10 0.10 experts’ scores, we repeated the experiments on Fuzzy 0.91 0.35 0.35 a smaller scale with up to four experts per lan- Strict 0.71 0.37 0.38 guage (English, Arabic, Chinese and Urdu), who were already familiar with the USAS taxonomy Table 5: Total Inter-rater agreement [Experts]. and were researchers in the fields of corpus or computational linguistics. Experts used the same task interface to assign senses to 50 words each. overall. This shows that the novel two-phase fil- The results are presented in Table 4. Most notably, tering method used in our experiment is effective experts consistently excel at removing erroneous for maintaining the quality of the results. tags, leaving only a very small number in the data. 5 Conclusion and Future Work English experts performed much better than English non-experts on completeness, correlation In order to accelerate the task of creating multilin- and strict measures while their accuracy scores are gual semantic lexicons with coarse-grained word comparable. Arabic experts performed much bet- senses using a common multilingual semantic rep- ter than Arabic non-experts on the accuracy, strict resentation scheme, we employed non-expert na- and fuzzy scores while the 1st score is compa- tive speakers via MTurk who were not trained rable. Chinese experts performed slightly better with the semantic taxonomy. Overall, the non- than Chinese non-experts on Accuracy, complete- expert participants semantically tagged 250 words ness and Fuzzy while other scores were compa- in each of six languages: Arabic, Chinese, En- rable. Urdu experts scored relatively more highly glish, Italian, Portuguese and Urdu. We analysed on strict and 1st measures while other scores were the results using a number of metrics to consider comparable to Urdu non-experts. Finally, Tables 5 the correct likelihood order of tags relative to a and 6 show the Observed Agreement (OA), Fleiss’ gold-standard, along with correct removal of ran- Kappa and Krippendorff’s alpha scores for the dom erroneous semantic tags, and completeness inter-rater agreement between Expert and Non Ex- of tag lists. Crowdsourcing has been applied suc- pert participants. According to (Landis and Koch, cessfully for other NLP tasks in previous research, 1977) our inter-rater scores show fair agreement and we build on previous success in WSD tasks in between annotators. This serves to illustrate the three ways. Firstly, we have specific requirements task is complex even for experts. 
for semantic tagging purposes in terms of placing Overall these results show that untrained crowd- coarse-grained senses into a semantic taxonomy sourcing workers can produce results that are com- rather than stand-alone definitions. Hence, our ex- parable to those of experts when performing se- perimental set-up allows us to validate the sense mantic annotation tasks. Directly-contacted and inventory in a multilingual setting, carried out here MTurk workers achieved similar levels of results for six languages. Secondly, we extend the usual

68 Language Measure OA Fleiss K–alpha dances, (b) prototypical examples for each seman- English First Tag 0.71 0.36 0.36 tic tag, or (c) semantic tag labels in the same lan- Fuzzy 0.58 0.11 0.11 guage as the task word, as part of the resources Strict 0.79 0.20 0.20 available to participants would further enhance the Arabic First Tag 0.66 0.32 0.32 accuracy of the crowdsourcing annotation process. Fuzzy 0.71 0.05 0.05 Strict 0.86 0.03 0.03 Acknowledgements Chinese First Tag 0.73 0.45 0.45 Fuzzy 0.74 0.38 0.38 The crowdsourcing framework and interface was Strict 0.85 0.41 0.41 originally produced in an EPSRC Impact Accel- Italian First Tag 0.67 0.31 0.31 eration Award at Lancaster University in 2013. Fuzzy 0.67 0.03 0.03 This work has also been supported by the Strict 0.89 0.12 0.13 UK ESRC/AHRC-funded CorCenCC (The Na- Portuguese First Tag 0.64 0.22 0.22 tional Corpus of Contemporary Welsh) Project Fuzzy 0.63 0.13 0.13 (ref. ES/M011348/1), the ESRC-funded Centre Strict 0.80 0.18 0.18 for Corpus Approaches to Social Science (ref. Urdu First Tag 0.72 0.16 0.16 ES/K002155/1), and by the UCREL research cen- Fuzzy 0.95 0.45 0.45 tre, Lancaster University, UK via Wmatrix soft- Strict 0.74 0.49 0.49 ware licence fees. We would also like to ex- Table 6: Total Inter-rater agreement [Non Ex- press our thanks to the many expert and non-expert perts]. participants who were also involved in the ex- periments including: Dawn Archer (Manchester Metropolitan University, UK), Francesca Bianchi classification task of putting a word into one of (Universita del Salento, Italy), Carmen Dayrell an existing list of senses, instead asking partici- (Lancaster University, UK), Xiaopeng Hu (China pants to list all possible senses that a word could Center for Information Industry Development, take in different contexts. Thirdly, we have de- CCID, China), Olga Mudraya (Right Word Up, ployed a novel two-stage filtering approach which UK), Sheryl Prentice (Lancaster University, UK), has been shown to improve the quality of our re- Yuan Qi (China Center for Information Industry sults by filtering out spam responses using a sim- Development, CCID, China), Jawad Shafi (COM- ple synonym recognition task as well as HITs re- SATS Institute of Information technology, Pak- moving random erroneous tags. Our experiment istan), and May Wong (University of Hong Kong, suggests that the crowdsourcing process can pro- Hong Kong, China). duce results of good quality and is comparable to the work done by expert linguists. We showed that it is possible for native speakers to apply the hier- References archical semantic taxonomy without prior training Ahmet Aker, Mahmoud El-Haj, Udo Kruschwitz, and by the application of a graphical browsing inter- M-Dyaa Albakour. 2012. Assessing Crowdsourc- face to assist selection and annotation process. ing Quality through Objective Tasks. In 8th Lan- In the future, we will apply the method on a guage Resources and Evaluation Conference, Istan- bul, Turkey. LREC 2012. larger scale to the full semantic lexicons includ- ing multiword expressions, which are important Cem Akkaya, Alexander Conrad, Janyce Wiebe, and for contextual semantic disambiguation. We will Rada Mihalcea. 2010. Amazon mechanical turk also investigate whether adaptations to our method for subjectivity word sense disambiguation. In Pro- are required to include more languages such as ceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Ama- Czech, Malay and Spanish. 
In order to pursue the zon’s Mechanical Turk, pages 195–203, Los Ange- work beyond the existing languages in the USAS les, USA. Association for Computational Linguis- system, we will extend bootstrapping methods re- tics. ported in Piao et al. (2015) with vector-based tech- Omar Alonso and Stefano Mizzaro. 2009. Can we get niques and evaluate their appropriateness for mul- rid of TREC Assessors? Using Mechanical Turk for tiple languages. Finally, we will test whether Relevance Assessment. In SIGIR ’09: Workshop on (a) provision of words in context through concor- The Future of IR Evaluation, Boston, USA.

69 Vikas Bhardwaj, Rebecca J. Passonneau, Ansaf Salleb- Gabriella Kazai. 2011. In search of quality in crowd- Aouissi, and Nancy Ide. 2010. Anveshan: A frame- sourcing for search engine evaluation. In Paul work for analysis of multiple annotators’ labeling Clough, Colum Foley, Cathal Gurrin, Gareth J.F. behavior. In Proceedings of the Fourth Linguistic Jones, Wessel Kraaij, Hyowon Lee, and Vanessa Annotation Workshop, LAW IV ’10, pages 47–55, Mudoch, editors, Advances in Information Re- Stroudsburg, PA, USA. Association for Computa- trieval, volume 6611 of Lecture Notes in Computer tional Linguistics. Science, pages 165–176. Springer Berlin Heidel- berg. Chris Callison-Burch. 2009. Fast, Cheap, and Cre- ative: Evaluating Translation Quality Using Ama- Adam Kilgarriff. 1997. “I Don’t Believe in Word zon’s Mechanical Turk. In Proceedings of the 2009 Senses”. Computers and the Humanities, 31(2):91– Conference on Empirical Methods in Natural Lan- 113. guage Processing (EMNLP), pages 286–295, Singa- pore. Association for Computational Linguistics. Aniket Kittur, Ed H. Chi, and Bongwon Suh. 2008. Crowdsourcing user studies with mechanical turk. Pinar Donmez, Jaime G. Carbonell, and Jeff Schneider. In Proceedings of the SIGCHI Conference on Hu- 2009. Efficiently learning the accuracy of labeling man Factors in Computing Systems, CHI ’08, pages sources for selective sampling. In Proceedings of 453–456, New York, NY, USA. ACM. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, J.Richard Landis and Gary Koch. 1977. The mea- pages 259–268, New York, NY, USA. ACM. surement of observer agreement for categorical data. Biometrics, 33(1):159–174. Julie S. Downs, Mandy B. Holbrook, Steve Sheng, and Lorrie Faith Cranor. 2010. Are your partic- Laura Lofberg,¨ Scott Piao, Asko Nykanen, Krista ipants gaming the system?: Screening mechanical Varantola, Paul Rayson, and Jukka-Pekka Juntunen. turk workers. In Proceedings of the SIGCHI Con- 2005. A semantic tagger for the Finnish language. ference on Human Factors in Computing Systems, In Proceedings of Corpus Linguistics 2005, Birm- CHI ’10, pages 2399–2402, New York, NY, USA. ingham, UK. ACM. Tom McArthur. 1981. Longman Lexicon of Contem- Mahmoud El-Haj, Udo Kruschwitz, and Chris Fox. porary English. Longman, London, UK. 2010. Using Mechanical Turk to Create a Cor- pus of Arabic Summaries. In Language Resources Olga Mudraya, Bogdan Babych, Scott Piao, Paul (LRs) and Human Language Technologies (HLT) for Rayson, and Andrew Wilson. 2006. Developing Semitic Languages workshop held in conjunction a Russian semantic tagger for automatic semantic with the 7th International Language Resources and annotation. In Proceedings of Corpus Linguistics Evaluation Conference (LREC 2010)., pages 36–39, 2006, pages 290–297, St. Petersburg, Russia. Valletta, Malta. LREC 2010. Rebecca Passonneau, Nizar Habash, and Owen Ram- Mahmoud El-Haj, Udo Kruschwitz, and Chris Fox. bow. 2006. Inter-annotator agreement on a 2014. Creating language resources for under- multilingual semantic annotation task. In Pro- resourced languages: methodologies, and experi- ceedings of the Fifth International Conference on ments with Arabic. Language Resources and Eval- Language Resources and Evaluation (LREC’06), uation, pages 1–32. Genoa, Italy. European Language Resources Asso- ciation (ELRA). Sonia´ Frota and Marina Vigario.´ 2001. 
On the correlates of rhythmic distinctions: The Euro- Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev, pean/Brazilian Portuguese case. pages 247–275. and Chris Callison-Burch. 2014. The language de- mographics of Amazon Mechanical Turk. Transac- Adam Kapelner, Krishna Kaliannan, H.Andrew tions of the Association for Computational Linguis- Schwartz, Lyle Ungar, and Dean Foster. 2012. tics, 2:79–92. New insights from coarse word sense disambigua- tion in the crowd. In Proceedings of COLING 2012: Scott Piao, Francesca Bianchi, Carmen Dayrell, An- Posters, pages 539–548, Mumbai, India. The COL- gela D’Egidio, and Paul Rayson. 2015. Develop- ING 2012 Organizing Committee. ment of the multilingual semantic annotation sys- tem. In Proceedings of North American Chapter Gabriella Kazai, Natasa Milic-Frayling, and Jamie of the Association for Computational Linguistics – Costello. 2009. Towards methods for the collec- Human Language Technologies Conference, Den- tive gathering and quality control of relevance as- ver, USA. (NAACL HLT 2015). sessments. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Develop- Yufang Qian and Scott Piao. 2009. The development ment in Information Retrieval, SIGIR ’09, pages of a semantic annotation scheme for chinese kinship. 452–459, New York, NY, USA. ACM. Corpora, 4(2):189–208.

70 Paul Rayson, Dawn Archer, Scott Piao, and Anthony McEnery. 2004. The UCREL semantic analysis system. In Proceedings of the Beyond Named En- tity Recognition Semantic Labelling for NLP tasks workshop, pages 7–12, Lisbon, Portugal. Aanna Rumshisky, Nick Botchan, Sophie Kushkuley, and James Pustejovsky. 2012. Word sense inven- tories by non-experts. In Nicoletta Calzolari (Con- ference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uur Doan, Bente Maegaard, Joseph Mar- iani, Jan Odijk, and Stelios Piperidis, editors, Pro- ceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), pages 4055–4059, Istanbul, Turkey, May. European Language Resources Association (ELRA).

Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Nat- ural Language Tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Lan- guage Processing, pages 254–263. Association for Computational Linguistics. Alexander. Sorokin and David Forsyth. 2008. Utility data annotation with amazon mechanical turk. In In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8. Jean Veronis.´ 2001. Sense tagging: does it make sense? In Proceedings of Corpus Linguistics 2001, Lancaster, UK. UCREL. Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. 2010. The Multidimensional Wis- dom of Crowds. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, ed- itors, Advances in Neural Information Processing Systems 23, pages 2424–2432. Yin Yang, Nilesh Bansal, Wisam Dakka, Panagio- tis Ipeirotis, Nick Koudas, and Dimitris Papadias. 2009. Query by document. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, pages 34–43, New York, NY, USA. ACM.

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation

Alexander Panchenko‡, Stefano Faralli†, Simone Paolo Ponzetto†, and Chris Biemann‡

‡Language Technology Group, Computer Science Dept., University of Hamburg, Germany
†Web and Data Science Group, Computer Science Dept., University of Mannheim, Germany
{panchenko,biemann}@informatik.uni-hamburg.de
{faralli,simone}@informatik.uni-mannheim.de

Abstract

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed. The combination of two networks reduces the sparsity of sense representations used for WSD. We evaluate these enriched representations within two lexical sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource; (2) our method achieves results comparable to or better than four state-of-the-art unsupervised knowledge-based WSD systems, including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.

1 Introduction

The representation of word senses and the disambiguation of lexical items in context is an ongoing, long-established branch of research (Agirre and Edmonds, 2007; Navigli, 2009). Traditionally, word senses are defined and represented in lexical resources, such as WordNet (Fellbaum, 1998), while more recently, there is an increased interest in approaches that induce word senses from corpora using graph-based distributional approaches (Dorow and Widdows, 2003; Biemann, 2006; Hope and Keller, 2013), word sense embeddings (Neelakantan et al., 2014; Bartunov et al., 2016) and a combination of both (Pelevina et al., 2016). Finally, some hybrid approaches emerged, which aim at building sense representations using information from both corpora and lexical resources, e.g. (Rothe and Schütze, 2015; Camacho-Collados et al., 2015a; Faralli et al., 2016). In this paper, we further explore the last strand of research, investigating the utility of hybrid sense representations for the word sense disambiguation (WSD) task.

In particular, the contribution of this paper is a new unsupervised knowledge-based approach to WSD based on the hybrid aligned resource (HAR) introduced by Faralli et al. (2016). The key difference of our approach from prior hybrid methods based on sense embeddings, e.g. (Rothe and Schütze, 2015), is that we rely on sparse lexical representations that make the sense representation readable and allow us to straightforwardly use this representation for word sense disambiguation, as will be shown below. In contrast to hybrid approaches based on sparse interpretable representations, e.g. (Camacho-Collados et al., 2015a), our method requires no mapping of texts to a sense inventory and thus can be applied to larger text collections. By linking symbolic distributional sense representations to lexical resources, we are able to improve representations of senses, leading to performance gains in word sense disambiguation.

2 Related Work

Several prior approaches combined distributional information extracted from text (Turney and Pantel, 2010) with information available in lexical resources, such as WordNet. Yu and Dredze (2014) proposed a model to learn word embeddings based on lexical relations of words from WordNet and PPDB (Ganitkevitch et al., 2013). The objective function of their model

72 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 72–78, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics combines the objective function of the skip-gram All these diverse contributions indicate the ben- model (Mikolov et al., 2013) with a term that takes efits of hybrid knowledge sources for learning into account lexical relations of a target word. word and sense representations. Faruqui et al. (2015) proposed a related approach that performs a post-processing of word embed- 3 Unsupervised Knowledge-based WSD dings on the basis of lexical relations from the using Hybrid Aligned Resource same resources. Pham et al. (2015) introduced an- We rely on the hybrid aligned lexical semantic re- other model that also aim at improving word vec- source proposed by Faralli et al. (2016) to perform tor representations by using lexical relations from WSD. We start with a short description of this re- WordNet. The method makes representations of source and then discuss how it is used for WSD. synonyms closer than representations of antonyms of the given word. While these three models im- 3.1 Construction of the Hybrid Aligned prove the performance on word relatedness eval- Resource (HAR) uations, they do not model word senses. Jauhar et al. (2015) proposed two models that tackle this The hybrid aligned resource links two lexical shortcoming, learning sense embeddings using the semantic networks using the method of Faralli word sense inventory of WordNet. et al. (2016): a corpus-based distributionally- induced network and a manually-constructed net- Iacobacci et al. (2015) proposed to learn work. Sample entries of the HAR are presented sense embeddings on the basis of the BabelNet in Table 1. The corpus-based part of the resource, lexical ontology (Navigli and Ponzetto, 2012). called proto-conceptualization (PCZ), consists of Their approach is to train the standard skip- sense-disambiguated lexical items (PCZ ID), dis- gram model on a pre-disambiguated corpus us- ambiguated related terms and hypernyms, as well ing the Babelfy WSD system (Moro et al., 2014). as context clues salient to the lexical item. The NASARI (Camacho-Collados et al., 2015a) re- knowledge-based part of the resource, called lies on Wikipedia and WordNet to produce vec- conceptualization, is represented by synsets of tor representations of senses. In this approach, the lexical resource and relations between them a sense is represented in lexical or sense-based (WordNet ID). Each sense in the PCZ network is feature spaces. The links between WordNet and subsequently linked to a sense of the knowledge- Wikipedia are retrieved from BabelNet. MUFFIN based network based on their similarity calculated (Camacho-Collados et al., 2015b) adapts several on the basis of lexical representations of senses ideas from NASARI, extending the method to the and their neighbors. The construction of the PCZ multi-lingual case by using BabelNet synsets in- involves the following steps (Faralli et al., 2016): stead of monolingual WordNet synsets. The approach of Chen et al. (2015) to learn- Building a Distributional Thesaurus (DT). At ing sense embeddings starts from initialization this stage, a similarity graph over terms is induced of sense vectors using WordNet glosses. 
It pro- from a corpus, where each entry consists of the ceeds by performing a more conventional context most similar 200 terms for a given term using the clustering, similar what is found to unsupervised JoBimText method (Biemann and Riedl, 2013). methods such as (Neelakantan et al., 2014; Bar- tunov et al., 2016). Word Sense Induction. In DTs, entries of pol- ysemous terms are mixed, i.e. they contain related Rothe and Schutze¨ (2015) proposed a method terms of several senses. The Chinese Whispers that learns sense embedding using word embed- (Biemann, 2006) graph clustering is applied to the dings and the sense inventory of WordNet. The ego-network (Everett and Borgatti, 2005) of the approach was evaluated on the WSD tasks using each term, as defined by its related terms and con- features based on the learned sense embeddings. nections between then observed in the DT to de- Goikoetxea et al. (2015) proposed a method for rive word sense clusters. learning word embeddings using random walks on a graph of a lexical resource. Nieto Pina˜ and Jo- Labeling Word Senses with Hypernyms. hansson (2016) used a similar approach based on Hearst (1992) patterns are used to extract hy- random walks on a WordNet to learn sense embed- pernyms from the corpus. These hypernyms dings. are assigned to senses by aggregating hypernym

PCZ ID      WordNet ID   Related Terms                           Hypernyms                    Context Clues
mouse:0     mouse:1      rat:0, rodent:0, monkey:0, ...          animal:0, species:1, ...     rat:conj and, white-footed:amod, ...
mouse:1     mouse:4      keyboard:1, computer:0, printer:0 ...   device:1, equipment:3, ...   click:-prep of, click:-nn, ...
keyboard:0  keyboard:1   piano:1, synthesizer:2, organ:0 ...     instrument:2, device:3, ...  play:-dobj, electric:amod, ...
keyboard:1  keyboard:1   keypad:0, mouse:1, screen:1 ...         device:1, technology:0 ...   computer, qwerty:amod ...

Table 1: Sample entries of the hybrid aligned resource (HAR) for the words “mouse” and “keyboard”. Trailing numbers indicate sense identifiers. Relatedness and context clue scores are not shown for brevity.

relations over the list of related terms for the given sense into a weighted list of hypernyms.

sentation with contextual information from the HAR on the basis of the mappings listed below:

Disambiguation of Related Terms and Hyper- WordNet. This baseline model relies solely on nyms. While target words contain sense distinc- the WordNet lexical resource. It builds sense tions (PCZ ID), the related words and hypernyms representations by collecting synonyms and sense do not carry sense information. At this step, each definitions for the given WordNet synset and hypernym and related term is disambiguated with synsets directly connected to it. We removed stop respect to the induced sense inventory (PCZ ID). words and weight words with term frequency. For instance, the word “keyboard” in the list of related terms for the sense “mouse:1” is linked WordNet + Related (news). This model aug- to its “device” sense represented (“keyboard:1”) ments the WordNet-based representation with re- as “mouse:1” and “keyboard:1” share neighbors lated terms from the PCZ items (see Table 1). This from the IT domain. setting is designed to quantify the added value of lexical knowledge in the related terms of PCZ. Retrieval of Context Clues. Salient contexts of senses are retrieved by aggregating salient depen- WordNet + Related (news) + Context (news). dency features of related terms. Context features This model includes all features of the previous that have a high weight for many related terms ob- models and complements them with context clues tain a high weight for the sense. obtained by aggregating features of the words from the WordNet + Related (news) (see Table 1). 3.2 HAR Datasets WordNet + Related (news) + Context (wiki). We experiment with two different corpora for PCZ This model is built in the same way as the pre- induction as in (Faralli et al., 2016), namely a 100 vious model, but using context clues derived from million sentence news corpus (news) from Giga- Wikipedia (see Section 3.2). word (Parker et al., 2011) and LCC (Richter et al., In the last two models, we used up to 5000 2006), and a 35 million sentence Wikipedia cor- most relevant context clues per word sense. This 1 pus (wiki). Chinese Whispers sense clustering is value was set experimentally: performance of the performed with the default parameters, producing WSD system gradually increased with the num- an average number of 2.3 (news) and 1.8 (wiki) ber of context clues reaching a plateau at the value senses per word in a vocabulary of 200 thousand of 5000. During aggregation, we excluded stop words each, with the usual power-law distribution words and numbers from context clues. Besides, of sense cluster sizes. On average, each sense we transformed syntactic context clues presented is related to about 47 senses and has assigned 5 in Table 1 to terms, stripping the dependency type. hypernym labels. These disambiguated distribu- so they can be added to other lexical representa- tional networks were linked to WordNet 3.1 using tions. For instance, the context clue “rat:conj and” the method of Faralli et al. (2016). of the entry “mouse:0” was reduced to the feature “rat”. 3.3 Using the Hybrid Aligned Resource in Table 2 demonstrates features extracted from Word Sense Disambiguation WordNet as compared to feature representations We experimented with four different ways of en- enriched with related terms of the PCZ. riching the original WordNet-based sense repre- Each WordNet word sense is represented with 1The used PCZ and HAR resources are available at: one of the four methods described above. These https://madata.bib.uni-mannheim.de/171 sense representations are subsequently used to per-

74 Model Sense Representation memory, device, floppy, disk, hard, disk, disk, computer, science, computing, diskette, fixed, disk, floppy, magnetic, WordNet disc, magnetic, disk, hard, disc, storage, device recorder, disk, floppy, console, diskette, handset, desktop, iPhone, iPod, HDTV, kit, RAM, Discs, Blu-ray, computer, GB, mi- crochip, site, cartridge, printer, tv, VCR, Disc, player, LCD, software, component, camcorder, cellphone, card, monitor, display, burner, Web, stereo, internet, model, iTunes, turntable, chip, cable, camera, iphone, notebook, device, server, surface, wafer, page, drive, laptop, screen, pc, television, hardware, YouTube, dvr, DVD, product, folder, VCR, radio, phone, circuitry, partition, megabyte, peripheral, format, machine, tuner, website, merchandise, equipment, gb, discs, MP3, hard-drive, piece, video, storage device, memory device, microphone, hd, EP, content, soundtrack, webcam, system, blade, graphic, microprocessor, collection, document, programming, battery, keyboard, HD, handheld, CDs, reel, web, material, hard-disk, ep, chart, debut, configuration, WordNet + Related (news) recording, album, broadcast, download, fixed disk, planet, pda, microfilm, iPod, videotape, text, cylinder, cpu, canvas, label, sampler, workstation, electrode, magnetic disc, catheter, magnetic disk, Video, mobile, cd, song, modem, mouse, tube, set, ipad, signal, substrate, vinyl, music, clip, pad, audio, compilation, memory, message, reissue, ram, CD, subsystem, hdd, touchscreen, electronics, demo, shell, sensor, file, shelf, processor, cassette, extra, mainframe, motherboard, floppy disk, lp, tape, version, kilo- byte, pacemaker, browser, Playstation, pager, module, cache, DVD, movie, Windows, cd-rom, e-book, valve, directory, harddrive, smartphone, audiotape, technology, hard disk, show, computing, computer science, Blu-Ray, blu-ray, HDD, HD-DVD, scanner, hard disc, gadget, booklet, copier, playback, TiVo, controller, filter, DVDs, gigabyte, paper, mp3, CPU, dvd-r, pipe, cd-r, playlist, slot, VHS, film, videocassette, interface, adapter, database, manual, book, channel, changer, storage

Table 2: Original and enriched representations of the third sense of the word “disk” in the WordNet sense inventory. Our sense representation is enriched with related words from the hybrid aligned resource.

form WSD in context. For each test instance consisting of a target word and its context, we select the sense whose corresponding sense representation has the highest cosine similarity with the target word’s context.

improvements in the results. Further expansion of the sense representation with context clues (cf. Table 1) provides a modest further improvement on the SemEval-2007 dataset and yields no further improvement in the case of the Senseval-3 dataset.
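The selection step just described amounts to a nearest-neighbour search between sparse bag-of-words vectors. A minimal sketch of this idea is given below, assuming plain Python dictionaries as sparse vectors, whitespace tokenisation and toy sense representations loosely modelled on Table 1; it illustrates the procedure from Section 3.3 and is not the authors' actual implementation.

import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def disambiguate(context_sentence, sense_representations, stop_words=frozenset()):
    # Pick the sense whose sparse representation is closest to the context.
    # sense_representations maps a sense id to a dict of weighted features,
    # e.g. WordNet synonyms and gloss words, optionally enriched with
    # related terms and context clues from the HAR.
    context = Counter(w for w in context_sentence.lower().split()
                      if w not in stop_words)
    return max(sense_representations,
               key=lambda sense: cosine(sense_representations[sense], context))

# Toy sense inventory loosely following Table 1 (hypothetical weights).
senses = {
    "mouse:0 (animal)": {"rat": 1.0, "rodent": 1.0, "monkey": 0.5, "animal": 1.0},
    "mouse:1 (device)": {"keyboard": 1.0, "computer": 1.0, "printer": 0.5, "device": 1.0},
}
print(disambiguate("click the mouse next to the keyboard", senses))
# -> mouse:1 (device)

In the actual experiments the enriched representations contain up to 5000 context clues per sense, and stop words and numbers are removed before aggregation (Section 3.3).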

4 Evaluation Comparison to the state-of-the-art. We com- pare our approach to four state-of-the-art sys- We perform an extrinsic evaluation and show the tems: KnowNet (Cuadros and Rigau, 2008), Ba- impact of the hybrid aligned resource on word belNet, WN+XWN (Cuadros and Rigau, 2007), sense disambiguation performance. While there and NASARI. KnowNet builds sense representa- exist many datasets for WSD (Mihalcea et al., tions based on snippets retrieved with a web search 2004; Pradhan et al., 2007; Manandhar et al., engine. We use the best configuration reported in 2010, inter alia), we follow Navigli and Ponzetto the original paper (KnowNet-20), which extends (2012) and use the SemEval-2007 Task 16 on each sense with 20 keywords. BabelNet in its the “Evaluation of wide-coverage knowledge re- core relies on a mapping of WordNet synsets and sources” (Cuadros and Rigau, 2007). This task is Wikipedia articles to obtain enriched sense rep- specifically designed for evaluating the impact of resentations. The WN+XWN system is the top- lexical resources on WSD performance. The Sem- ranked unsupervised knowledge-based system of Eval-2007 Task 16 is, in turn, based on two “lexi- Senseval-3 and SemEval-2007 datasets from the cal sample” datasets, from the Senseval-3 (Mihal- original competition (Cuadros and Rigau, 2007). cea et al., 2004) and SemEval-2007 Task 17 (Prad- It alleviates sparsity by combining WordNet with han et al., 2007) evaluation campaigns. The first the eXtended WordNet (Mihalcea and Moldovan, dataset has coarse- and fine-grained annotations, 2001). The latter resource relies on parsing of while the second contains only fine-grained sense WordNet glosses. annotations. In all experiments, we use the offi- For KnowNet, BabelNet, and WN+XWN we cial task’s evaluator to compute standard metrics use the scores reported in the respective original of recall, precision, and F-score. publications. However, as NASARI was not eval- 5 Results uated on the datasets used in our study, we used the following procedure to obtain NASARI-based Impact of the corpus-based features. Figure 1 sense representations: Each WordNet-based sense compares various sense representations in terms representation was extended with all features from of F-score. The results show that, expanding the lexical vectors of NASARI.2 WordNet-based sense representations with distri- Thus, we compare our method to three hybrid butional information gives a clear advantage over systems that induce sense representations on the the original representation on both Senseval-3 and 2We used the version of lexical vectors (July 2016) featur- SemEval-2007 datasets. Using related words spe- ing 4.4 million of BabelNet synsets, yet covering only 72% cific to a given WordNet sense provides dramatic of word senses of the two datasets used in our experiments.

Figure 1: Performance of different word sense representation strategies.

                                              Senseval-3 fine-grained        SemEval-2007 fine-grained
Model                                         Precision  Recall  F-score    Precision  Recall  F-score
Random                                        19.1       19.1    19.1       27.4       27.4    27.4
WordNet                                       29.7       29.7    29.7       44.3       21.0    28.5
WordNet + Related (news)                      47.5       47.5    47.5       54.0       50.0    51.9
WordNet + Related (news) + Context (news)     47.2       47.2    47.2       54.8       51.2    52.9
WordNet + Related (news) + Context (wiki)     46.9       46.9    46.9       55.2       51.6    53.4
BabelNet                                      44.3       44.3    44.3       56.9       53.1    54.9
KnowNet                                       44.1       44.1    44.1       49.5       46.1    47.7
NASARI (lexical vectors)                      32.3       32.2    32.2       49.3       45.8    47.5
WN+XWN                                        38.5       38.0    38.3       54.9       51.1    52.9

Table 3: Comparison of our approach to state-of-the-art unsupervised knowledge-based methods on the SemEval-2007 Task 16 (weighted setting). The best results overall are underlined.

basis of WordNet and texts (KnowNet, BabelNet, NASARI) and one purely knowledge-based system (WN+XWN). Note that we do not include the supervised TSSEM system in this comparison, as in contrast to all other considered methods, it relies on a large sense-labeled corpus.

Table 3 presents results of the evaluation. On the Senseval-3 dataset, our hybrid models show better performance than all unsupervised knowledge-based approaches considered in our experiment. On the SemEval-2007 dataset, the only resource which exceeds the performance of our hybrid model is BabelNet. The extra performance of BabelNet on the SemEval dataset can be explained by its multilingual approach: additional features are obtained using semantic relations across synsets in different languages. Besides, machine translation is used to further enrich the coverage of the resource (Navigli and Ponzetto, 2012).

These results indicate the high quality of the sense representations obtained using the hybrid aligned resource. Using related words of induced senses improves WSD performance by a large margin as compared to the purely WordNet-based model on both datasets. Adding extra contextual features further improves results slightly on one dataset. Thus, we recommend enriching sense representations with related words and optionally with context clues. Finally, note that, while our method shows competitive results compared to other state-of-the-art hybrid systems, it does not require access to web search engines (KnowNet), texts mapped to a sense inventory (BabelNet, NASARI), or machine translation systems (BabelNet).

6 Conclusion

The hybrid aligned resource (Faralli et al., 2016) successfully enriches sense representations of a manually constructed lexical network with features derived from a distributional disambiguated lexical network. Our WSD experiments on two datasets show that this additional information extracted from corpora lets us substantially outperform the model based solely on the lexical resource. Furthermore, a comparison of our sense representation method with existing hybrid approaches leveraging corpus-based features demonstrates its state-of-the-art performance.

Acknowledgments

We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) under the project “JOIN-T: Joining Ontologies and Semantics Induced from Text”.

76 References 161–168, Manchester, UK, August. Coling 2008 Or- ganizing Committee. Eneko Agirre and Philip Edmonds. 2007. Word sense disambiguation: Algorithms and applications, vol- Beate Dorow and Dominic Widdows. 2003. Discov- ume 33. Springer Science & Business Media. ering Corpus-Specific Word Senses. In Proceedings of the Tenth Conference on European Chapter of the Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin, Association for Computational Linguistics - Volume and Dmitry Vetrov. 2016. Breaking sticks and am- 2, EACL ’03, pages 79–82, Budapest, Hungary. As- biguities with adaptive skip-gram. In Proceedings of sociation for Computational Linguistics. the 19th International Conference on Artificial Intel- ligence and Statistics (AISTATS’2016), pages 130– Martin Everett and Stephen P. Borgatti. 2005. Ego 138, Cadiz, Spain. JMLR: W&CP volume 51. network betweenness. Social Networks, 27(1):31– 38. Chris Biemann and Martin Riedl. 2013. Text: Now in 2D! A Framework for Lexical Expansion with Con- Stefano Faralli, Alexander Panchenko, Chris Biemann, textual Similarity. Journal of Language Modelling, and Simone P. Ponzetto. 2016. Linked disam- 1(1):55–95. biguated distributional semantic networks. In In- ternational Semantic Web Conference (ISWC’2016), Chris Biemann. 2006. Chinese whispers - an efficient pages 56–64, Kobe, Japan. Springer. graph clustering algorithm and its application to nat- ural language processing problems. In Proceedings Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, of TextGraphs: the First Workshop on Graph Based Chris Dyer, Eduard Hovy, and Noah A. Smith. Methods for Natural Language Processing, pages 2015. Retrofitting word vectors to semantic lexi- 73–80, New York City. Association for Computa- cons. In Proceedings of the 2015 Conference of tional Linguistics. the North American Chapter of the Association for Computational Linguistics: Human Language Tech- Jose´ Camacho-Collados, Mohammad Taher Pilehvar, nologies, pages 1606–1615, Denver, Colorado. As- and Roberto Navigli. 2015a. Nasari: a novel sociation for Computational Linguistics. approach to a semantically-aware representation of items. In Proceedings of the 2015 Conference of Christiane Fellbaum. 1998. WordNet: An Electronic the North American Chapter of the Association for Database. MIT Press, Cambridge, MA. Computational Linguistics: Human Language Tech- Juri Ganitkevitch, Benjamin Van Durme, and Chris nologies, pages 567–577, Denver, Colorado. Asso- Callison-Burch. 2013. Ppdb: The paraphrase ciation for Computational Linguistics. database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Jose´ Camacho-Collados, Mohammad Taher Pilehvar, Computational Linguistics: Human Language Tech- and Roberto Navigli. 2015b. A unified multilingual nologies semantic representation of concepts. In Proceedings , pages 758–764, Atlanta, Georgia. Associ- of the 53rd Annual Meeting of the Association for ation for Computational Linguistics. Computational Linguistics and the 7th International Josu Goikoetxea, Aitor Soroa, and Eneko Agirre. Joint Conference on Natural Language Processing 2015. Random walks and neural network language (Volume 1: Long Papers), pages 741–751, Beijing, models on knowledge bases. In Proceedings of the China. Association for Computational Linguistics. 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tao Chen, Ruifeng Xu, Yulan He, and Xuan Wang. Human Language Technologies, pages 1434–1439, 2015. 
Improving distributed representation of word Denver, Colorado. Association for Computational sense via wordnet gloss composition and context Linguistics. clustering. In Proceedings of the 53rd Annual Meet- ing of the Association for Computational Linguistics Marti A. Hearst. 1992. Automatic acquisition of hy- and the 7th International Joint Conference on Natu- ponyms from large text corpora. In Proceedings ral Language Processing (Volume 2: Short Papers), of the 6th International Conference on Computa- pages 15–20, Beijing, China. Association for Com- tional Linguistics (COLING’1992), pages 539–545, putational Linguistics. Nantes, France. Montse Cuadros and German Rigau. 2007. Semeval- David Hope and Bill Keller. 2013. MaxMax: A 2007 task 16: Evaluation of wide coverage knowl- Graph-Based Soft Clustering Algorithm Applied to edge resources. In Proceedings of the Fourth Word Sense Induction. In Computational Linguis- International Workshop on Semantic Evaluations tics and Intelligent Text Processing: 14th Interna- (SemEval-2007), pages 81–86, Prague, Czech Re- tional Conference, CICLing 2013, Samos, Greece, public. Association for Computational Linguistics. March 24-30, 2013, Proceedings, Part I, pages 368– 381. Springer Berlin Heidelberg, Berlin, Heidelberg. Montse Cuadros and German Rigau. 2008. KnowNet: Building a large net of knowledge from the web. In Ignacio Iacobacci, Mohammad Taher Pilehvar, and Proceedings of the 22nd International Conference Roberto Navigli. 2015. Sensembed: Learning on Computational Linguistics (Coling 2008), pages sense embeddings for word and relational similarity.

77 In Proceedings of the 53rd Annual Meeting of the Doha, Qatar. Association for Computational Lin- Association for Computational Linguistics and the guistics. 7th International Joint Conference on Natural Lan- guage Processing (Volume 1: Long Papers), pages Luis Nieto Pina˜ and Richard Johansson. 2016. Em- 95–105, Beijing, China. Association for Computa- bedding senses for efficient graph-based word sense tional Linguistics. disambiguation. In Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Sujay Kumar Jauhar, Chris Dyer, and Eduard Hovy. Language Processing, pages 1–5, San Diego, CA, 2015. Ontologically grounded multi-sense repre- USA. Association for Computational Linguistics. sentation learning for semantic vector space models. In Proceedings of the 2015 Conference of the North Robert Parker, David Graff, Junbo Kong, Ke Chen, and American Chapter of the Association for Computa- Kazuaki Maeda. 2011. English Gigaword Fifth Edi- tional Linguistics: Human Language Technologies, tion. Linguistic Data Consortium, Philadelphia. pages 683–693, Denver, Colorado. Association for Maria Pelevina, Nikolay Arefiev, Chris Biemann, and Computational Linguistics. Alexander Panchenko. 2016. Making sense of word Proceedings of the 1st Workshop Suresh Manandhar, Ioannis P. Klapaftis, Dmitriy Dli- embeddings. In on Representation Learning for NLP gach, and Sameer S. Pradhan. 2010. SemEval-2010 , pages 174– task 14: Word sense induction & disambiguation. In 183, Berlin, Germany. Association for Computa- Proceedings of the 5th International Workshop on tional Linguistics. Semantic Evaluation (ACL’2010), pages 63–68, Up- Nghia The Pham, Angeliki Lazaridou, and Marco Ba- psala, Sweden. Association for Computational Lin- roni. 2015. A multitask objective to inject lexical guistics. contrast into distributional semantics. In Proceed- ings of the 53rd Annual Meeting of the Association Rada Mihalcea and Dan Moldovan. 2001. extended for Computational Linguistics and the 7th Interna- wordnet: Progress report. In In Proceedings of tional Joint Conference on Natural Language Pro- NAACL Workshop on WordNet and Other Lexical cessing (Volume 2: Short Papers), pages 21–26, Bei- Resources , pages 1–5, Pittsburgh, PA, USA. Asso- jing, China. Association for Computational Linguis- ciation for Computational Linguistics. tics. Rada Mihalcea, Timothy Chklovski, and Adam Kil- Sameer Pradhan, Edward Loper, Dmitriy Dligach, and garriff. 2004. The SENSEVAL-3 English lexical Martha Palmer. 2007. Semeval-2007 task-17: En- sample task. In SENSEVAL-3: Third International glish lexical sample, srl and all words. In Proceed- Workshop on the Evaluation of Systems for the Se- ings of the Fourth International Workshop on Se- mantic Analysis of Text, pages 25–28, Barcelona, mantic Evaluations (SemEval-2007), pages 87–92, Spain. Association for Computational Linguistics. Prague, Czech Republic. Association for Computa- tional Linguistics. Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed rep- Matthias Richter, Uwe Quasthoff, Erla Hallsteinsdottir,´ resentations of words and phrases and their com- and Chris Biemann. 2006. Exploiting the Leipzig positionality. In Proceedings of Advances in Neu- Corpora Collection. In Proceedings of the Fifth ral Information Processing Systems 26 (NIPS’2013), Slovenian and First International Language Tech- pages 3111–3119, Harrahs and Harveys, CA, USA. nologies Conference (IS-LTC), Ljubljana, Slovenia. 
Andrea Moro, Alessandro Raganato, and Roberto Nav- Sascha Rothe and Hinrich Schutze.¨ 2015. Autoex- igli. 2014. Entity linking meets word sense disam- tend: Extending word embeddings to embeddings biguation: a unified approach. Transactions of the for synsets and lexemes. In Proceedings of the Association for Computational Linguistics, 2:231– 53rd Annual Meeting of the Association for Compu- 244. tational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol- Roberto Navigli and Simone Paolo Ponzetto. 2012. ume 1: Long Papers), pages 1793–1803, Beijing, Babelnet: The automatic construction, evaluation China, July. Association for Computational Linguis- and application of a wide-coverage multilingual se- tics. mantic network. Artificial Intelligence, 193:217– 250. Peter D. Turney and Patrick Pantel. 2010. From fre- quency to meaning: Vector space models of seman- Roberto Navigli. 2009. Word sense disambiguation: A tics. JAIR, 37:141–188. survey. ACM CSUR, 41(2):1–69. Mo Yu and Mark Dredze. 2014. Improving lexical Arvind Neelakantan, Jeevan Shankar, Alexandre Pas- embeddings with semantic knowledge. In Proceed- sos, and Andrew McCallum. 2014. Efficient ings of the 52nd Annual Meeting of the Association non-parametric estimation of multiple embeddings for Computational Linguistics (Volume 2: Short Pa- per word in vector space. In Proceedings of the pers), pages 545–550, Baltimore, Maryland. Asso- 2014 Conference on Empirical Methods in Natural ciation for Computational Linguistics. Language Processing (EMNLP), pages 1059–1069,

One Representation per Word — Does it make Sense for Composition?

Thomas Kober, Julie Weeds, John Wilkie, Jeremy Reffin and David Weir
TAG laboratory, Department of Informatics, University of Sussex
Brighton, BN1 9RH, UK
{t.kober, j.e.weeds, jw478, j.p.reffin, d.j.weir}@sussex.ac.uk

Abstract

In this paper, we investigate whether an a priori disambiguation of word senses is strictly necessary or whether the meaning of a word in context can be disambiguated through composition alone. We evaluate the performance of off-the-shelf single-vector and multi-sense vector models on a benchmark phrase similarity task and a novel task for word-sense discrimination. We find that single-sense vector models perform as well or better than multi-sense vector models despite arguably less clean elementary representations. Our findings furthermore show that simple composition functions such as pointwise addition are able to recover sense-specific information from a single-sense vector model remarkably well.

1 Introduction

Distributional word representations based on counting co-occurrences have a long history in natural language processing and have successfully been applied to numerous tasks such as sentiment analysis, recognising textual entailment, word-sense disambiguation and many other important problems. More recently, low-dimensional and dense neural word embeddings have received a considerable amount of attention in the research community and have become ubiquitous in numerous NLP pipelines in academia and industry. One fundamental simplifying assumption commonly made in distributional semantic models, however, is that every word can be encoded by a single representation. Combining polysemous lexemes into a single vector has the consequence of essentially creating a weighted average of all observed meanings of a lexeme in a given text corpus.

Therefore a number of proposals have been made to overcome the issue of conflating several different senses of an individual lexeme into a single representation. One approach (Reisinger and Mooney, 2010; Huang et al., 2012) is to try directly inferring a predefined number of senses from data and subsequently label any occurrences of a polysemous lexeme with the inferred inventory. Similar approaches are proposed by Reddy et al. (2011) and Kartsaklis et al. (2013), who show that appropriate sense selection or disambiguation typically improves performance for composition of noun phrases (Reddy et al., 2011) and verb phrases (Kartsaklis et al., 2013). Dinu and Lapata (2010) proposed a model that represents the meaning of a word as a probability distribution over latent senses which is modulated based on contextualisation, and report improved performance on a word similarity task and the lexical substitution task. Other approaches leverage an existing lexical resource such as BabelNet or WordNet to obtain sense labels a priori to creating word representations (Iacobacci et al., 2015), or as a postprocessing step after obtaining initial word representations (Chen et al., 2014; Pilehvar and Collier, 2016). While these approaches have exhibited strong performance on benchmark word similarity tasks (Huang et al., 2012; Iacobacci et al., 2015) and some downstream processing tasks such as part-of-speech tagging and relation identification (Li and Jurafsky, 2015), they have been weaker than the single-vector representations at predicting the compositionality of multi-word expressions (Salehi et al., 2015), and at tasks which require the meaning of a word to be considered in context; e.g., word sense disambiguation (Iacobacci et al., 2016) and word similarity in context (Iacobacci et al., 2015).

In this paper we consider what happens when distributional representations are composed to

79 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 79–90, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics form representations for larger units of meaning. SENSEMBED model4 by Iacobacci et al. (2015), In a compositional phrase, the meaning of the which creates word-sense embeddings by per- whole can be inferred from the meaning of its forming word-sense disambiguation prior to run- parts. Thus, assuming compositionality, the rep- ning word2vec. resentation of a phrase such as black mood, should We note that word2vec and dep2vec use be directly inferable from the representations for a single vector per word approach and therefore black and for mood. Further, one might suppose conflate the different senses of a polysemous lex- that composing the correct senses of the individ- eme. On the other hand, SENSEMBED utilises Ba- ual lexemes would result in a more accurate repre- belfy (Moro et al., 2014) as an external knowledge sentation of that phrase. However, our counter- source to perform word-sense disambiguation and hypothesis is that the act of composition con- subsequently creates one vector representation per textualises or disambiguates each of the lexemes word sense. thereby making the representations of individual For composition we use pointwise addition for senses redundant. We investigate this hypothe- all models as this has been shown to be a strong sis by evaluating the performance of single-vector baseline in a number of studies (Hashimoto et representations and multi-sense representations at al., 2014; Hill et al., 2016). We also experi- both a benchmark phrase similarity task and at a mented with pointwise multiplication as compo- novel word-sense discrimination task. sition function but, similar to Hill et al. (2016), Our contributions in this work are thus as fol- found its performance to be very poor (results not lows. First, we provide quantitative and qualita- reported). We model any out-of-vocabulary items tive evidence that even simple composition func- as a vector consisting of all zeros and determine tions have the ability to recover sense-specific in- proximity of composed meaning representations formation from a single-vector representation of in terms of cosine similarity. We lowercase and a polysemous lexeme in context. Second, we in- lemmatise the data in our task but do not perform troduce a novel word-sense discrimination task1, number or date normalisation, or removal of rare which can be seen as the first stage of word-sense words. disambiguation. The goal is to find whether the occurrences of a lexeme in two or more senten- 3 Phrase Similarity tial contexts belong to the same sense or not, with- Our first evaluation task is the benchmark phrase out necessarily labelling the senses. While it has similarity task of Mitchell and Lapata (2010). This received relatively little attention in recent years, dataset consists of 108 adjective-noun (AN), 108 it is an important natural language understanding noun-noun (NN) and 108 verb-object (VO) pairs. problem and can provide important insights into The task is to compare a compositional model’s the process of semantic composition. similarity estimates with human judgements by computing Spearman’s ρ. 
An average ρ of 0.47- 2 Evaluating Distributional Models of 0.48 represents the current state-of-the-art perfor- Composition mance on this task (Hashimoto et al., 2014; Kober For evaluation we use several readily available et al., 2016; Wieting et al., 2015). off-the-shelf word embeddings, that have al- For single-sense representations, the strategy ready been shown to work well for a number for carrying out this task is simple. For each of different NLP applications. We compare the phrase in each pair, we compose the constituent 300-dimensional skip-gram word2vec (Mikolov representations and then compute the similarity et al., 2013) word embeddings2 to the depen- of each pair of phrases using the cosine similar- dency based version of word2vec — henceforth ity. For multi-sense representations, we adapted dep2vec3 (Levy and Goldberg, 2014) — and the the strategy which has been used successfully in various word similarity experiments (Huang et al., 1 Our task is available from https://github.com/ 2012; Iacobacci et al., 2015). Typically, for each tttthomasssss/sense2017 2Available from: https://code.google.com/p/ word pair, all pairs of senses are considered and word2vec/ the similarity of the word pair is considered to be 3Available from: https://levyomer. wordpress.com/2014/04/25/ 4Available from: http://lcl.uniroma1.it/ dependency-based-word-embeddings/ sensembed/

80 the similarity of the closest pair of senses. The Phrase 1 Phrase 2 SE SE* H buy land leave house 0.49 0.28 0.26 fact that this strategy works well suggests that close eye stretch arm 0.40 0.31 0.25 when humans are asked to judge word similar- wave hand leave company 0.42 0.08 0.20 drink water use test 0.29 0.04 0.18 ity, the pairing automatically primes them to se- european state present position 0.28 -0.03 0.19 lect the closest senses. Extending this to phrase high point particular case 0.41 0.10 0.21 similarity requires us to compose each possible Table 2 pair of senses for each phrase and then select Tendency of SENSEMBED (SE) to overestimate the similarity the sense configuration which results in maximal on phrase pairs with low average human similarity when the phrase similarity. For comparison, we also give closest sense strategy is used. results for the configuration which results in mini- mal phrase similarity and the arithmetic mean5 of goal is to find whether two or more occurrences of all sense configurations. the same lexeme express identical senses, without 3.1 Results necessarily labelling the senses yet. It has received relatively little attention despite its potential for Model AN NN VO Average providing important insights into semantic com- word2vec 0.47 0.46 0.45 0.46 position, focusing in particular on to the ability dep2vec 0.48 0.46 0.45 0.46 SENSEMBED:max 0.39 0.39 0.32 0.37 of compositional distributional semantic models to SENSEMBED:min 0.24 0.12 0.22 0.19 appropriately contextualise a polysemous lexeme. SENSEMBED:mean 0.46 0.35 0.37 0.39 Work on word-sense discrimination has suf- Table 1 fered from the absence of a benchmark task as well Results for the Mitchell and Laptata (2010) dataset. as a clear evaluation methodology. For example Schutze¨ (1998) evaluated his model on a dataset Table 1 shows that the simple strategy of adding consisting of 20 polysemous words (10 naturally high quality single-vector representations is very ambiguous lexemes and 10 artificially ambiguous competitive with the state-of-the-art for this task. “pseudo-lexemes”) in terms of accuracy for coarse None of the strategies for selecting a sense config- grained sense distinctions, and an information re- uration for the multi-sense representations could trieval task. Pantel and Lin (2002), and Van de compete with the single sense representations on Cruys (2008) used automatically extracted words this task. One possible explanation is that the com- from various newswire sources and evaluated the monly adopted closest sense strategy is not effec- output of their models in comparison to WordNet tive for composition since the composition of in- and EuroWordNet, respectively. Purandare and correct senses may lead to spuriously high similar- Pedersen (2004) used a subset of the words from ities (for two “implausible” sense configurations). the SENSEVAL-2 task and evaluated their models Table 2 lists a number of example phrase pairs in terms of precision, recall and F1-score of how with low average human similarity scores in the well available sense tags match with clusters dis- Mitchell and Lapata (2010) test set. The results covered by their algorithms. Akkaya et al. (2012) show the tendency of the closest sense strategy used the concatenation of the SENSEVAL-2 and with SENSEMBED (SE) to overestimate the sim- SENSEVAL-3 tasks and evaluated their models in ilarity of dissimilar phrase pairs. For a compari- terms of cluster purity and accuracy. 
Finally, son we manually labelled the lexemes in the sam- Moen et al. (2013) used the semantic textual sim- ple phrases with the appropriate BabelNet senses ilarity (STS) 2012 task, which is based on human prior to composition (SE*). Human (H) similar- judgements of the similarity between two sen- ity scores are normalised and averaged for an eas- tences. ier comparison, model estimates represent cosine One contribution of our work is a novel word- similarities. sense discrimination task, evaluated on a number 4 Word Sense Discrimination of robust baselines in order to facilitate future re- search in that area. In particular, our task offers a Word-sense discrimination can be seen as the first testbed for assessing the contextualisation ability stage of word-sense disambiguation, where the of compositional distributional semantic models. 5We also tried the geometric mean and the median but The goal is, for a given polysemous lexeme in con- these performed comparably with the arithmetic mean. text, to identify the sentence from a list of options

81 that is expressing the same sense of that lexeme of a particular sense for a given polysemous as the given target sentence. These two sentences lexeme. — the target and the “correct answer” — can ex- hibit any degree of semantic similarity as long as The granularity of the sense inventory reflects • 7 they convey the same sense of the target lexeme. common language use . Table 3 shows an example of the polysemous ad- The example sentences are typically free of jective black in our task. The goal of any model • would be to determine that the expressed sense of any domain bias wherever possible. black in the sentence She was going to set him The data is easily accessible via a web API. free from all of the evil and black hatred he had • brought to the world is identical to the expressed By using the data from curated resources we were sense of black in the target sentence Or should they able to avoid a setup as a sentence similarity task rebut the Democrats’ black smear campaign with and any potentially noisy crowd-sourced human the evidence at hand. similarity judgements. Our task assesses the ability of a model to dis- We were furthermore able to collect data from criminate a particular sense in a sentential context varying frequency bands, enabling an assessment from any other senses and thus provides an excel- of the impact of frequency on any model. Fig- lent testbed for evaluating multi-sense word vec- ure 1 shows the number of target lexemes per fre- tor models as well as compositional distributional quency band. While the majority of lexemes, with semantic models. By composing the representa- reference to a cleaned October 2013 Wikipedia tion of a target lexeme with its surrounding con- dump8, is in the middle band, there is a consid- text, it should be possible to determine its sense. erable amount of less frequent lexemes. The most For example, composing black smear campaign frequent target lexeme in our task is the verb be should lead to a compositional representation that with 1.8m occurrences in Wikipedia, and the ≈ is closer to the composed representation of black least frequent lexeme is the verb ruffle with only hatred than to black mood, black sense of humour 57 occurrences. The average target lexeme fre- or black coffee. This essentially uses the similarity quency is 95k for adjectives, and 45k 46k for ≈ ≈ − of the compositional representation of a lexeme’s nouns and verbs9. context to determine its sense. Similar approaches to word-sense disambiguation have already been successfully used in past works (Akkaya et al., 2012; Basile et al., 2014).

4.1 Task Construction For the construction of our dataset we made use of data from two english dictionaries (Oxford Dictio- nary and Collins Dictionary), accessible via their respective web APIs6, as well as examples from the sense annotated corpus SemCor (Miller et al., 1993). Our use of dictionary data is motivated by a number of favourable properties which make it a very suitable data source for our proposed task: Figure 1: Binned frequency distribution of the polysemous target lexemes in our task. The content is of very high-quality and cu- • rated by expert lexicographers. 7The Oxford dictionary lists 5 different senses for the noun “bank”, whereas WordNet 3.0 lists 10 synsets, for ex- All example sentences are carefully crafted in • ample distinguishing “bank” as the concept for a financial order to unambiguously illustrate the usage institution and “bank” as a reference to the building where financial transactions take place. 6https://developer. 8We removed any articles with fewer than 20 page views. oxforddictionaries.com for the Oxford Dictionary, 9The overall number of unique word types is smaller than https://www.collinsdictionary.com/api/ for the number of examples in our task as there are a number of the Collins Dictionary. We use NLTK 3.2 to access SemCor. lexemes that can occur with more than one part-of-speech.

Sense      Definition                                                              Sentence
Target     full of anger or hatred                                                 Or should they rebut the Democrats’ black smear campaign with the evidence at hand?
Option 1   full of anger or hatred                                                 She was going to set him free from all of the evil and black hatred he had brought to the world.
Option 2   (of a person’s state of mind) full of gloom or misery; very depressed   I’ve been in a black mood since September 2001, it’s hanging over me like a penumbra.
Option 3   (of humour) presenting tragic or harrowing situations in comic terms    Over the years I have come to believe that fate either hates me, or has one hell of a black sense of humour.
Option 4   (of coffee or tea) served without milk                                  The young man was reading a paperback novel and sipping a steaming mug of hot, black coffee.

Table 3: Example of the polysemous adjective black in our task. The goal for any model is to predict option 1 as expressing the same sense of black as the target sentence.
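To make the intended use of such an item concrete, the sketch below composes the target lexeme with the words of its sentential context by pointwise addition and returns the option whose composed representation is closest in cosine terms, in the spirit of the setup described later in Section 4.3. The toy three-dimensional embeddings and helper names are hypothetical; the real experiments use pre-trained 300-dimensional vectors, stop-word removal and lemmatisation.

import numpy as np

def compose(tokens, embeddings, dim):
    # Pointwise addition; out-of-vocabulary tokens contribute a zero vector.
    vec = np.zeros(dim)
    for token in tokens:
        vec += embeddings.get(token, np.zeros(dim))
    return vec

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0.0 else float(np.dot(u, v) / denom)

def discriminate(target_tokens, options_tokens, embeddings, dim):
    # Return the index of the option expressing the same sense as the target.
    target_vec = compose(target_tokens, embeddings, dim)
    scores = [cosine(target_vec, compose(option, embeddings, dim))
              for option in options_tokens]
    return int(np.argmax(scores))

# Hypothetical toy embeddings, for illustration only.
toy = {
    "black":  np.array([1.0, 1.0, 0.0]),
    "smear":  np.array([0.0, 2.0, 0.0]),
    "hatred": np.array([0.1, 1.5, 0.0]),
    "coffee": np.array([0.0, 0.0, 2.0]),
}
target = ["black", "smear"]                        # "... black smear campaign ..."
options = [["black", "hatred"], ["black", "coffee"]]
print(discriminate(target, options, toy, dim=3))   # -> 0 (the "black hatred" option)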

4.2 Task Setup Details same sense of a given target lexeme. Accuracy has We collected data for 3 different parts-of-speech: the advantage of being much easier to interpret — adjectives, nouns and verbs. We furthermore cre- in absolute terms as well as in the relative differ- ated task setups with varying numbers of senses ence between models — in comparison to other to distinguish (2-5 senses) for a given target lex- commonly used evaluation metrics such as clus- eme. This is to evaluate how well a model is able ter purity measures or correlation metrics such as to discriminate different degrees of polysemy of Spearman ρ and Pearson r. any lexeme. For any task setup evaluating for n senses, we included all lexemes with > n senses 4.3 Experimental Setup and randomly sampled n senses from its inventory. In this paper we compare the compositional mod- For each lexeme, we furthermore ensured that it els outlined earlier with two baselines, a random had at least 2 example sentences per sense. For baseline and a word-overlap baseline of the ex- the available senses of any given lexeme, we ran- tracted contexts. For the single-vector represen- domly chose a sense as the target sense, and from tations, we composed the target lexeme with all of its list of example sentences randomly sampled 2 the words in the context window and compared it sentences, one as the target example and one as with the equivalent representation of each of the the “correct answer” for the list of candidate sen- options (lexeme plus context words). The option tences. Finally we once again randomly sampled with the highest cosine similarity was deemed to the required number of other senses and example be the selected sense. For SENSEMBED, we com- sentences to complete the task setup. Using ran- posed all sense vectors of a target lexeme with the dom sampling of word senses and targets aims to given context and then used the closest sense strat- avoid a predominant sense bias. egy (Iacobacci et al., 2015) on composed represen- For each part-of-speech we created a develop- tations to choose the predicted sense10. The word- ment split for parameter tuning and a test split for overlap baseline is simply the number of words in the final evaluation. Table 4 shows the number of common between the context window for the tar- examples for each setup variant of our task. The get and each of the options. biggest category are polysemous nouns, represent- We experimented with symmetric linear bag- ing roughly half of the data, followed by verbs of-words contexts of size 1, 2 and 4 around the representing another third, and the smallest cate- target lexeme. We also experimented with de- gory are adjectives taking up the remaining 17%. ≈ pendency contexts, where first-order dependency We measure performance in terms of accuracy of contexts performed almost identical to using a 2- word bag-of-words context window (results not 2 senses 3 senses 4 senses 5 senses Adjective 66/209 47/170 37/137 28/115 reported). We excluded stop words prior to ex- Noun 170/618 125/499 100/412 74/345 tracting the context window in order to maximise Verb 127/438 71/354 72/295 56/256 Total 363/1265 263/1023 209/844 164/716 the number of content words. We break ties for any of the methods — including the baselines — Table 4: Number of examples per part-of-speech and number by randomly picking one of the options with the of senses (#dev examples/#train examples). 
10We also tried an all-by-all senses composition, however correctly predicting which two sentences share the found this to be computationally not tractable.

Statistical significance between the best performing model and the word overlap baseline is computed using a randomised pairwise permutation test (Efron and Tibshirani, 1994).

4.4 Results

Table 5 shows the results for all context window sizes across all parts-of-speech and numbers of senses. All models substantially outperform the random baseline for any number of senses. Interestingly, the word overlap baseline is competitive for all context window sizes. While it is a very simple method, it has already been found to be a strong baseline for paraphrase detection and semantic textual similarity (Dinu and Thater, 2012). One possible explanation for its robust performance on our task is an occurrence of the one-sense-per-collocation hypothesis (Yarowsky, 1993). The performance of all other models is roughly in the same ballpark for all parts-of-speech and numbers of senses, suggesting that they form robust baselines for future models. While the results are relatively mixed for adjectives, word2vec appears to be the strongest model for polysemous nouns and verbs.

Perhaps the most interesting observation in Table 5 is that word2vec and dep2vec perform as well as or better than SENSEMBED, despite the fact that the former conflate the senses of a polysemous lexeme in a single vector representation. Figure 2 shows the average performance of all models across parts-of-speech per number of senses and for all context window sizes.

SENSEMBED and Babelfy

One possible explanation for SENSEMBED not outperforming the other methods in the above experiments, despite its cleaner encoding of different word senses, is that at training time it had access to sense labels from Babelfy, whereas at test time on our task it did not have any sense labels available. We therefore sense tagged the 5-sense noun subtask with Babelfy and re-ran SENSEMBED. As Table 6 shows, access to sense labels at test time did not give a substantive performance boost, providing further support for our hypothesis that composition in single-sense vector models might be sufficient to recover sense-specific information.

Frequency Range

To estimate how sensitive our task is to target lexeme frequency, we chose the 2-sense noun subtask, merged the [1, 1k) and [1k, 10k) bands as well as the [50k, 100k) and [100k, ∞) frequency bands from Figure 1, and sampled an equal number of target words from each band. Table 7 reports the results for this experiment. All methods outperform the random and word overlap baselines and appear to work better for less frequent lexemes. One possible explanation for this behaviour is that less frequent lexemes have fewer senses and potentially less subtle sense differences than more frequent lexemes, which would make them easier to discriminate by distributional semantic methods.

5 Discussion

Our results suggest that pointwise addition in a single-sense vector model such as word2vec is able to discriminate the sense of a polysemous lexeme in context in a surprisingly effective way and represents a strong baseline for future work. Distributional composition can therefore be interpreted as a process of contextualising the meaning of a lexeme. In this way, composition acts not only as a way to represent the meaning of a phrase as a whole, but also as a local discriminator for the lexemes in the phrase. For example, the composed representation of dry clothes should only keep contexts that dry shares with clothes while suppressing contexts it shares with weather or wine. Hence, one would expect the same to happen with a polysemous lexeme such as bank in the context of river and account, respectively.

Recent work by Arora et al. (2016) has shown that the different senses of a polysemous lexeme reside in a linear substructure within a single vector and are recoverable by sparse coding. There is furthermore evidence that additive composition in low-dimensional word embeddings approximates an intersection of the contexts of two distributional word vectors (Tian et al., 2015). It therefore seems plausible that an intersective composition function should be able to recover sense-specific information.
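For reference, the randomised pairwise permutation test used for significance testing in Section 4.4 can be approximated along the following lines. This is a generic sketch of the test on per-example correctness indicators, not the exact script used for the reported numbers.

```python
import random

def paired_permutation_test(correct_a, correct_b, n_permutations=10000, seed=0):
    # Approximate (randomised) paired permutation test on the accuracy
    # difference between two systems scored on the same examples;
    # `correct_a` and `correct_b` are equal-length lists of 0/1 outcomes.
    rng = random.Random(seed)
    observed = abs(sum(correct_a) - sum(correct_b))
    exceeds = 0
    for _ in range(n_permutations):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() < 0.5:      # randomly swap the paired outcomes
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            exceeds += 1
    return (exceeds + 1) / (n_permutations + 1)   # two-sided p-value estimate
```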

Symmetric context window of size 1
                Adjective                  Noun                       Verb
  Senses        2     3     4     5        2     3     4     5        2     3     4     5
  Random        0.53  0.32  0.25  0.14     0.47  0.32  0.23  0.19     0.47  0.31  0.23  0.18
  Word Overlap  0.63  0.46  0.47  0.40     0.55  0.40  0.37  0.34     0.54  0.44  0.38  0.29
  word2vec      0.70  0.56  0.61† 0.54†    0.66‡ 0.52‡ 0.50‡ 0.44‡    0.63‡ 0.56‡ 0.52‡ 0.43‡
  dep2vec       0.65  0.64‡ 0.57  0.57‡    0.64‡ 0.50‡ 0.49‡ 0.48‡    0.63‡ 0.55‡ 0.50‡ 0.43‡
  SENSEMBED     0.67  0.54  0.56  0.56†    0.64‡ 0.49‡ 0.50‡ 0.43‡    0.62† 0.53‡ 0.49‡ 0.38†

Symmetric context window of size 2
                Adjective                  Noun                       Verb
  Senses        2     3     4     5        2     3     4     5        2     3     4     5
  Random        0.53  0.32  0.25  0.14     0.47  0.32  0.23  0.19     0.47  0.31  0.23  0.18
  Word Overlap  0.66  0.51  0.55  0.43     0.59  0.47  0.43  0.41     0.58  0.51  0.45  0.36
  word2vec      0.70  0.64† 0.58  0.55     0.71‡ 0.63‡ 0.59‡ 0.54‡    0.68‡ 0.64‡ 0.58‡ 0.49‡
  dep2vec       0.71  0.65‡ 0.58  0.57‡    0.70‡ 0.57‡ 0.55‡ 0.55‡    0.66† 0.64‡ 0.54† 0.46†
  SENSEMBED     0.72‡ 0.62  0.61  0.52     0.69‡ 0.56† 0.57‡ 0.51†    0.67‡ 0.65† 0.57  0.45

Symmetric context window of size 4
                Adjective                  Noun                       Verb
  Senses        2     3     4     5        2     3     4     5        2     3     4     5
  Random        0.53  0.32  0.25  0.14     0.47  0.32  0.23  0.19     0.47  0.31  0.23  0.18
  Word Overlap  0.67  0.55  0.58  0.51     0.62  0.50  0.49  0.45     0.59  0.55  0.50  0.40
  word2vec      0.71  0.65† 0.65  0.57     0.73‡ 0.61‡ 0.62‡ 0.57‡    0.71‡ 0.62† 0.57  0.53‡
  dep2vec       0.72  0.66† 0.60  0.54     0.71‡ 0.55  0.56† 0.53†    0.67  0.62  0.54  0.50
  SENSEMBED     0.75  0.59  0.62  0.55     0.69‡ 0.57† 0.58‡ 0.53†    0.68‡ 0.62† 0.55  0.47

Table 5: Performance overview for all parts-of-speech and numbers of senses. ‡: statistically significant at the p < 0.01 level in comparison to the Word Overlap baseline; †: statistically significant at the p < 0.05 level in comparison to the Word Overlap baseline.

Figure 2: Average performance across parts-of-speech per number of senses and context window.

Noun – 5 Senses
  Context Window Size    1      2      4
  word2vec               0.44   0.54   0.57
  dep2vec                0.48   0.55   0.53
  SENSEMBED              0.43   0.51   0.53
  SENSEMBED & Babelfy    0.45   0.49   0.54

Table 6: Results on the 5-sense noun subtask with SENSEMBED having access to Babelfy sense labels at test time.

Noun – 2 Senses, context window size = 2
  Frequency Band    < 10k   10k–50k   ≥ 50k
  Random            0.51    0.51      0.51
  Word Overlap      0.66    0.60      0.56
  word2vec          0.81    0.64      0.66
  dep2vec           0.77    0.67      0.66
  SENSEMBED         0.74    0.68      0.60

Table 7: Results on a subsample of the 2-sense noun subtask across frequency bands.

To qualitatively analyse this hypothesis, we used the word2vec and SENSEMBED vectors to compose a small number of example phrases by pointwise addition and calculated their top 5 nearest neighbours in terms of cosine similarity. For SENSEMBED we manually sense tagged the phrases with the appropriate BabelNet sense labels prior to composition. We omitted the BabelNet sense labels in the neighbour list for brevity; however, they were consistent with the intended sense in all cases. Table 8 supports the view of composition as a way of contextualising the meaning of a lexeme.
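The top-5 neighbour lists in Table 8 can in principle be reproduced with a few lines of code. The sketch below assumes an in-memory mapping from words (or sense-tagged tokens) to vectors and is meant as an illustration only.

```python
import numpy as np

def phrase_neighbours(phrase, vectors, k=5):
    # Compose the phrase by pointwise addition and return the k vocabulary
    # items that are most cosine-similar to the composed vector.
    words = phrase.split()
    query = np.sum([vectors[w] for w in words], axis=0)
    query = query / np.linalg.norm(query)
    scored = []
    for word, vec in vectors.items():
        sim = float(np.dot(query, vec / np.linalg.norm(vec)))
        scored.append((sim, word))
    return [word for _, word in sorted(scored, reverse=True)[:k]]

# e.g. phrase_neighbours("river bank", vectors) would, on a space like the
# one used here, be expected to return items such as "bank", "river", "creek".
```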

In all cases in our example, the word2vec neighbours reflect the intended sense of the polysemous lexeme, providing evidence for the linear substructure of word senses in a single vector as discovered by Arora et al. (2016), and suggesting that distributional composition is able to recover sense-specific information from a polysemous lexeme. The very fine-grained sense-level vector space of SENSEMBED gives rise to a very focused neighbourhood; however, there does not seem to be any advantage over word2vec from a qualitative point of view when using simple additive composition.

6 Related Work

Perhaps the most popular tasks for evaluating the ability of a model to capture or encode the different senses of a polysemous lexeme in a given context are the English lexical substitution task (McCarthy and Navigli, 2007) and the Microsoft sentence completion challenge (Zweig and Burges, 2011). Both tasks require a model to fill an appropriate word into a pre-defined slot in a given sentential context. The sentence completion challenge provides a list of candidate words while the English lexical substitution task does not. However, neither task focuses on polysemy, and the English lexical substitution task conflates the problems of discriminating word senses and finding meaning-preserving substitutes.

Dictionary definitions have previously been used to evaluate compositional distributional semantic models, where the goal is to match a dictionary entry with its corresponding definition (Kartsaklis et al., 2012; Polajnar and Clark, 2014). These datasets are commonly set up as retrieval tasks, but generally do not test the ability of a model to disambiguate a polysemous word in context, or to discriminate multiple definitions of the same word.

Our task also provides a novel evaluation for compositional distributional semantic models, where the predominant strategy is to estimate the similarity of two short phrases (Bernardi et al., 2013; Grefenstette and Sadrzadeh, 2011; Kartsaklis and Sadrzadeh, 2014; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010) or sentences (Agirre et al., 2016; Huang et al., 2012; Marelli et al., 2014) in comparison to human-provided gold-standard judgements. One problem with these similarity tasks is that the similarity or relatedness of two sentences is very difficult to judge — especially on a fine-grained scale — even for humans. This frequently results in a relatively high variance of judgements and low inter-annotator agreement (Batchkarov et al., 2016). The short phrase datasets typically have a fixed structure that tests only a very small fraction of the possible grammatical constructions in which a lexeme can occur, and they furthermore provide very little context. The use of full sentences remedies the lack of context and grammatical variation, but such datasets can still contain a significant level of noise due to their automatic construction or to the variance in human ratings. In contrast, our task is not set up as a sentence similarity task and therefore avoids the use of human similarity judgements.

Our task is similar to word-sense induction (WSI); however, we only focus on discriminating the sense of a polysemous lexeme in context rather than inducing a set of senses from raw data and appropriately tagging subsequent occurrences of polysemous instances with the inferred inventory. Separating the sense discrimination task from the problem of sense induction has the advantage of making our task applicable to evaluating compositional distributional semantic models in order to test their ability to appropriately contextualise a polysemous lexeme. Since it does not require models to perform an extra sense induction step, our task is easier to evaluate, as no matching between sense clusters identified by a model and some gold-standard sense classes needs to be performed, as typically proposed in the WSI literature (Agirre and Soroa, 2007; Manandhar et al., 2010).

Most closely related to our task are the Stanford Contextual Word Similarity (SCWS) dataset by Huang et al. (2012) and the Usage Similarity (USim) task by Erk et al. (2009). The goal in both tasks is to estimate the similarity of two polysemous words in context in comparison to human-provided gold-standard judgements. In the SCWS dataset typically two different lexemes are considered, whereas in USim and our task the same lexemes with different contexts are compared.

Phrase         | word2vec neighbours                                 | SENSEMBED neighbours
river bank     | bank, river, creek, lake, rivers                    | bank, river, stream, creek, river basin
bank account   | account, bank, accounts, banks, citibank            | bank, banks, the bank, pko bank polski, handlowy
dry weather    | weather, dry, wet weather, wet, unreasonably warm   | dry, weather, humid, cold, cool
dry clothes    | dry, clothes, clothing, rinse thoroughly, wet       | dry, clothes, warm, cold, wet
capital city   | capital, city, cities, downtown, town               | city, capital, the capital city, town, provincial capital
capital asset  | capital, asset, assets, investment, worth           | capital, asset, investment, assets, investor
power plant    | plant, power, plants, coalfired, megawatt           | power, plant, near-limitless, pulse-power, power of the wind
garden plant   | plant, garden, plants, gardens, vegetable garden    | plant, garden, plants, oakville assembly, solanaceous
window bar     | bar, window, windows, doorway, door                 | window, bar, windows, glass window, wall
sandwich bar   | bar, sandwich, restaurant, burger, diner            | sandwich, bar, restaurant, hot dog, cake
gasoline tank  | gasoline, tank, fuel, gallon, tanks                 | gasoline, tank, fuel, petrol, kerosene
armored tank   | armored, tank, tanks, M1A1 Abrams, armored vehicle  | armored, armoured, tank, tanks, light tank
desert rock    | rock, desert, rocks, desolate expanse, arid desert  | desert rock, the desert, deserts, badlands
rock band      | rock, band, rockers, bands, indie rock              | band, rock, group, the band, rock group

Table 8: Nearest neighbours of composed phrases for word2vec and SENSEMBED. Distributional composition in word2vec is able to recover sense-specific information remarkably well. Some neighbours are phrases because they have been encoded as a single token in the original vector space.

Instead of relying on crowd-sourced human gold-standard similarity judgements, which can be prone to a considerable amount of noise11, we leverage the high-quality content of available English dictionaries. Furthermore, our task is not formulated as estimating the similarity between two lexemes in context, but as identifying the sentences that use the same sense of a given polysemous lexeme.

11 For example, the average standard deviation of human ratings in the SCWS dataset is ≈3 on a 10-point scale, and can be up to 4–5 in some cases.

7 Conclusion

While elementary multi-sense representations of words might capture a more fine-grained semantic picture of a polysemous word, that advantage does not appear to transfer to distributional composition in a straightforward way. Our experiments on a popular phrase similarity benchmark and on our novel word-sense discrimination task have demonstrated that semantic composition does not appear to benefit from a fine-grained sense inventory, but that the ability to contextualise a polysemous lexeme in single-sense vector models is sufficient for superior performance. We furthermore have provided qualitative and quantitative evidence that an intersective composition function such as pointwise addition for neural word embeddings is able to discriminate the meaning of a word in context, and is able to recover sense-specific information remarkably well.

Lastly, our experiments have uncovered an important question for multi-sense vector models, namely how to exploit the fine-grained sense-level representations for distributional composition. Our novel word-sense discrimination task provides an excellent testbed for compositional distributional semantic models, following either a single-sense or a multi-sense vector modelling paradigm, due to its focus on assessing the ability of a model to appropriately contextualise the meaning of a word. Our task furthermore provides another evaluation option away from intrinsic evaluations, which are based on often noisy human similarity judgements, while also not being embedded in a downstream task.

In future work we aim to extend our evaluation to more complex compositional distributional semantic models such as the lexical function model (Paperno et al., 2014) or the Anchored Packed Dependency Tree framework (Weir et al., 2016). We furthermore want to investigate how far the sense-discriminating ability of composition can be leveraged for other tasks.

Acknowledgments

We would like to thank our anonymous reviewers for their helpful comments.

References

Eneko Agirre and Aitor Soroa. 2007. SemEval-2007 task 02: Evaluating word sense induction and discrimination systems. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 7–12, Prague, Czech Republic, June. Association for Computational Linguistics.

Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. SemEval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 497–511, San Diego, California, June. Association for Computational Linguistics.

87 Cem Akkaya, Janyce Wiebe, and Rada Mihalcea. Katrin Erk, Diana McCarthy, and Nicholas Gaylord. 2012. Utilizing semantic composition in distribu- 2009. Investigations on word senses and word us- tional semantic models for word sense discrimina- ages. In Proceedings of the Joint Conference of the tion and word sense disambiguation. In Proceedings 47th Annual Meeting of the ACL and the 4th In- of ICSC, pages 45–51. ternational Joint Conference on Natural Language Processing of the AFNLP, pages 10–18, Suntec, Sin- Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, gapore, August. Association for Computational Lin- and Andrej Risteski. 2016. Linear algebraic struc- guistics. ture of word senses, with applications to polysemy. CoRR, abs/1601.03764. Edward Grefenstette and Mehrnoosh Sadrzadeh. 2011. Experimental support for a categorical composi- Pierpaolo Basile, Annalina Caputo, and Giovanni Se- tional distributional model of meaning. In Proceed- meraro. 2014. An enhanced lesk word sense dis- ings of the 2011 Conference on Empirical Meth- ambiguation algorithm through a distributional se- ods in Natural Language Processing, pages 1394– mantic model. In Proceedings of COLING 2014, 1404, Edinburgh, Scotland, UK., July. Association the 25th International Conference on Computational for Computational Linguistics. Linguistics: Technical Papers, pages 1591–1600, Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, Dublin, Ireland, August. Dublin City University and and Yoshimasa Tsuruoka. 2014. Jointly learn- Association for Computational Linguistics. ing word representations and composition functions using predicate-argument structures. In Proceed- Miroslav Batchkarov, Thomas Kober, Jeremy Reffin, ings of the 2014 Conference on Empirical Methods Julie Weeds, and David Weir. 2016. A critique of in Natural Language Processing (EMNLP), pages word similarity as a method of evaluating distribu- 1544–1555, Doha, Qatar, October. Association for tional semantic models. In Proceedings of the 1st Computational Linguistics. Workshop on Evaluating Vector-Space Representa- tions for NLP, pages 7–12. Association for Compu- Felix Hill, KyungHyun Cho, Anna Korhonen, and tational Linguistics. Yoshua Bengio. 2016. Learning to understand phrases by embedding the dictionary. Transactions Raffaella Bernardi, Georgiana Dinu, Marco Marelli, of the Association for Computational Linguistics, and Marco Baroni. 2013. A relatedness benchmark 4:17–30. to test the role of determiners in compositional dis- tributional semantics. In Proceedings of the 51st An- Eric Huang, Richard Socher, Christopher Manning, nual Meeting of the Association for Computational and Andrew Ng. 2012. Improving word represen- Linguistics (Volume 2: Short Papers), pages 53–57, tations via global context and multiple word proto- Sofia, Bulgaria, August. Association for Computa- types. In Proceedings of the 50th Annual Meeting of tional Linguistics. the Association for Computational Linguistics (Vol- ume 1: Long Papers), pages 873–882, Jeju Island, Xinxiong Chen, Zhiyuan Liu, and Maosong Sun. Korea, July. Association for Computational Linguis- 2014. A unified model for word sense represen- tics. tation and disambiguation. In Proceedings of the 2014 Conference on Empirical Methods in Natu- Ignacio Iacobacci, Mohammad Taher Pilehvar, and ral Language Processing (EMNLP), pages 1025– Roberto Navigli. 2015. Sensembed: Learning 1035, Doha, Qatar, October. Association for Com- sense embeddings for word and relational similarity. putational Linguistics. 
In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the Georgiana Dinu and Mirella Lapata. 2010. Measur- 7th International Joint Conference on Natural Lan- ing distributional similarity in context. In Proceed- guage Processing (Volume 1: Long Papers), pages ings of the 2010 Conference on Empirical Methods 95–105, Beijing, China, July. Association for Com- in Natural Language Processing, pages 1162–1172, putational Linguistics. Cambridge, MA, October. Association for Compu- Ignacio Iacobacci, Mohammad Taher Pilehvar, and tational Linguistics. Roberto Navigli. 2016. Embeddings for word sense disambiguation: An evaluation study. In Proceed- Georgiana Dinu and Stefan Thater. 2012. Saarland: ings of the 54th Annual Meeting of the Association Vector-based models of semantic textual similarity. for Computational Linguistics (Volume 1: Long Pa- In *SEM 2012: The First Joint Conference on Lexi- pers), pages 897–907, Berlin, Germany, August. As- cal and Computational Semantics – Volume 1: Pro- sociation for Computational Linguistics. ceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth Interna- Dimitri Kartsaklis and Mehrnoosh Sadrzadeh. 2014. A tional Workshop on Semantic Evaluation (SemEval study of entanglement in a categorical framework of 2012), pages 603–607, Montreal,´ Canada, 7-8 June. natural language. In Proceedings of the 11th Work- Association for Computational Linguistics. shop on Quantum Physics and Logic (QPL). Bradley Efron and Robert Tibshirani. 1994. An Intro- Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Stephen duction to the Bootstrap. CRC press. Pulman. 2012. A unified sentence space for

88 categorical distributional-compositional semantics: tions of words and phrases and their composition- Theory and experiments. In Proceedings of COL- ality. In C.J.C. Burges, L. Bottou, M. Welling, ING 2012: Posters, pages 549–558, Mumbai, India, Z. Ghahramani, and K.Q. Weinberger, editors, Ad- December. The COLING 2012 Organizing Commit- vances in Neural Information Processing Systems tee. 26, pages 3111–3119. Curran Associates, Inc.

Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Stephen George A. Miller, Claudia Leacock, Randee Tengi, and Pulman. 2013. Separating disambiguation from Ross T. Bunker. 1993. A semantic concordance. In composition in distributional semantics. In Pro- Proceedings of the Arpa Workshop on Human Lan- ceedings of the Seventeenth Conference on Com- guage Technology, pages 303–308. putational Natural Language Learning, pages 114– 123, Sofia, Bulgaria, August. Association for Com- Jeff Mitchell and Mirella Lapata. 2008. Vector-based putational Linguistics. models of semantic composition. In Proceedings of the 46th Annual Meeting of the Association for Com- Thomas Kober, Julie Weeds, Jeremy Reffin, and David putational Linguistics: Human Language Technol- Weir. 2016. Improving sparse word representations ogy Conference, pages 236–244, Columbus, Ohio, with distributional inference for semantic composi- June. Association for Computational Linguistics. tion. In Proceedings of the 2016 Conference on Em- pirical Methods in Natural Language Processing, pages 1691–1702, Austin, Texas, November. Asso- Jeff Mitchell and Mirella Lapata. 2010. Composition ciation for Computational Linguistics. in distributional models of semantics. Cognitive Sci- ence, 34(8):1388–1429. Omer Levy and Yoav Goldberg. 2014. Dependency- based word embeddings. In Proceedings of the 52nd Hans Moen, Erwin Marsi, and Bjorn¨ Gamback.¨ 2013. Annual Meeting of the Association for Computa- Towards dynamic word sense discrimination with tional Linguistics (Volume 2: Short Papers), pages random indexing. In Proceedings of the Workshop 302–308, Baltimore, Maryland, June. Association on Continuous Vector Space Models and their Com- for Computational Linguistics. positionality, pages 83–90, Sofia, Bulgaria, August. Association for Computational Linguistics. Jiwei Li and Dan Jurafsky. 2015. Do multi-sense em- beddings improve natural language understanding? Andrea Moro, Alessandro Raganato, and Roberto Nav- In Proceedings of the 2015 Conference on Empiri- igli. 2014. Entity linking meets word sense disam- cal Methods in Natural Language Processing, pages biguation: A unified approach. Transactions of the 1722–1732, Lisbon, Portugal, September. Associa- Association for Computational Linguistics, 2:231– tion for Computational Linguistics. 244.

Suresh Manandhar, Ioannis Klapaftis, Dmitriy Dligach, Patrick Pantel and Dekang Lin. 2002. Discovering and Sameer Pradhan. 2010. Semeval-2010 task 14: word senses from text. In Proceedings of the Eighth Word sense induction & disambiguation. In Pro- ACM SIGKDD International Conference on Knowl- ceedings of the 5th International Workshop on Se- edge Discovery and Data Mining, KDD ’02, pages mantic Evaluation, pages 63–68, Uppsala, Sweden, 613–619, New York, NY, USA. ACM. July. Association for Computational Linguistics. Denis Paperno, Nghia The Pham, and Marco Baroni. Marco Marelli, Stefano Menini, Marco Baroni, Luisa 2014. A practical and linguistically-motivated ap- Bentivogli, Raffaella bernardi, and Roberto Zam- proach to compositional distributional semantics. In parelli. 2014. A sick cure for the evaluation of Proceedings of the 52nd Annual Meeting of the As- compositional distributional semantic models. In sociation for Computational Linguistics (Volume 1: Nicoletta Calzolari, Khalid Choukri, Thierry De- Long Papers), pages 90–99, Baltimore, Maryland, clerck, Hrafn Loftsson, Bente Maegaard, Joseph June. Association for Computational Linguistics. Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth Interna- tional Conference on Language Resources and Eval- Mohammad Taher Pilehvar and Nigel Collier. 2016. uation (LREC’14), pages 216–223, Reykjavik, Ice- De-conflated semantic representations. In Proceed- land, May. European Language Resources Associa- ings of the 2016 Conference on Empirical Methods tion (ELRA). ACL Anthology Identifier: L14-1314. in Natural Language Processing, pages 1680–1690, Austin, Texas, November. Association for Computa- Diana McCarthy and Roberto Navigli. 2007. Semeval- tional Linguistics. 2007 task 10: English lexical substitution task. In Proceedings of the Fourth International Workshop Tamara Polajnar and Stephen Clark. 2014. Improv- on Semantic Evaluations (SemEval-2007), pages ing distributional semantic vectors through context 48–53, Prague, Czech Republic, June. Association selection and normalisation. In Proceedings of the for Computational Linguistics. 14th Conference of the European Chapter of the As- sociation for Computational Linguistics, pages 230– Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Cor- 238, Gothenburg, Sweden, April. Association for rado, and Jeff Dean. 2013. Distributed representa- Computational Linguistics.

89 Amruta Purandare and Ted Pedersen. 2004. Word PA, USA. Association for Computational Linguis- sense discrimination by clustering contexts in vec- tics. tor and similarity spaces. In Hwee Tou Ng and Ellen Riloff, editors, HLT-NAACL 2004 Work- Geoffrey Zweig and J.C. Chris Burges. 2011. The shop: Eighth Conference on Computational Natu- microsoft research sentence completion challenge. ral Language Learning (CoNLL-2004), pages 41– Technical report, Microsoft Research. 48, Boston, Massachusetts, USA, May 6 - May 7. Association for Computational Linguistics.

Siva Reddy, Ioannis Klapaftis, Diana McCarthy, and Suresh Manandhar. 2011. Dynamic and static pro- totype vectors for semantic composition. In Pro- ceedings of 5th International Joint Conference on Natural Language Processing, pages 705–713, Chi- ang Mai, Thailand, November. Asian Federation of Natural Language Processing.

Joseph Reisinger and Raymond J. Mooney. 2010. Multi-prototype vector-space models of word mean- ing. In Human Language Technologies: The 2010 Annual Conference of the North American Chap- ter of the Association for Computational Linguistics, pages 109–117, Los Angeles, California, June. As- sociation for Computational Linguistics.

Bahar Salehi, Paul Cook, and Timothy Baldwin. 2015. A word embedding approach to predicting the com- positionality of multiword expressions. In Proceed- ings of the 2015 Conference of the North Ameri- can Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 977–983, Denver, Colorado, May–June. Association for Computational Linguistics.

Hinrich Schutze.¨ 1998. Automatic word sense dis- crimination. Computational Linguistics, 24(1):97– 123, mar.

Ran Tian, Naoaki Okazaki, and Kentaro Inui. 2015. The mechanism of additive composition. CoRR, abs/1511.08407.

Tim Van de Cruys. 2008. Using three way data for word sense discrimination. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 929–936, Manch- ester, UK, August. Coling 2008 Organizing Com- mittee.

David Weir, Julie Weeds, Jeremy Reffin, and Thomas Kober. 2016. Aligning packed dependency trees: a theory of composition for distributional semantics. Computational Linguistics, special issue on Formal Distributional Semantics, 42(4):727–761, Decem- ber.

John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2015. From paraphrase database to com- positional paraphrase model and back. Transactions of the Association for Computational Linguistics, 3:345–358.

David Yarowsky. 1993. One sense per collocation. In Proceedings of the Workshop on Human Language Technology, HLT ’93, pages 266–271, Stroudsburg,

Elucidating Conceptual Properties from Word Embeddings

Kyoung-Rok Jang (School of Computing, KAIST, Daejeon, South Korea) [email protected]
Sung-Hyon Myaeng (School of Computing, KAIST, Daejeon, South Korea) [email protected]

Abstract

In this paper, we introduce a method of identifying the components (i.e. dimensions) of word embeddings that strongly signify properties of a word. By elucidating such properties hidden in word embeddings, we could make word embeddings more interpretable, and could also perform property-based meaning comparison. With this capability, we can answer questions like "To what degree does a given word have the property cuteness?" or "In what respects are two words similar?". We verify our method by examining how the strength of property-signifying components correlates with the degree of prototypicality of a target word.

1 Introduction

Modeling the meaning of words has long been studied and has served as a basis for almost every kind of NLP task. Most recent word modeling techniques are based on neural networks, and the word representations produced by such techniques are called word embeddings, which are usually low-dimensional, dense vectors of continuous-valued components. Although word embeddings have proved their usefulness in many tasks, the question of what is represented in them is understudied.

Recent studies report empirical evidence indicating that word embeddings may reflect some property information of a target word (Erk, 2016; Levy et al., 2015). Learning the properties of a word would be helpful because many NLP tasks can be related to "finding words that possess similar properties", which include finding synonyms and named entity recognition (NER). Without a method for explicating what properties are contained in embeddings, however, researchers have mostly focused on improving performance on well-known semantic benchmark tasks (e.g. SimLex-999) as a way to find better embeddings.

Performing well in such benchmark tasks is valuable but provides little help in understanding the inside of the black box. For instance, it is not possible to answer questions like "To what degree does a given word have the property cuteness?". One way to solve this problem is to elucidate the properties that are encoded in word embeddings and associate them with task performance. With this capability, we can not only enhance our understanding of word embeddings but also make it easier to compare heterogeneous word embedding models in more coherent ways. Our immediate goal in this paper is to show the feasibility of explicating the properties contained in word embeddings.

Our research can be seen as an attempt to increase the interpretability of word embeddings. It is in line with attempts to provide a human-understandable explanation for complex machine learning models, with which we can gain enough confidence to use them in decision-making processes.

There has been a line of work devoted to identifying components that are important for performing various NLP tasks such as sentiment analysis or NER (Faruqui et al., 2015; Fyshe et al., 2015; Herbelot and Vecchi, 2015; Karpathy et al., 2016; Li et al., 2016a; Li et al., 2016b; Rothe et al., 2016). Those works are analogous to ours in that they try to inspect the role of the components in word embeddings. However, they only attempt to identify key features for specific tasks rather than to elucidate properties. In contrast, our question is "what comprises word embeddings?", not "what components are important for performing well in a specific task?"


2 Feasibility Study of concepts to a set of categories. Below we de- scribe two datasets we used in our experiment: 2.1 Background HyperLex and Non-Negative Sparse Embedding Word embeddings can be seen as representing (NNSE). concepts of a word. As such, we attempt to design an property-related experiment around manipula- 2.3.1 Dataset: Non-Negative Sparse tion of concepts. In particular, we bring in the cat- Embedding (NNSE) egory theory (Murphy, 2004) where the notion of One desirable quality we wanted from the word category is defined to be “grouping concepts that embeddings to be used in our experiment is that share similar properties”. In other words, prop- there should be clear contrast between informa- erties have a direct bearing on concepts and their tive and non-informative components. In ordinary categories, according to the theory. dense word embeddings, usually every component On the other hand, researchers have argued that is filled with a non-zero value. some concepts are more typical (or central) than The Non-Negative Sparse Embedding (NNSE) others in a category (Rosch, 1973; Rosch, 1975). (Murphy et al., 2012) fulfills the condition in For instance, apple is more typical than olive in the sense that insignificant components are set to the fruit category. The typicality is a graded phe- zero. The NNSE component values falling be- nomenon, and may rise due to the strength of ‘es- tween 0 and 1 (non-negative) are generated by ap- sential’ properties that make a concept a specific plying the non-negative sparse coding algorithm category. (Hoyer, 2002) to ordinary word embeddings (e.g. The key ideas from the above are 1) concepts word2vec). of the same category share similar properties and 2) some concepts that have strong essential prop- 2.3.2 Dataset: HyperLex erties are considered more typical in specific cate- HyperLex is a dataset and evaluation resource gory, and they guided our experiment design. that quantifies the extent of the semantic category membership (Vulic´ et al., 2016). A total of 2,616 2.2 Design concept pairs are included in the dataset, and the The goal of this study is to show the feasibil- strength of category membership is given by na- ity of sifting property information from word em- tive English speakers and recorded in graded man- beddings. We assume that a concept’s property ner. This graded category membership can be in- information is captured and distributed over one terpreted as a ‘typicality score’ (1–10). Some sam- or more components (dimensions) of embeddings ples are shown in Table 1. during the learning process. Since the concepts that belong to the same category are likely to share Concept Category Score basketball activity 10 similar properties, there should be some salient spy agent 8 components that are shared among them. We call handbag bowl 0 such components as SIG-PROPS (for significant Table 1: A HyperLex sample. The score is the properties) of a specific category. answer to the question “To what degree is concept In this feasibility study, we hypothesize that the a type of category?” strength of SIG-PROPS is strongly correlated with the degree of concept’s typicality. This is based on the theory introduced in Section 2.1, that the 3 Experiment typicality phenomenon rises due to the strength of essential properties a target concept possesses. 3.1 Preparation So the concept that has (higher/lower) SIG-PROPS We first prepared pre-trained NNSE embeddings. 
values than others should be (more typical/less The authors released pre-trained model on their typical) than other concepts. website1. We used the model trained with depen- 2.3 Datasets dency context (‘Dependency model’ on the web- site), because as reported in (Levy and Goldberg, For our experiment dealing with typicality of con- 2014), models trained on dependency context tend cepts, we needed both (pre-trained) word embed- dings and a dataset that encodes typicality scores 1http://www.cs.cmu.edu/ bmurphy/NNSE/

92 to prefer functional similarity (hogwarts — sun- 3.2 Identification of SIG-PROPS nydale) rather than topical similarity (hogwarts — The goal of this step is to find SIG-PROPS that 2 dumbledore) . The embeddings are more sparse might represent each category. Simply put, SIG- than ordinary embeddings and have 300 compo- PROPS are the components that have on average nents. high value compared to other components of the Next we fetched HyperLex dataset at the au- concepts in the same category. We identified SIG- 3 thor’s website . To make the settings suitable to PROPS by 1) calculating an average value of each our experiment goal, we selected categories with component across the concepts with the same cat- the following criteria: egory, then 2) choosing those components whose average value is above h. We empirically set h to 1. The categories and instances must be con- 0.2. crete nouns (e.g. food). This is because SIG-PROPS Category people are more coherent in producing the Comp. ID Avg. c88 0.806 properties of concrete nouns (Murphy, 2004). instrument So the embeddings of concrete nouns should c258 0.769 0.587 animal c154 contain more clear property information than c265 0.221 0.550 other types of words. bird c154 c265 0.213 c207 0.298 food 2. The categories must contain enough number c233 0.269 of instances (not 1 or 2). This is to gain reli- c229 0.492 able result. c27 0.369 fruit c156 0.349 c44 0.264 3. Some categories are sub-category of another c233 0.206 selected category while others are not related. Table 3: SIG-PROPS of each category. The strings This is to see the discriminative and overlap- in “Comp. ID” column are the component IDs ping effect of identified SIG-PROPS between (c1–c300). “Avg.” column indicates the average categories. Related categories should share a value of the component across all the concepts un- set of strong SIG-PROPS, while unrelated cat- der that category. egories shouldn’t.

Table 3 shows the identified SIG-PROPS. The As a result, we selected five categories: food, number of SIG-PROPS is different across cate- fruit (sub-category of food), animal, bird (sub- gories. Interestingly, there is component overlap category of animal), and instrument. We fetched between taxonomically similar categories (‘c154’ the concepts that belong to the categories and then and ‘c265’ between animal–bird, ‘c233’ between filtered out those that aren’t contained in the pre- food–fruit), while there is none between unrelated trained NNSE embeddings. The final size of each categories (instrument–animal–food). category is shown in Table 2. This initial observation is encouraging for our Category # of Concepts feasibility study in that indeed SIG-PROPS can food 54 play a role of distinguishing or associating cate- fruit 9 gories. We argue that the identified SIG-PROPS animal 46 bird 16 strongly characterize each category, showing that instrument 14 we can associate vector components with proper- ties. Table 2: The size of selected categories 3.3 Correlation between SIG-PROPS and In the next section, we explain how we identi- concepts typicality scores fied SIG-PROPS of each category. In this section, we check how the strength of SIG- PROPS correlates with the typicality scores. Note 2 We thought “sharing similar function” is more compati- that the range of SIG-PROPS values differ from cat- ble with the notion of sharing similar properties. The topical similarity is less indicative of having properties in common. egory to category — those for instrument are es- 3http://people.ds.cam.ac.uk/iv250/hyperlex.html pecially high, which might indicate they are highly

93 representative. SIG-PROPS Correlation Corr. rank c229 0.743 1st Our assumption is that if the identified SIG- c233 0.540 4th PROPS truly represent the essential quality of a cat- c27 0.516 5th egory, the strength of SIG-PROPS should be pro- c44 0.474 7th c156 -0.663 85th portional to concepts’ typicality scores (equation 1). Table 8: Corr(SIG-PROPS, typicality): fruit

As the results show, there is clear tendency SIG- Strength(SIG-PROPS) (1) PROPS having high correlation with the typicality T ypicality(Concept) scores. Most of the SIG-PROPS showed meaning- ∝ ful correlation (> 0.5) with the typicality score or We observe this phenomenon by calculating placed at the top in the component–typicality cor- Pearson correlation between the strength of SIG- relation ranking. The result strongly indicates that PROPS and typicality scores. For instance, suppose even when we apply the simple method of iden- we calculate the correlation between ‘c88’ (88th tifying SIG-PROPS and regarding them as proper- component) of instrument concepts and their typ- ties, they serve as strong indicators for the con- icality score. We inspect the instrument concepts cept’s typicality. one by one, collect their ‘c88’ values (x) and typ- icality scores (y), and then measure the tendency 4 Conclusion and Future Work of changes in the two variables. Although limited in scale, our work showed the The result is shown in Table 4–8 where the col- feasibility of discovering properties from word umn ‘Rank’ shows the rank of the component’s embeddings. Not only SIG-PROPS can be used to correlation score (compared to other components). increase the interpretability of word embeddings, For instance, in Table 4 ‘c88’ has the highest cor- but also enable us more elaborate, property-based relation with the concept’s typicality score. meaning comparison. Our next step would be checking the applica- SIG-PROPS Correlation Corr. rank bility to general NLP tasks (e.g. NER, synonym c88 0.926 1st c258 0.918 2nd identification). Also, applying our method to word embeddings that have more granular components Table 4: Corr(SIG-PROPS, typicality): instrument (e.g. 2,500) might be helpful for identifying SIG- PROPS in more granular level.

SIG-PROPS Correlation Corr. rank Acknowledgments c154 0.549 1st c265 0.265 2nd This work was supported by Institute for Infor- mation & communications Technology Promo- Table 5: Corr(SIG-PROPS, typicality): animal tion(IITP) grant funded by the Korea govern- ment(MSIP) (No. R0126-16-1002, Development of agro-livestock cloud and application service for SIG-PROPS Correlation Corr. rank balanced production, transparent distribution and c154 0.783 1st safe consumption based on GS1) c265 0.563 2nd

Table 6: Corr(SIG-PROPS, typicality): bird References Katrin Erk. 2016. What do you know about an alliga- tor when you know the company it keeps? Seman- SIG-PROPS Correlation Corr. rank tics and Pragmatics Article, 9(17):1–63. c233 0.255 1st c120 0.224 2nd Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris c207 0.216 4th Dyer, and Noah A. Smith. 2015. Sparse overcom- c192 0.030 104th plete word vector representations. In Proceedings of the 53rd Annual Meeting of the Association for Table 7: Corr(SIG-PROPS, typicality): food Computational Linguistics and the 7th International

94 Joint Conference on Natural Language Processing Sascha Rothe, Sebastian Ebert, and Hinrich Schutze.¨ (Volume 1: Long Papers), pages 1491–1500, Bei- 2016. Ultradense word embeddings by orthogonal jing, China, July. transformation. In Proceedings of the 2016 Con- ference of the North American Chapter of the As- Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian sociation for Computational Linguistics: Human Murphy, and Tom M. Mitchell. 2015. A composi- Language Technologies, pages 767–777, San Diego, tional and interpretable semantic space. In Proceed- California, June. ings of the 2015 Conference of the North American Chapter of the Association for Computational Lin- Ivan Vulic,´ Daniela Gerz, Douwe Kiela, Felix Hill, and guistics: Human Language Technologies, pages 32– Anna Korhonen. 2016. HyperLex: A Large-Scale 41, Denver, Colorado, May–June. Evaluation of Graded Lexical Entailment. Arxiv.

Aurelie´ Herbelot and Eva Maria Vecchi. 2015. Build- ing a shared world: Mapping distributional to model-theoretic semantic spaces. In Proceedings of EMNLP, number September, pages 22–32, Lisbon, Portugal.

P. O. Hoyer. 2002. Non-negative sparse coding. In Neural Networks for Signal Processing - Proceed- ings of the IEEE Workshop, volume 2002-Janua, pages 557–565.

Andrej Karpathy, Justin Johnson, and Li Fei-Fei. 2016. Visualizing and Understanding Recurrent Networks. International Conference on Learning Representa- tions (ICLR), pages 1–13.

Omer Levy and Yoav Goldberg. 2014. Dependency- based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computa- tional Linguistics (Volume 2: Short Papers), pages 302–308, Baltimore, Maryland, June.

Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. 2015. Do supervised distributional meth- ods really learn lexical inference relations? In Pro- ceedings of the 2015 Conference of the North Amer- ican Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 970–976, Denver, Colorado, May–June.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Juraf- sky. 2016a. Visualizing and Understanding Neural Models in NLP. In Naacl, pages 1–10.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Un- derstanding Neural Networks through Representa- tion Erasure. In Arxiv.

Brian Murphy, Partha Pratim Talukdar, and Tom Mitchell. 2012. Learning Effective and In- terpretable Semantic Models using Non-Negative Sparse Embedding. In Proceedings of COLING 2012: Technical Papers, number December 2012, pages 1933–1950.

Gregory Murphy. 2004. The big book of concepts. MIT press.

Eleanor H. Rosch. 1973. Natural categories. Cogni- tive Psychology, 4(3):328–350.

Eleanor H. Rosch. 1975. Cognitive representations of semantic categories. Journal of experimental psy- chology: General, 104(3):192.

95 TTCSE : a Vectorial Resource for Computing Conceptual Similarity

Enrico Mensa Daniele P. Radicioni Antonio Lieto University of Turin, Italy University of Turin, Italy University of Turin, Italy Dipartimento di Informatica Dipartimento di Informatica ICAR-CNR, Palermo, Italy [email protected] [email protected] [email protected]

Abstract a particular class of vector representations where knowledge is represented as a set of limited though In this paper we introduce the TTCSE , a cognitively relevant quality dimensions; in this linguistic resource that relies on BabelNet, representation a geometrical structure is associ- NASARI and ConceptNet, that has now ated to each quality dimension, mostly based on been used to compute the conceptual sim- cognitive accounts. In this setting, concepts cor- ilarity between concept pairs. The con- respond to convex regions, and regions with dif- ceptual representation herein provides uni- ferent geometrical properties correspond to dif- form access to concepts based on Babel- ferent sorts of concepts (Gardenfors,¨ 2014). The Net synset IDs, and consists of a vector- geometrical features of CSs have a direct appeal based semantic representation which is for common-sense representation and common- compliant with the Conceptual Spaces, a sense reasoning, since prototypes (the most rele- geometric framework for common-sense vant representatives of a category from a cogni- knowledge representation and reasoning. tive point of view, see (Rosch, 1975)) correspond The TTCSE has been evaluated in a pre- to the geometrical centre of a convex region, the liminary experimentation on a conceptual centroid. Also exemplars-based representations similarity task. can be mapped onto points in a multidimensional space, and their similarity can be computed as 1 Introduction the distance intervening between each two points, The development of robust and wide-coverage re- based on some suitable metrics such as Euclidean sources to use in different sorts of application or Manhattan distance. etc.. (such as text mining, categorization, etc.) has The CS framework has been recently used to known in the last few years a tremendous growth. extend and complement the representational and In this paper we focus on computing conceptual inferential power allowed by formal ontologies similarity between pairs of concepts, based on a —and in general symbolic representation—, that resource that extends and generalizes an attempt are not suited for representing defeasible, proto- carried out in (Lieto et al., 2016a). In particular, typical knowledge, and for dealing with the cor- the TTCSE —so named after Text to Conceptual responding typicality-based conceptual reasoning Spaces-Extended— has been acquired by inte- (Lieto et al., 2017). Also, wide-coverage seman- grating two different sorts of linguistic resources, tic resources such as DBPedia and ConceptNet, in such as the encyclopedic knowledge available fact, in different cases fail to represent the sort of in BabelNet (Navigli and Ponzetto, 2012) and common-sense information based on prototypical NASARI (Camacho-Collados et al., 2015), and and default information which is usually required the common-sense grasped by ConceptNet (Speer to perform forms of plausible reasoning.1 In this and Havasi, 2012). The resulting representation 1Although DBPedia contains information on many sorts enjoys the interesting property of being anchored of entities, due to its explicit encyclopedic commitment, to both resources, thereby providing a uniform common-sense information is dispersed among textual de- conceptual access grounded on the sense identi- scriptions (e.g., in the abstracts) rather than being available in a well-structured formal way. For instance, the fork entity can fiers provided by BabelNet. 
be categorized as an object, whilst there is no structured infor- Conceptual Spaces (CSs) can be thought of as mation about its typical usage. On the other hand, Concept-

96 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 96–101, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics paper we explore whether and to what extent a Extraction The TTCSE takes in input c and linguistic resource describing concepts by means builds a bag-of-concepts C including the concepts of qualitative and synthetic vectorial representa- associated to c through one or more ConceptNet tion is suited to assess the conceptual similarity relationships. All ConceptNet nodes related to between pairs of concepts. the input concept c are collected: namely, we take the corresponding ConceptNet node for each 2 Vector representations with the TTCSE term in the WordNet (Miller, 1995) synset of c, sc WN-syn . For each such term we extract The TTCS has been designed to build resources ∈ c E all terms t linked through d, one of the aforemen- encoded as general conceptual representations. tioned ConceptNet relationships: that is, we col- We presently illustrate how the resource is built, c d deferring to Section 3 the description of the con- lect the terms s t and store them in the set T . c → trol strategy designed to use it in the computation Each s can be considered as a different lexical- of conceptual similarity. ization for the same concept c, so that all t can be grouped in T , that finally contains all terms asso- The TTCSE takes in input a concept c referred through a BabelNet synset ID, and produces as ciated in any way to c. output a vector representation ~c where the in- Since ConceptNet does not provide any direct put concept is described along some semantic di- anchoring mechanism to associate its terms to mensions. In turn, filling each such dimension meaning identifiers, it is necessary to determine which of the terms t T are relevant for the amounts to finding a set of appropriate concepts: ∈ features act like vector space dimensions, and they concept c. In other words, when we access the are based on ConceptNet relationships.2 The di- ConceptNet page for a certain term, we find not mensions are filled with BabelNet synset IDs, so only the association regarding that term with the that finally each concept c residing in the linguis- sense conveyed by c, but also all the associations tic resource can be defined as regarding it in any other meaning. To select only (and possibly all) the associations that concern the ~c = ID , c , , c (1) sense individuated through the previous phase, we {h d { 1 ··· k}i} d introduce the notion of relevance. To give an in- [∈D tuition of this process, the terms found in Con- where IDd is the identifier of the d-th dimension, ceptNet are considered as relevant (and thus re- and c , , c is the set of values chosen for d. { 1 ··· k} tained) either if they exhibit a heavy weight in the The control strategy implemented by the TTCSE NASARI vector corresponding to the considered includes two main steps, semantic extraction concept, or if they share at least some terms with (composed by the extraction and concept identi- the NASARI vector (further details on a similar fication phases) and the vector injection. approach can be found in (Lieto et al., 2016a)). Net is more suited to structurally represent common-sense information related to typicality. However, in ConceptNet Concept identification Once the set of relevant the coverage of this type of knowledge component is some- terms has been extracted, we need to lift them to times not satisfactory. 
For similar remarks on such resources, the corresponding concept(s), which will be used claiming for the need of new resources more suited to repre- sent common-sense information, please also refer to (Basile as value for the features. We stress, in fact, that et al., 2016). dimension fillers are concepts rather than terms 2 The full list of the employed properties, which (please refer to Eq. 1). In the concept identifica- were selected from the most salient properties in Con- ceptNet, includes: INSTANCEOF,RELATEDTO,ISA, tion step, we exploit NASARI in order to provide ATLOCATION, DBPEDIA/GENRE,SYNONYM,DERIVED- each term t T with a BabelNet synset ID, thus ∈ FROM,CAUSES,USEDFOR,MOTIVATEDBYGOAL,HAS- finally converting it into the bag-of-concepts C. SUBEVENT,ANTONYM,CAPABLEOF,DESIRES,CAUS- ESDESIRE,PARTOF,HASPROPERTY,HASPREREQUI- Given a ti T , we distinguish two main cases. ∈ SITE,MADEOF,COMPOUNDDERIVEDFROM,HASFIRST- If ti is contained in one or more synsets inside the SUBEVENT, DBPEDIA/FIELD, DBPEDIA/KNOWNFOR, DB- c c PEDIA/INFLUENCEDBY,DEFINEDAS,HASA,MEMBEROF, NASARI vector of , we obtain i (the concept un- RECEIVESACTION,SIMILARTO, DBPEDIA/INFLUENCED, derlying ti) by directly assigning to ti the identi- SYMBOLOF,HASCONTEXT,NOTDESIRES,OBSTRUCT- fier of the heaviest weighted synset that contains EDBY,HASLASTSUBEVENT,NOTUSEDFOR,NOTCA- 3 PABLEOF,DESIREOF,NOTHASPROPERTY,CREATEDBY, it. Otherwise, if ti is not included in any of the ATTRIBUTE,ENTAILS,LOCATIONOFACTION,LOCATED- NEAR. 3NASARI unified vectors are composed by a head con-

97 synsets in the NASARI vector associated to c, we erage, 14.90 concepts are used to fill each vector.5 need to choose a vector among all possible ones: we first select a list of candidate vectors (that is, 3 Computing Conceptual Similariy those containing ti in their vector head), and then One main assumption underlying our approach is choose the best one by retaining the vector where that two concepts are similar insofar as they share c’s ID has highest weight. some values on the same dimension, such as when For example, given in input the concept bank they are both used for the same ends, they share intended as a financial institution, we inspect the components, etc.. Consequently, our metrics does edges of the ConceptNet node ‘bank’ and its syn- not employ WordNet taxonomy and distances be- onyms. Then, thanks to the relevance notion we tween pairs of nodes, such as in (Wu and Palmer, get rid of associations such as ‘bank ISA flight 1994; Leacock et al., 1998; Schwartz and Gomez, maneuver’ since the term ‘flight maneuver’ is not 2008), nor it depends on information content ac- present in the vector associated to the concept counts either, such as in (Resnik, 1998a; Jiang and bank. Conversely, we accept sentences such as Conrath, 1997). AS ‘bank H A branch’ (i.e., ‘branch’ is added to T ). The representation available to the TTCSE is en- Finally, ‘branch’ goes through the concept identi- tirely filled with conceptual identifiers, so to assess fication phase, resulting in a concept ci and then it the similarity between two such values we check is added to C. whether both the concept vector ~ci and the vec- tor ~c share the same (concept) value for the same The bag-of-concepts C is then j Vector injection dimension d D, and our similarity along each scanned, and each value is injected in the template ∈ dimension basically depends on this simple intu- for ~c. Each value c , . . . , c C is still pro- { 1 n ∈ } ition: vided with the relationship that linked it to c in ConceptNet, so this value is employed to fill the 1 sim(~ci,~cj) = di dj . corresponding feature in ~c. For example, if c is D · | ∩ | k | | d D extracted from the ConceptNet relation USEDFOR X∈ USEDFOR (i.e., c c ), the value c will be added to The score computed by the TTCSE system can → k k the set of entities that are used for c. be justified based on the dimensions actually filled: this explanation can be built automatically, 2.1 Building the TTCSE resource since the similarity between ~ci and ~cj is a func- tion of the sum of the number of shared elements In order to build the set of vectors in the TTCS E in each dimension, so that one can argue that resource, the system took in input 16, 782 con- a given score x is due to the fact that along a cepts. Such concepts have been preliminarily given dimension d both concepts share some val- computed (Lieto et al., 2016b) by starting from the ues (e.g., sim(table, chair) = x because each one 10K most frequent nouns present in the Corpus of is a (ISA) ‘furniture’, both are USEDFOR ‘eating’, Contemporary American English (COCA).4 Then, ‘studying’ and ‘working’; both can be found AT- for each input concept the TTCSE scans some 3M LOCATION ‘home’, ‘office’; and each one HASA ConceptNet nodes to retrieve the terms that appear ‘leg’). into the WordNet synset of the input. 
This step Ultimately, the TTCS collects information allows to browse over 11M associations avail- E along the 44 dimensions listed in footnote 2, so able in ConceptNet, and to extract on average 155 that we are in principle able to assess in how far ConceptNet nodes for each input concept. Sub- similar they are along each and every dimension. sequently, the TTCS exploits the 2.8M NASARI E However, our approach is presently limited by vectors to decide whether each of the extracted the actual average filling factor, and by the noise nodes is relevant or not w.r.t. the input concept, that can be possibly collected by an automatic and then it tries to associate a NASARI vector to procedure built on top of the BabelNet knowledge each of them (concept identification step). On av- base. Since we need to deal with noisy and incom- cept (represented by its ID in the first position) and a body, plete information, some adjustments to the above that is a list of synsets related to the head concept. Each formula have been necessary in order to handle synset ID is followed by a number that grasps the strength of its correlation with the head concept. 5The final resource is available for download at the URL 4http://corpus.byu.edu/full-text/. http://ttcs.di.unito.it.

Ultimately, the TTCS_E collects information along the 44 dimensions listed in footnote 2, so that we are in principle able to assess to what extent two concepts are similar along each and every dimension. However, our approach is presently limited by the actual average filling factor, and by the noise that can possibly be collected by an automatic procedure built on top of the BabelNet knowledge base. Since we need to deal with noisy and incomplete information, some adjustments to the above formula have been necessary in order to handle, within each dimension, the possibly unbalanced number of concepts that characterize the different dimensions, and to prevent, across dimensions, the computation from being biased by more richly defined concepts (i.e., those with more dimensions filled). The computation of the conceptual similarity score is thus based on the following formula:

sim(\vec{c}_i, \vec{c}_j) = \frac{1}{|D^*|} \sum_{d \in D} \frac{|d_i \cap d_j|}{\beta(\alpha a + (1 - \alpha) b) + |d_i \cap d_j|}

where |d_i ∩ d_j| counts the number of concepts that are used as fillers for the dimension d in the concepts c_i and c_j, respectively; a and b are computed as a = min(|d_i − d_j|, |d_j − d_i|) and b = max(|d_i − d_j|, |d_j − d_i|); and |D*| counts the dimensions filled with at least one concept in both vectors.
This formula is known as the Symmetrical Tversky's Ratio Model (Jimenez et al., 2013), a symmetrical reformulation of Tversky's ratio model (Tversky, 1977). It enjoys the following properties: i) it allows grasping the number of common traits between the two vectors (at the numerator); ii) it allows tuning the balance between cardinality differences (through the parameter α), and between |d_i ∩ d_j| and |d_i − d_j|, |d_j − d_i| (through the parameter β). Interestingly, by setting α = .5 and β = 1 the formula equals the popular Dice's coefficient. The parameters α and β were set to .8 and .2 for the experimentation.

4 Evaluation
In the experimentation we addressed the conceptual similarity task at the sense level, that is, the TTCS_E system has been fed with sense pairs. We considered three datasets, namely the RG, MC and WS-Sim datasets,6 first designed in (Rubenstein and Goodenough, 1965), (Miller and Charles, 1991) and (Agirre et al., 2009), respectively. Historically, while the first two (RG and MC) datasets were originally conceived for similarity measures, the WS-Sim dataset was developed as a subset of a former dataset built by (Finkelstein et al., 2001), created to test similarity by including pairs of words related through specific relationships, such as synonymy, hyponymy, and unrelated. All senses were mapped onto WordNet 3.0.
The similarity scores computed by the TTCS_E system have been assessed through Pearson's r and Spearman's ρ correlations, which are largely adopted for the conceptual similarity task (for a recent compendium of the approaches please refer to (Pilehvar and Navigli, 2015)). The former measure captures the linear correlation of two variables as their covariance divided by the product of their standard deviations, thus basically allowing to grasp differences in their values, whilst the Spearman correlation is computed as the Pearson correlation between the rank values of the considered variables, so it is reputed to be best suited to assess results in a similarity ranking setting where relative scores are relevant (Schwartz and Gomez, 2011; Pilehvar and Navigli, 2015).
Table 1 shows the results obtained by the system in a preliminary experimentation.

            ρ      r
RG        0.78   0.85
MC        0.77   0.80
WS-Sim    0.64   0.54

Table 1: Spearman (ρ) and Pearson (r) correlations obtained over the three datasets.

Provided that the present task of sense-level similarity is slightly different from word-level similarity (about this distinction, please refer to (Pilehvar and Navigli, 2015)), so that our results can hardly be compared to those in the literature, the reported figures are still far from the state of the art, where the Spearman correlation ρ reaches 0.92 for the RG dataset (Pilehvar and Navigli, 2015), 0.92 for the MC dataset (Agirre et al., 2009), and 0.81 for the WS-Sim dataset (Halawi et al., 2012; Tau Yih and Qazvinian, 2012).7 However, we remark that the TTCS_E employs vectors of a very limited size w.r.t. the standard vector-based resources used in current models of distributional semantics (as mentioned, each vector is defined, on average, through 14.90 concepts).

6 Publicly available at the URL http://www.seas.upenn.edu/~hansens/conceptSim/.
7 Rich references to state-of-the-art results and works experimenting on the mentioned datasets can be found on the ACL Wiki, at the URL https://goo.gl/NQlb6g.
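For concreteness, the adjusted score and the evaluation protocol can be sketched as follows. This is an illustrative reimplementation under the same dict-of-sets vector layout as the earlier sketch, not the authors' code, and the score lists passed to SciPy are toy values rather than the reported results.

```python
from scipy.stats import pearsonr, spearmanr

def sim_sstrm(ci, cj, alpha=0.8, beta=0.2):
    """Symmetrical Tversky's Ratio Model over dimension fillers,
    averaged over D*, the dimensions filled in both concept vectors."""
    d_star = [d for d in ci if d in cj and ci[d] and cj[d]]
    if not d_star:
        return 0.0
    total = 0.0
    for d in set(ci) | set(cj):
        di, dj = ci.get(d, set()), cj.get(d, set())
        inter = len(di & dj)
        if inter == 0:
            continue  # a dimension with no shared fillers contributes nothing
        a = min(len(di - dj), len(dj - di))
        b = max(len(di - dj), len(dj - di))
        total += inter / (beta * (alpha * a + (1 - alpha) * b) + inter)
    return total / len(d_star)

# Evaluation sketch: correlate system scores with gold human judgements
# over a list of sense pairs (toy numbers, not real data).
gold = [4.0, 3.6, 0.4, 1.2]
system = [1.75, 1.40, 0.10, 0.55]
rho, _ = spearmanr(system, gold)
r, _ = pearsonr(system, gold)
print(f"Spearman rho={rho:.2f}, Pearson r={r:.2f}")
```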

Moreover, due to the explicit grounding provided by connecting the NASARI feature values to the corresponding properties in ConceptNet, the TTCS_E can be used to provide the scores returned as output with an explanation, based on the shared concepts along some given dimension. To the best of our knowledge, this is a unique feature, which cannot be easily reached by methods based on Latent Semantic Analysis (such as those pioneered by (Deerwester et al., 1990)) and can be only partly approached by techniques exploiting taxonomic structures (Resnik, 1998b; Banerjee and Pedersen, 2003). Conversely, few and relevant traits are present in the final linguistic resource, which is thus synthetic and more cognitively plausible (Gärdenfors, 2014).
In some cases (27 concept pairs out of the overall 190 pairs) the system was not able to retrieve an ID for one of the concepts in the pair: such pairs were excluded from the computation of the final accuracy. Missing concepts may be lacking in (at least one of) the resources upon which the TTCS_E is built: including further resources may thus be helpful to overcome this limitation. Also, difficulties stemmed from insufficient information for the concepts at stake: this phenomenon was observed, e.g., when both concepts have been found, but no common dimension has been filled. This sort of difficulty shows that the coverage of the resource still needs to be enhanced by improving the extraction phase, so as to add further concepts per dimension and to fill more dimensions.

5 Conclusions
In this paper we have introduced a novel resource, the TTCS_E, which is compatible with the Conceptual Spaces framework and aims at putting together encyclopedic and common-sense knowledge. The resource has been employed to compute the conceptual similarity between concept pairs. Thanks to its representational features, it allows implementing a simple though effective heuristics to assess similarity: that is, concepts are similar insofar as they share some values along the same dimension. However, further heuristics will be investigated in the near future as well.
A preliminary experimentation has been run, employing three different datasets. While we consider the obtained results encouraging, the experimentation clearly points out that there is room for improvement along two main axes: dimensions must be filled with further information, and the quality of the extracted information should also be improved. Both aspects will be the object of our future efforts.

References

Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Paşca, and Aitor Soroa. 2009. A study on similarity and relatedness using distributional and wordnet-based approaches. In Procs of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 19–27. ACL.

S. Banerjee and T. Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805–810.

Valerio Basile, Soufian Jebbara, Elena Cabrio, and Philipp Cimiano. 2016. Populating a knowledge base with object-location relations using distributional semantics. In Eva Blomqvist, Paolo Ciancarini, Francesco Poggi, and Fabio Vitali, editors, EKAW, volume 10024 of Lecture Notes in Computer Science, pages 34–50.

José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. NASARI: a novel approach to a semantically-aware representation of items. In Proceedings of NAACL, pages 567–577.

S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391–407.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th international conference on World Wide Web, pages 406–414. ACM.

Peter Gärdenfors. 2014. The geometry of meaning: Semantics based on conceptual spaces. MIT Press.

Guy Halawi, Gideon Dror, Evgeniy Gabrilovich, and Yehuda Koren. 2012. Large-scale learning of word relatedness with constraints. In Qiang Yang, Deepak Agarwal, and Jian Pei, editors, KDD, pages 1406–1414. ACM.

Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan.

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh, Av Juan Dios Bátiz, and Av Mendizábal. 2013. Softcardinality-core: Improving text overlap with distributional measures for semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics, volume 1, pages 194–201.

Claudia Leacock, George A. Miller, and Martin Chodorow. 1998. Using corpus statistics and wordnet relations for sense identification. Computational Linguistics, 24(1):147–165.

Antonio Lieto, Enrico Mensa, and Daniele P. Radicioni. 2016a. A Resource-Driven Approach for Anchoring Linguistic Resources to Conceptual Spaces. In Procs of the XV International Conference of the Italian Association for Artificial Intelligence, volume 10037 of LNAI, pages 435–449. Springer.

Antonio Lieto, Enrico Mensa, and Daniele P. Radicioni. 2016b. Taming sense sparsity: a common-sense approach. In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.

Antonio Lieto, Daniele P. Radicioni, and Valentina Rho. 2017. Dual PECCS: A Cognitive System for Conceptual Representation and Categorization. Journal of Experimental & Theoretical Artificial Intelligence, 29(2):433–452.

George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and cognitive processes, 6(1):1–28.

George A. Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.

Mohammad Taher Pilehvar and Roberto Navigli. 2015. From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artif. Intell., 228:95–128.

Philip Resnik. 1998a. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11(1).

Philip Resnik. 1998b. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11(1).

Eleanor Rosch. 1975. Cognitive Representations of Semantic Categories. Journal of experimental psychology: General, 104(3):192–233.

Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.

Hansen A. Schwartz and Fernando Gomez. 2008. Acquiring knowledge from the web to be used as selectors for noun sense disambiguation. In Procs of the Twelfth Conference on Computational Natural Language Learning, pages 105–112. ACL.

Hansen A. Schwartz and Fernando Gomez. 2011. Evaluating semantic metrics on tasks of concept similarity. In Proc. Int. Florida Artif. Intell. Res. Soc. Conf. (FLAIRS), page 324.

Robert Speer and Catherine Havasi. 2012. Representing General Relational Knowledge in ConceptNet 5. In LREC, pages 3679–3686.

Wen Tau Yih and Vahed Qazvinian. 2012. Measuring word relatedness using heterogeneous vector space models. In HLT-NAACL, pages 616–620. The Association for Computational Linguistics.

Amos Tversky. 1977. Features of similarity. Psychological review, 84(4):327.

Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on Association for Computational Linguistics, pages 133–138. ACL.

Measuring the Italian-English lexical gap for action verbs and its impact on translation

Lorenzo Gregori, University of Florence, [email protected]
Alessandro Panunzi, University of Florence, [email protected]

Abstract languages, as in translation tasks (Ivir, 1977). In this latter case, a lexical gap can be defined as the This paper describes a method to measure absence of direct lexeme in one language while the lexical gap of action verbs in Italian comparing two languages during translation (Cvi- and English by using the IMAGACT on- likaite,˙ 2006). The presence of lexical gaps be- tology of action. The fine-grained cate- tween two languages is more than a theoretical gorization of action concepts of the data problem, having a strong impact in several related source allowed to have wide overview of fields: lexicographers need to deal with lexical the relation between concepts in the two gaps in the compilation of bilingual dictionaries languages. The calculated lexical gap for (Gouws, 2002); in knowledge representation the both English and Italian is about 30% of creation of multilanguage linguistic resources re- the action concepts, much higher than pre- quire a strategy to cope with the lack of concepts vious results. Beyond this general num- (Jansseen, 2004); the lexical transfer process is af- bers a deeper analysis has been performed fected by the presence of lexical gaps in automatic in order to evaluate the impact that lexi- translation system, reducing their accuracy (San- cal gaps can have on translation. In partic- tos, 1990). ular a distinction has been made between Even if in literature it’s possible to find many the cases in which the presence of a lexi- examples of gaps, it’s hard to estimate them. This cal gap affects translation correctness and is due to the fact that most of the gaps are re- completeness at a semantic level. The re- lated to small semantic differences that are hard to sults highlight a high percentage of con- identify: available linguistic resources usually rep- cepts that can be considered hard to trans- resent a coarse-grained semantics, so while they late (about 18% from English to Italian are useful to discriminate the prominent senses of and 20% from Italian to English) and con- words, they can’t capture small semantic shifts. In firms that action verbs are a critical lexical addition to it, a multilanguage resource is required class for translation tasks. for this purpose, but these resources are normally built up through a mapping between two or more 1 Introduction monolingual resources and this cause an approx- Lexical gap is a well known phenomenon in lin- imation in concept definitions: similar concepts guistics and its identification allows to discover tend to be grouped together in a unitary concept some relevant features related to the semantic that represent the core-meaning and lose their se- categorization operated by languages. A lexical mantic specificities. gap corresponds to a lack of lexicalization of a certain concept in a given language. This phe- 2 IMAGACT nomenon traditionally emerged from the analysis of a single language by means of the detection of Verbs are a critical lexical class for disambiguation empty spaces in a lexical matrix (see the semi- and translation tasks: they are much more polyse- nal works by Leher (1974) and Lyons (1977); see mous than nouns and, moreover, their ambiguity also Kjellmer (2003)). Anyway, lexical gap be- is hard to resolve (Fellbaum et al., 2001). In par- comes a major issue when comparing two or more ticular the representation of word senses as sepa-

102 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 102–109, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics rate entities is tricky, since their boundaries are of- Type discrimination agreement: a Cohen k of 0,82 ten vague causing the senses to be under-specified for 2 expert annotators and a Fleiss k of 0.73 for 4 and overlapping. From this point of view the sub- non-expert ones3. class of general verbs represent a crucial point, be- For these features IMAGACT ontology is a re- cause these verbs are characterized by both high liable data source to measure the lexical gap be- frequency in the use of language and high ambi- tween Italian and English: in fact verb Types are guity. defined independently, but linked together through IMAGACT1 is a visual ontology of action that the scenes. The comparison of Types in different provides a video-based translation and disam- language through their action prototypes allows to biguation framework for general verbs. The re- identify the action concepts that are shared be- source is built on an ontology containing a fine- tween the two languages and the ones that don’t grained categorization of action concepts, each match with any concept in the other language; in represented by one or more video prototypes as this case we have a lexical gap. recorded scenes and 3D animations. IMAGACT currently contains 1,010 scenes 3 Type relations which encompass the action concepts most com- In this frame we can perform a set-based compar- monly referred to in everyday language usage. ison, considering a Type as just a set of scenes. Data are derived from the manual annotation of A Type is a lexicalized concept, so a partition of verb occurrences in spontaneous spoken corpora the meaning, but semantic features are not repre- (Moneglia et al., 2012); the dataset has been com- sented in the ontology and, in fact, they are un- piled by selecting action verbs with the highest known: data are derived from the ability of com- frequency in the corpora and comprises 522 Ital- petent speaker in performing a categorization of ian and 554 English lemmas. Although the set similar items with respect to a lemma, without any of retrieved actions is necessarily incomplete, this attempt to formalize semes. So if we look at the methodology ensures to have a significant picture database we can say that Types are merely sets of of the main action types performed in everyday scenes. life2. Comparing a Type (T ) of a verb in source lan- The links between verbs and video scenes are 1 guage (V ) with a Type (T ) of a verb in target lan- based on the co-referentiality of different verbs 1 2 guage (V ) we can have 5 possible configurations: with respect to the action expressed by a scene 2 (i.e. different verbs can describe the same ac- 1. T T : two Types are equivalent if they 1 ≡ 2 tion, visualized in the scene). The visual represen- contain the same set of scenes; tations convey the action information in a cross- linguistic environment and IMAGACT may thus 2. T T = : two Types are disjoint if they 1 ∩ 2 ∅ be exploited to discover how the actions are lexi- don’t share any scene; calized in different languages. 3. 
T T : T is a subset of T if any scene of In addition to it IMAGACT contains a semantic 1 ⊂ 2 1 2 classification of each lemma, that is divided into T1 is also a scene of T2 and the 2 Types are Types: each verb Type identifies an action con- not equivalent; cept and contains one ore more scenes, that work 4. T T : T is a superset of T if any scene as prototypes of that concept. Type classification 1 ⊃ 2 1 2 T T is manually performed in Italian and English in of 2 is also a scene of 1 and the 2 Types are parallel, through a corpus-based annotation pro- not equivalent; cedure by native language annotators (Moneglia 5. T1 T2 = T1 * T2 T1 + T2: two Types ∩ 6 ∅∧ ∧ et al., 2012); this allowed to have a discrimina- are partially overlapping if they share some tion of verb Types based only on the annotator scenes and each Type have some scenes not competence, without any attempt to fit the data belonging to the other one. into predefined semantic models. Validation re- sults (Gagliardi, 2014) highlight a good rate of 3Inter-annotator agreement (ITA) measured on WordNet by expert annotators on comparable verb sets (score for fine- 1http://www.imagact.it grained sense inventories): ITA = 72% on SemEval-2007 2see Moneglia et al. (2012) and Moneglia (2014b) for dataset (Pradhan et al., 2007); ITA = 71% on SensEval-2 details about corpora annotation numbers and methodology. dataset (Palmer et al., 2007).
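In code, the five configurations above reduce to plain set comparisons. The following sketch is illustrative only and is not part of IMAGACT: Types are simplified to Python sets of scene identifiers, and the identifiers are invented.

```python
def type_relation(t1: set, t2: set) -> str:
    """Classify the relation between two Types, each represented as the
    set of scene IDs it contains (cases 1-5 above)."""
    if t1 == t2:
        return "equivalent"        # case 1
    if not (t1 & t2):
        return "disjoint"          # case 2
    if t1 < t2:
        return "subset"            # case 3: t1 more specific than t2
    if t1 > t2:
        return "superset"          # case 4: t1 more general than t2
    return "partial-overlap"       # case 5

# Toy example: an equivalent Type pair (scene IDs are invented).
touch_1 = {"scene_012", "scene_034", "scene_051"}
toccare_1 = {"scene_012", "scene_034", "scene_051"}
print(type_relation(toccare_1, touch_1))  # -> "equivalent"
```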

103 Figure 1: Two Equivalent Types belonging to the Figure 2: Two Hierarchically related Types be- Italian verb toccare and to the English verb touch. longing to the Italian verb accoltellare and to the English verb stab. It’s important to discuss these cases separately, because each one of them highlights a different encodes only a sub-part of the concept represented semantic relation between verbs and has different by T1. implications for translation. For example Type 1 of the English verb to stab When two Types are equivalents (case 1) the 2 and Type 1 of the Italian verb accoltellare cate- languages share the action concept the Types rep- gorize action where a sharp object pierces a body, resent: we could say that there is an interlinguistic but while stab can be applied to describe actions concept. This case is not problematic for transla- independently on their aim and the tool used, ac- tion: each occurrence of the verb V1 that belongs coltellare is applicable only when the agent vol- to Type T1 can be translated with V2; moreover untarily injures someone and the action is accom- we can apply V1 to translate any occurrence of V2 plished with a knife. In this case the Italian Type belonging to T2. is more specific than the English one, so transla- For example one Type of the English verb to tion is safe from Italian to English (stab can be touch and one Type of the Italian verb toccare used to translate any occurrence of accoltellare- are equivalent. They share 3 video scenes: Mary Type 1), but not vice versa: stab-Type 1 can not be touches the doll, Mary touches the table and John always translated with accoltellare, because a part touches Mary (see Fig. 1). Each scene is con- of its variation is covered by other Italian verbs nected to a different set of verbs (i.e. to brush, to like trafiggere, penetrare or attraversare. graze, to caress), representing a specific semantic Finally a partial overlap between Types (case concept, but they are kept together by a more gen- 5) doesn’t allow to induce any semantic relation eral concept both in Italian and in English. So in between Types: in these cases we have differ- any of these actions the verb to touch can be safely ent concepts that can refer the same action. Nor- translated in Italian with toccare and vice versa. mally these happen when the action is interpreted If two Types are disjoint (case 2) the Types re- from two different points of view and categorized fer to unrelated semantic concepts and we can as- within unrelated lexical concepts. In this case sume that translation between an occurrence of V1 we have a translation relation between V1 and belonging to T2 can not be translated with V2. V2 without having any semantic relation between In cases 3 and 4 the Types are hierarchically re- their Types. lated and we can assume the existence of a seman- For example the Italian verb abbassare, that is tic relation that links a general Type with a spe- frequently translated with lower in English, can cific one. Although we can not induce the type also be translated with position when applied to of this relation that could be hyponym, entailment, some (but not every) actions belonging to Type 1, troponym and so on. 
In this configuration we can categorizing actions involving the body; moreover see that translation is safe from specific to general, we have the same translation relation from En- but not vice versa: in case 3 any occurrence of V1 glish to Italian where sometimes (but not always) belonging to T1 can be translated with V2, while position-Type 2 can be translated with abbassare. in case 4 V2 can not be safely applied, because it Here there are two Types that represent semanti-

104 several steps: firstly verbs were annotated through a corpus-based procedure and Types were created and validated by mother tongue speakers on the basis of their linguistic competence; then for each concept a scene was produced to provide a proto- typical representation of it; after that a mapping between Italian and English was performed by linking the scenes to the Types of each language; finally annotators were requested to recheck each scene and add the missing verbs that are applica- ble to it. This last revision enriched the scene with more verbs that don’t belong to any Type. Figure 3: Two partially overlapping Types belong- We decided to exclude from the dataset all the ing to the Italian verb abbassare and to the English scenes (and the related Types) that contain untyped verb position. verbs, considering that a partial typing does not ensure the coherence of verb Type discrimination: cally independent concepts, but that can both be in fact it’s not possible to be sure that the creation applied to describe some actions, like Mary posi- of Types for these new instances would preserve tions herself lower and other similar ones. the original Type distinction. This happens rarely in Italian - English (14 After this pruning we obtained a set of 1,000 Types on our dataset) and in any of these cases Italian Types and 1,027 English Types, that refer there are other translation verbs as possible alter- to 501 and 535 verbs respectively (see Table 1). natives. Despite this, Type overlaps identification is very relevant, because it allows to discover un- IT EN expected translation candidates (i.e. target verbs Types 1,000 1,027 that have a translation relation but not a seman- Verbs 501 535 tic relation with the source verb) that can not be Scenes 980 917 extracted from a lexico-semantic resource. In ad- dition to it Type overlaps identification is crucial Table 1: Number of Types, verbs and scenes be- if the target verb is the only one translation pos- longing to the dataset. sibility and this can happen, especially between two languages that are very far: some evidences 4.2 Methodology for example have been discovered in Italian and Chinese (Pan, 2016) through a deep comparison According to our dataset, we can easily estimate of Italian Types with Chinese verbs that refer to the lexical gap by measuring the number of Types the same scenes. This work allowed to identify in source language that don’t have an equivalent some positive occurrences of this interesting phe- Type in target language. Namely for each concept nomenon, but can not be exploited for its numeric in source language we are going to verify if there is quantification: indeed an exhaustive analysis that a concept in target language that refer to the same involves the relation between action concepts can set of actions (represented by video prototypes); be made only between Italian and English, since if the match is not found we have a lexical gap in IMAGACT contains the verb Type discrimination target language. in these two languages only. As we can see in table 2, the action concepts that are lexicalized in Italian and without a correspond- 4 Lexical gap identification ing match in English are 33,6% (English gap); on the contrary the Italian gap for English concepts is 4.1 Dataset building 29,02%. In order to measure the lexical gaps in Italian and Before going ahead we need to do some con- English we created a working dataset by select- siderations about these numbers. 
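The matching step just described can be restated compactly. The sketch below is an assumption-laden simplification, not the project's code: each language's Types are given as a mapping from a Type identifier to its set of scene IDs, and the example entries are invented.

```python
def lexical_gap(source_types, target_types):
    """Fraction of source-language Types with no equivalent Type
    (i.e. no Type with exactly the same scene set) in the target language."""
    target_scene_sets = {frozenset(scenes) for scenes in target_types.values()}
    gaps = [t for t, scenes in source_types.items()
            if frozenset(scenes) not in target_scene_sets]
    return len(gaps) / len(source_types), gaps

# Toy example (invented Types and scenes):
it_types = {"toccare-1": {"s1", "s2"}, "accoltellare-1": {"s7"}}
en_types = {"touch-1": {"s1", "s2"}, "stab-1": {"s7", "s8"}}
ratio, missing = lexical_gap(it_types, en_types)
print(ratio, missing)  # 0.5 ['accoltellare-1'] -> an English gap for accoltellare-1
```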
First of all we ing the set of Types that have a full mapping in can see that these percentages are much higher the two languages. We need to consider that IMA- than the ones calculated by Bentivogli and Pi- GACT annotation process has been carried out in anta (2000), that found 7,4% of gaps for verbs in

105 IT EN EN IT root Types: these Types in source language → → • Total Types 1,000 1,027 represent concepts that are more general than Equiv. Types 664 (66,4%) 729 (70,98%) other ones in target language: the only Type Lexical gaps 336 (33,6%) 298 (29,02%) in target language that have a partial match with the Type in source language is a subset Table 2: Types in source language that have and (case 4); have not an equivalent Type in target language. middle Types: these Types have a partial • match in target language both with a more English-to-Italian comparison. This is a big shift, general Type and with a more specific one but it’s not surprising if we consider the differ- (both cases 3 and 4). ences of the two experiments in terms of method- ology and dataset: As we mentioned before we did not find any case in which a partial overlapping Type (case 5) is IMAGACT Type distinction is more fine- • the only one possible match in Italian and English grained in respect to WordNet synsets (Bar- comparison; so these cases are counted within the tolini et al., 2014); three categories above. the experiment by Bentivogli and Pianta was • 5.1 Root Types and uncertain translations led on MultiWordNet, in which multilan- guage Wordnets are created on the basis of Starting from this classification we can see that the Princeton Wordnet sense distinction (Pi- root Types are the critical ones in terms of trans- anta et al., 2002); this methodology introduce lation: in fact we don’t have a unique lexicalized an approximation in the concepts definition; concept in target language that is able to repre- sent the concept in source language; instead we the 7,4% of Bentivogli and Pianta is a gen- have more than one Type (and multiple verbs) that • eral value on verbs, while our experiment is cover different subparts of the whole general con- focused on action verbs, which are a strongly cept variation. In these cases we need to have extra ambiguous lexical class (Moneglia, 2014a); information about the action in order to translate it properly. From a computational point of view we the dictionary-based methodology proposed • can say that a word sense disambiguation of the by Bentivogli and Pianta is nearly opposite to source verb is not enough to reach a correct trans- IMAGACT reference-based approach. lation verb. Beyond these general considerations a lemma- The two sentences The cell phone lands on the by-lemma comparison with the experiment of carpet and The pole vaulter lands on the mat, for Bentivogli and Pianta (whose dataset is currently example, belong to the same action concept ac- 4 not available) would better explain this numeric cording to the semantics of the verb to land . In difference. Italian there is not a unique Type that collects these two actions: it’s possible to use atterrare for the 5 Lexical gaps and translation problems athlete, but it is not allowed for the phone, for which we need to make a semantic shift and use Besides a general measure of the gaps for action the verb cadere (that is more similar to fall down). concept it’s important to go a step beyond to ver- Again cadere is not appropriate for the athlete, be- ify in which cases the presence of a lexical gap cause it implies that the athlete stumbles and falls. impacts the translation quality. 
In order to do this, So this action concept that is lexicalized in En- we divided the Types without an equivalent in tar- glish with to land does not have a unique trans- get language in three categories: lation verb in Italian, and extra informations are leaf Types: these Types in source language required to translate it properly (if the theme is an • represent concepts that are more specific than human being or an object, in this specific case). other ones in target language; in this case the Table 3 show the number of leaf, root and mid- only Type in target language that have a par- dle Types in Italian and English; we can see that tial match with the Type in source language 4unlike The butterfly lands on the flower or The airplane is a superset (case 3); lands that belong to different concepts of to land.

106 IT EN EN IT a negative effect on translation. Type 1 of the En- → → Lexical gaps 336 298 glish verb to throw and Type 1 of the Italian verb Leaf Types 217 (64.58%) 200 (67.58%) lanciare categorize a wide set of actions in which Root Types 47 (13.99%) 43 (14.43%) an object is thrown by a person independently on Middle Types 72 (21.43%) 55 (18.46%) the presence of a destination or on the action aim (John throws the bowling bowl, John throws the Table 3: Number of Leaf, Root and Middle Types rock in the field, John throws the paper in the box in Italian and English (percentages on the lexical etc.). However these two Types are not equivalent, gaps). because the Italian one comprise also actions per- formed in a limited space with a highly controlled movement, like Marco lancia una monetina, that root Types represent the 14% of the lexical gaps in require another verb in English like to toss (Marco both the languages, corresponding to 4-5% of the tosses a coin). In this case the small gap between total Types. the Italian concept and English one does not affect the translation: in fact we can say that lanciare can 5.2 General Types and lossy translations be used to translate properly any action belonging Root Types are the most critical case for a transla- to throw - Type 1. tion task, because they affect the correctness; be- Given this consideration a measure of the se- sides there are also other kinds of lexical gaps that mantic distance with the translation verb is use- impact on translation. In particular is useful to ful to evaluate the loss: this can be easily done estimate how semantically far is the best transla- from IMAGACT dataset by calculating the ratio tion candidate in the cases in which we can apply between the cardinality (i.e. the number of scenes) a more general Type to translate the concept in the of the source Type, T1, and the one of the nearest source language. In fact in both leaf and middle target Type, T2 (the Type with the minimum cardi- Types we have a Type in target language that is nality among the Types in target language that are more general to the source one, so it is safely ap- supersets of the source Type). This ratio estimates plicable to any occurrence belonging to the source the overlapping between the Types: Type. This is not free from problems, because in T overlap = | 1| translation we use a more general verb, so we miss T2 some semes that are encoded in the source verb. Data are represented| | in Figure 4, reporting the In fact in this case we still have a translation prob- number of Types (Italian and English) for each lem, which is not in finding a possible target verb, overlap values, where this values are divided in 10 but in adding more information in other lexical el- ranges. ement of the sentence to fill the lack of semantic We considered semantically distant those Types information. In this case the gap does not affect with overlap < 0.4 (sharing less than 2 scenes the correctness of the translation, but its complete- over 5). These high distance Types (see Table 4) ness. are 150 for Italian (51.9% of leaf + middle Types For example the English verb to plonk does not and 15% of the total Types) and 145 for English have a correspondence in Italian. In particular a (56.86% of leaf + middle Types and 14.12% of the sentence like John plonks the books on the table total Types). 
belongs to a Type of plonk that is a leaf Type (so Basically we see that not only root Types, but there is a possible translation verb in Italian), but also a relevant part of leaf and middle Types (more for which the nearest Italian Type is much wider, than 50% both in Italian and English) represent a belonging to the very general verb mettere. In this critical point for translation. case it’s possible to translate in Italian with John IT EN EN IT mette i libri sul tavolo, but losing all the informa- → → tion regarding the way the books are placed on the Leaf+Mid T. 289 255 table (mettere is more similar with to put); an ad- Low dist. T. 139 (48.1%) 110 (43.14%) dition of other lexical elements to the sentence is High dist. T. 150 (51.9%) 145 (56.86%) required to fill this gap in Italian. Conversely we can say that a small distance be- Table 4: Distance from the nearest general Type in tween the source and the target Type does not have target language.

107 6 Conclusions In this paper a methodology for measuring the lex- ical gap of action verbs is described and applied to Italian and English, by exploiting IMAGACT on- tology. We measured 33.6% of English gap and 29.02% of Italian gap. Then this result have been investigated, in order to discover when and why a lexical gap can affect a translation task. The re- sults show that 19.7% of Italian Types and 18.3% of English ones represent action concept that are Figure 4: Number of Italian and English Types for critical from a translation perspective: these con- each overlap range. cepts are lexicalized by 27.15% of the Italian verbs and 28.79% of the English verbs that we consid- ered in our analysis. In addition to it the distinction Within this numbers, that are quite homoge- between concepts that can not be correctly trans- neous between the two languages, we can see that lated with a single lemma (root Types) and con- in the overlap range from 0 to 0.2 there are much cepts that can be translated with a sensible seman- more Italian Types than English ones (19% of leaf tic loss (high distance Types) is a relevant informa- + middle Types against 9%); conversely English tion that can lead to a different translation strategy. Types are more distributed in the range from 0.3 Finally we feel important to note that behind to 0.4 (Figure 4). This means that in this area of these numeric values there are lists of verbs and extreme distance between the source and the tar- concepts and this information could be integrated get concept, we have an higher semantic loss in in Machine Translation and Computer Assisted the translation from Italian to English. Translation Systems to improve their accuracy. Finally we can have have an overall value of translation critical Types, by summing up the ones Acknowledgments belonging to high distance Types class and the root Types. The verbs these Types belong to are the This research has been supported by the MOD- verbs for which the selection of a good translation ELACT Project, funded by the Futuro in Ricerca candidate is problematic. Results are reported in 2012 programme (Project Code RBFR12C608); http://modelact.lablita.it. Tables 5 and 6 and confirm that lexical gaps in ac- tion verbs have a strong impact on translation. References IT EN Total Types 1,000 1,027 Roberto Bartolini, Valeria Quochi, Irene De Felice, Irene Russo, and Monica Monachini. 2014. From Root Types 47 (4.7%) 43 (4.2%) synsets to videos: Enriching italwordnet multi- High dist. T. 150 (15.0%) 145 (14.12%) modally. In Nicoletta Calzolari (Conference Chair), Critical Types 197 (19.7%) 188 (18.3%) Khalid Choukri, Thierry Declerck, Hrafn Lofts- son, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Table 5: Number of translation critical Types. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, may. European Language Re- sources Association (ELRA). IT EN Total Verbs 501 535 Luisa Bentivogli and Emanuele Pianta. 2000. Looking Verbs w/ r.T. 39 (7.78%) 38 (7.1%) for lexical gaps. In Proceedings of the ninth EU- Verbs w/ h.d.T. 109 (21.76%) 125 (23.36%) RALEX International Congress, Stuttgart, Germany. Critical Verbs 136 (27.15%) 154 (28.79%) Jurgita Cvilikaite.˙ 2006. Lexical gaps: resolution by functionally complete units of translation. Darbai ir Table 6: Number of verbs with root Types and high dienos, 2006, nr. 45, p. 127-142. distance Types. 
Christiane Fellbaum, Martha Palmer, Hoa Trang Dang, Lauren Delfs, and Susanne Wolf. 2001. Manual and

108 automatic semantic annotation with wordnet. Word- Emanuele Pianta, Luisa Bentivogli, and Christian Gi- Net and Other Lexical Resources, pages 3–10. rardi. 2002. MultiWordNet: developing an aligned multilingual database. In Proceedings of the First Gloria Gagliardi. 2014. Validazione dellontologia del- International Conference on Global WordNet, pages lazione IMAGACT per lo studio e la diagnosi del 21–25. Mild Cognitive Impairment. Ph.D. thesis, Univer- sity of Florence. Sameer S. Pradhan, Edward Loper, Dmitriy Dligach, and Martha Palmer. 2007. Semeval-2007 task 17: Rufus Hjalmar Gouws. 2002. Equivalent relations, English lexical sample, srl and all words. In Pro- context and cotext in bilingual dictionaries. Hermes, ceedings of the 4th International Workshop on Se- 28(1):195–209. mantic Evaluations, pages 87–92. Association for Computational Linguistics. Vladimir Ivir. 1977. Lexical gaps: A contrastive view. Studia Romanica et Anglica Zagrabiensia, Diana Santos. 1990. Lexical gaps and idioms in (43):167–176. machine translation. In Proceedings of the 13th Conference on Computational Linguistics - Volume M. Jansseen. 2004. Multilingual lexical databases, 2, COLING ’90, pages 330–335, Stroudsburg, PA, lexical gaps, and simullda. International Journal of USA. Association for Computational Linguistics. Lexicography, 17(2):137–154.

Gran Kjellmer. 2003. Lexical gaps. In Extending the scope of corpus-based research, pages 149–158. Brill.

A. Lehrer. 1974. Semantic Fields and Lexical Struc- ture. North-Holland linguistic series, 11. North- Holland.

John Lyons. 1977. Semantics, volume i. Cambridge UP, Cambridge.

Massimo Moneglia, Francesca Frontini, Gloria Gagliardi, Irene Russo, Alessandro Panunzi, and Monica Monachini. 2012. Imagact: deriving an action ontology from spoken corpora. Proceed- ings of the Eighth Joint ACL-ISO Workshop on Interoperable Semantic Annotation (isa-8), pages 42–47.

Massimo Moneglia. 2014a. Natural Language On- tology of Action: A Gap with Huge Consequences for Natural Language Understanding and Machine Translation. In Zygmunt Vetulani and Joseph Mar- iani, editors, Human Language Technology Chal- lenges for Computer Science and Linguistics, vol- ume 8387 of Lecture Notes in Computer Science, pages 379–395. Springer International Publishing.

Massimo Moneglia. 2014b. The semantic variation of action verbs in multilingual spontaneous speech cor- pora. In T. Raso and H. Mello, editors, Spoken Cor- pora and Linguistics Studies, pages 152–190. John Benjamins Publishing Company.

Martha Palmer, Hoa Trang Dang, and Christiane Fellbaum. 2007. Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Natural Language Engineering, 13(02):137–163.

Yi Pan. 2016. Verbi d'azione in Italiano e in Cinese Mandarino. Implementazione e validazione del cinese nell'ontologia interlinguistica dell'azione IMAGACT. Ph.D. thesis, Università degli Studi di Firenze.

Word Sense Filtering Improves Embedding-Based Lexical Substitution

Anne Cocos∗, Marianna Apidianaki∗† and Chris Callison-Burch∗
∗Computer and Information Science Department, University of Pennsylvania
†LIMSI, CNRS, Université Paris-Saclay, 91403 Orsay
{acocos,marapi,ccb}@seas.upenn.edu

Abstract dings (Reisinger and Mooney, 2010; Huang et al., 2012). Iacobacci et al. (2015) disambiguate the The role of word sense disambiguation in words in a corpus using a state-of-the-art WSD lexical substitution has been questioned system and then produce continuous representa- due to the high performance of vector tions of word senses based on distributional infor- space models which propose good sub- mation obtained from the annotated corpus. Mov- stitutes without explicitly accounting for ing from word to sense embeddings generally im- sense. We show that a filtering mecha- proves their performance in word and relational nism based on a sense inventory optimized similarity tasks but is not beneficial in all settings. for substitutability can improve the results Li and Jurafsky (2015) show that although multi- of these models. Our sense inventory sense embeddings give improved performance in is constructed using a clustering method tasks such as semantic similarity, semantic rela- which generates paraphrase clusters that tion identification and part-of-speech tagging, they are congruent with lexical substitution an- fail to help in others, like sentiment analysis and notations in a development set. The re- named entity extraction (Li and Jurafsky, 2015). sults show that lexical substitution can still benefit from senses which can improve the output of vector space paraphrase ranking We show how a sense inventory optimized for models. substitutability can improve the rankings provided by two sense-agnostic, vector-based lexical sub- 1 Introduction stitution models. Lexical substitution requires Word sense has always been difficult to de- systems to predict substitutes for target word in- fine and pin down (Kilgarriff, 1997; Erk et al., stances that preserve their meaning in context 2013). Recent successes of embedding-based, (McCarthy and Navigli, 2007). We consider a sense-agnostic models in various semantic tasks sense inventory with high substitutability to be one cast further doubt on the usefulness of word sense. which groups synonyms or paraphrases that are Why bother to identify senses if even humans can- mutually-interchangeable in the same contexts. In not agree upon their nature and number, and if contrast, sense inventories with low substitutabil- simple word-embedding models yield good results ity might group words linked by different types of without using any explicit sense representation? relations. We carry out experiments with a syntac- Word-based models are successful in various tic vector-space model (Thater et al., 2011; Apid- semantic tasks even though they conflate multiple ianaki, 2016) and a word-embedding model for word meanings into a single representation. Based lexical substitution (Melamud et al., 2015). In- on the hypothesis that capturing polysemy could stead of using the senses to refine the vector rep- further improve their performance, several works resentations as in (Faruqui et al., 2015), we use have focused on creating sense-specific word em- them to improve the lexical substitution rankings beddings. A common approach is to cluster the proposed by the models as a post-processing step. contexts in which the words appear in a corpus Our results show that senses can improve the per- to induce senses, and relabel each word token formance of vector-space models in lexical substi- with the clustered sense before learning embed- tution tasks.

110 Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 110–119, Valencia, Spain, April 4 2017. c 2017 Association for Computational Linguistics

2 A sense inventory for substitution

2.1 Paraphrase substitutability
The candidate substitutes used by our ranking models come from the Paraphrase Database (PPDB) XXL package (Ganitkevitch et al., 2013).1 Paraphrase relations in the PPDB are defined between words and phrases which might carry different senses. Cocos and Callison-Burch (2016) used a spectral clustering algorithm to cluster PPDB XXL into senses, but the clusters contain noisy paraphrases and paraphrases linked by different types of relations (e.g. hypernyms, antonyms) which are not always substitutable. We use a slightly modified version of their method to cluster paraphrases where both the number of clusters (senses) and their contents are optimized for substitutability.

Sentence: In this world, one's word is a promise.
Annotated substitutes (count): vow (1), utterance (1), tongue (1), speech (1)

Sentence: Silverplate: code word for the historic mission that would end World War II.
Annotated substitutes (count): phrase (3), term (2), verbiage (1), utterance (1), signal (1), name (1), dictate (1), designation (1), decree (1)

Sentence: I think she only heard the last words of my speech.
Annotated substitutes (count): bit (3), verbiage (2), part (2), vocabulary (1), terminology (1), syllable (1), phrasing (1), phrase (1), patter (1), expression (1), babble (1), anecdote (1)

Table 1: Example annotated sentences for the target word word.N from the CoInCo (Kremer et al., 2014) lexical substitution dataset. Numbers after each word indicate the number of annotators who made that suggestion.

2.2 A measure of substitutability
We define a substitutability metric that quantifies the extent to which a sense inventory aligns with human-generated lexical substitution annotations. We then cluster PPDB paraphrases using the substitutability metric to optimize the sense clusters for substitutability.
Given a sense inventory C, we can define the senses of a target word t as a set of sense clusters, C(t) = {c_1, c_2, ..., c_k}, where each cluster contains words corresponding to a single sense of t. Intuitively, if a sense inventory corresponds with substitutability, then each sense cluster c_i should have two qualities: first, words within c_i should be interchangeable with t in the same set of contexts; and second, c_i should not be missing any words that are interchangeable in those same contexts. We therefore operationalize the definition of substitutability as follows.
We begin measuring substitutability with a lexical substitution dataset, consisting of sentences where content words have been manually annotated with substitutes (see example in Table 1). We then use normalized mutual information (NMI) (Strehl and Ghosh, 2002) to quantify the level of agreement between the automatically generated sense clusters and human-suggested substitutes. NMI is an information theoretic measure of cluster quality. Given two clusterings U and V over a set of items, it measures how much each clustering reduces uncertainty about the other (Vinh et al., 2009) in terms of their mutual information I(U,V) and entropies H(U), H(V):

NMI(U,V) = \frac{I(U,V)}{\sqrt{H(U)\,H(V)}}

To calculate the NMI between a sense inventory for target word t and its set of annotated substitutes, we first define the substitutes as a clustering, B_t = {b_1, b_2, ..., b_n}, where b_i denotes the set of suggested substitutes for each of n sentences. Table 1, for example, gives the clustered substitutes for n = 3 sentences for target word t = word.N, where b_1 = {vow, utterance, tongue, speech}. We then define the substitutability of the sense inventory, C_t, with respect to the annotated substitutes, B_t, as NMI(C_t, B_t).2 Given many target words, we can further aggregate the substitutability of sense inventory C over the set of targets T in B into a single substitutability score:

substitutability_B(C) = \sum_{t \in T} \frac{NMI(C_t, B_t)}{|T|}

2.3 Optimizing for Substitutability
Having defined a substitutability score, we now automatically generate word sense clusters from the Paraphrase Database that maximize it. The idea is to use the substitutability score to choose the best number of senses for each target word, which will be the number of output clusters (k) generated by our spectral clustering algorithm.

1 PPDB paraphrases come into packages of different sizes (going from S to XXXL): small packages contain high-precision paraphrases while larger ones have high coverage. All are available from paraphrase.org
2 In calculating NMI, we ignore words that do not appear in both C_t and B_t.
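A minimal sketch of the substitutability computation follows. It is not the authors' code: the scikit-learn call and the single-cluster-id simplification (each word is mapped to exactly one cluster, whereas annotated substitutes can in principle recur across sentences) are assumptions, and average_method="geometric" is used to match the sqrt(H(U)H(V)) normalisation above.

```python
from sklearn.metrics import normalized_mutual_info_score

def substitutability(sense_clusters, substitute_clusters):
    """NMI-based agreement between a sense inventory C_t and annotated
    substitutes B_t for one target word.  Both arguments map each word
    to a cluster id; words missing from either side are ignored
    (cf. footnote 2)."""
    common = sorted(set(sense_clusters) & set(substitute_clusters))
    u = [sense_clusters[w] for w in common]
    v = [substitute_clusters[w] for w in common]
    return normalized_mutual_info_score(u, v, average_method="geometric")

def corpus_substitutability(per_target_C, per_target_B):
    """Average NMI(C_t, B_t) over all targets T in the annotated data."""
    targets = per_target_C.keys() & per_target_B.keys()
    return sum(substitutability(per_target_C[t], per_target_B[t])
               for t in targets) / len(targets)
```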

111 2.4 Spectral clustering method in the affinity matrix through a technique we call 2.4.1 Constructing the Affinity Matrix ‘masking.’ By masking, we mean allowing pos- itive values in the affinity matrix only where the The spectral clustering algorithm (Yu and Shi, n n row and column correspond to paraphrases that 2003) takes as input an affinity matrix A R × ∈ appear as pairs in PPDB (Figure 1a). All entries encoding n items to be clustered, and an integer corresponding to paraphrase pairs that are not con- k. It generates k non-overlapping clusters of the n nected in the PPDB graph (Figure 1a) are forced to items. Each entry a in the affinity matrix denotes i j 0. a similarity measurement between items i and j. More concretely, in the masked affinity matrix, Entries in A must be nonnegative and symmetric. we set each entry a for which i and j are not para- The affinity matrix can also be thought of as de- i j phrases in PPDB to zero. The masked affinity ma- scribing a graph, where the n rows and columns MASK trix encodes the graph G = PP(t),EPP with correspond to nodes, and each entry ai j gives the { } edges connecting only pairs of words that are in weight of an edge between nodes i and j. Because PPDB, EMASK = (p , p ) p P(p ) . Figure 1c the matrix must be symmetric, the graph is undi- PP { i j | i ∈ j } shows the masked affinity matrix corresponding to rected. the PPDB structure in Figure 1a. Given a target word t, we call its set of PPDB paraphrases PP(t). Note that PP(t) excludes t it- 2.4.3 Optimizing k self. In our most basic clustering method, we clus- Because spectral clustering requires the number ter paraphrases for target t as follows. Given the of output clusters, k, to be specified as input, for length-n set of t’s paraphrases, PP(t), we construct each target word we run the clustering algorithm the n n affinity matrix A where each shared-index × for a range of k between 1 and the minimum of (n, row and column corresponds to some word p ∈ 20). We then choose the k that maximizes the NMI PP(t). We set entries equal to the cosine similarity of the resulting clusters with the human-annotated between the applicable words’ embeddings, plus substitutes for that target in the development data. one: ai j = cos(vi,v j) + 1 (to enforce non-negative similarities). For our implementation we use 300- 2.5 Method variations dimensional part-of-speech-specific word embed- In addition to using the substitutability score to dings vi generated using the gensim word2vec choose the best number of senses for each target package (Mikolov et al., 2013b; Mikolov et al., word, we also experiment with two variations on ˇ 3 2013a; Rehu˚rekˇ and Sojka, 2010). In Figure 1a the basic spectral clustering method to increase the we show a set of paraphrases, linked by PPDB score further: filtering by a paraphrase confidence relations, and in Figure 1b we show the corre- score and co-clustering with WordNet (Fellbaum, sponding basic affinity matrix, encoding the para- 1998). phrases’ distributional similarity. In order to aid further discussion, we point out 2.5.1 PPDB Score Thresholding that the affinity matrix used for the basic cluster- Each paraphrase pair in the PPDB is associated ing method encodes a fully-connected graph G = with a set of scores indicating the strength of ALL PP(t),EPP with paraphrases PP(t) as nodes, the paraphrase relationship. The recently added { } ALL and edges between every pair of words, EPP = PPDB2.0 Score (Pavlick et al., 2015) was calcu- PP(t) PP(t). 
As for all variations on the cluster- lated using a supervised scoring model trained on × ing method, the matrix entries correspond to dis- human judgments of paraphrase quality.4 Apidi- tributional similarity. anaki (2016) showed that the PPDB2.0 Score itself is a good metric for ranking substitution candi- 2.4.2 Masking dates in context, outperforming some vector space The affinity matrix in Figure 1b ignores the graph models when the number of candidates is high. structure inherent in PPDB, where edges connect With this in mind, we experimented with using a only words that are paraphrases of one another. PPDB2.0 Score threshold to discard noisy PPDB We experiment with enforcing the PPDB structure 4The human judgments were used to fit a regression to the 3The word2vec parameters we use are a context win- features available in PPDB 1.0 plus numerous new features dow of size 3, learning rate alpha from 0.025 to 0.0001, min- including cosine word embedding similarity, lexical overlap 4 imum word count 100, sampling parameter 1e− , 10 negative features, WordNet features and distributional similarity fea- samples per target word, and 5 training epochs. tures.
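The affinity construction, masking and clustering steps can be sketched as below. This is an illustrative reconstruction, not the released implementation: scikit-learn's SpectralClustering is used in place of the Yu and Shi (2003) algorithm, the PPDB lookup is abstracted into a set of word pairs, and the k-selection loop of Section 2.4.3 is only indicated in the closing comment.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_paraphrases(words, vectors, ppdb_pairs, k, masked=True):
    """Build the (optionally masked) affinity matrix a_ij = cos(v_i, v_j) + 1
    over PP(t) and cut it into k sense clusters."""
    n = len(words)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            vi, vj = vectors[words[i]], vectors[words[j]]
            a = np.dot(vi, vj) / (np.linalg.norm(vi) * np.linalg.norm(vj)) + 1.0
            if masked and i != j and frozenset((words[i], words[j])) not in ppdb_pairs:
                a = 0.0  # zero out pairs that are not linked in PPDB
            A[i, j] = a
    labels = SpectralClustering(n_clusters=k, affinity="precomputed").fit_predict(A)
    return {w: int(label) for w, label in zip(words, labels)}

# Section 2.4.3: the best k would then be chosen by re-clustering for
# k = 1 .. min(n, 20) and keeping the k that maximises NMI against the
# development substitutes (e.g. via substitutability() from the earlier sketch).
```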

(a) PPDB graph with n = 7 paraphrases PP(t) to be clustered. (b) Unmasked affinity matrix for input to the basic clustering algorithm, for the paraphrases in Fig. 1a; this matrix encodes a fully-connected graph G = {PP(t), E_PP^ALL}. (c) Masked affinity matrix for input to the clustering algorithm, enforcing the paraphrase links from the graph in Fig. 1a, G = {PP(t), E_PP^MASK}.

Figure 1: Unclustered PPDB graph and its corresponding affinity matrices, encoding distributional similarity, for input to the basic (1b) and masked (1c) clustering algorithms. Masking zeros-out the values of all cells corresponding to paraphrases not connected in the PPDB graph. Cell shading corresponds to the distributional similarity score between words, with darker colors representing higher measurements.

Our objective was to begin the clustering process with a clean set of paraphrases for each target word, eliminating erroneous paraphrases that might pollute the substitutable sense clusters. We implemented PPDB score thresholds in a range from 0 to 2.5.

2.5.2 Co-Clustering with WordNet
PPDB is large and inherently noisy. WordNet has smaller coverage but well-defined semantic structure in the form of synsets and relations. We sought a way to marry the high coverage of PPDB with the clean structure of WordNet by co-clustering the two resources, in hopes of creating a sense inventory that is both highly-substitutable and high-coverage.

The basic unit in WordNet is the synset, a set of lemmas sharing the same meaning. WordNet also connects synsets via relations, such as hypernymy, hyponymy, entailment, and 'similar-to'. We denote as L(s) the set of lemmas associated with synset s. We denote as R(s) the set of synsets that are related to synset s with a hypernym, hyponym, entailment, or similar-to relationship. Finally, we denote as S(t) the set of synsets to which a word t belongs. We denote as S+(t) the set of t's synsets, plus all synsets to which they are related; S+(t) = S(t) ∪ ⋃_{s'∈S(t)} S(s'). In other words, S+(t) includes all synsets to which t is connected by a path of length at most 2 via one of the relations encoded in R(s).

For co-clustering, we generate the affinity matrix for a graph with m+n nodes corresponding to the n words in PP(t) and the m synsets in S+(t), and edges between every pair of nodes. Because the edge weights are cosine similarity between vector embeddings, we need a way to construct an embedding for each synset in S+(t).[5] We therefore generate compositional embeddings for each synset s that are equal to the weighted average of the embeddings for the lemmas l ∈ L(s), where the weights are the PPDB2.0 scores between t and l:

    v_s = ( Σ_{l∈L(s)} PPDB2.0Score(t,l) · v_l ) / ( Σ_{l∈L(s)} PPDB2.0Score(t,l) )

The unmasked affinity matrix used for input to the co-clustering method, then, encodes the graph G = {PP(t) ∪ S+(t), E_PP^ALL ∪ E_PS^ALL ∪ E_SS^ALL}, where E_PS^ALL contains edges between every paraphrase and synset, and E_SS^ALL contains edges between every pair of synsets.

We also define masked versions of the co-clustering affinity matrix. In a masked affinity matrix, positive entries are only allowed for entries where the row and column correspond to entities (paraphrases or synsets) that are connected by the underlying knowledge base (PPDB or WordNet).

[5] We don't use the NASARI embeddings (Camacho-Collados et al., 2015) because these are available only for nouns.
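A minimal sketch of the compositional synset embedding defined above; word_vectors (the trained word2vec vectors) and ppdb2_score (a PPDB2.0 Score lookup) are hypothetical stand-ins rather than the authors' code.

```python
import numpy as np


def synset_embedding(lemmas, target, word_vectors, ppdb2_score, dim=300):
    """v_s: average of the lemma embeddings v_l for l in L(s), weighted by
    the PPDB2.0 score between the target word t and each lemma l."""
    numerator, denominator = np.zeros(dim), 0.0
    for lemma in lemmas:
        if lemma in word_vectors:
            weight = ppdb2_score(target, lemma)
            numerator += weight * np.asarray(word_vectors[lemma])
            denominator += weight
    return numerator / denominator if denominator > 0 else numerator
```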

[Figure 2 (not reproduced): Paraphrase/synset graph for input to the co-clustering model, and its corresponding affinity matrices for the basic (2b) and masked (2c) clustering algorithms. (a) Graph showing n = 7 PPDB paraphrases PP(t) and m = 4 WordNet synsets S+(t) to be clustered. (b) Unmasked affinity matrix for input to the basic clustering algorithm, for the paraphrases and synsets in Fig 2a; this matrix encodes a fully-connected graph G = {PP(t) ∪ S+(t), E_PP^ALL ∪ E_PS^ALL ∪ E_SS^ALL}. (c) Affinity matrix for input to the clustering algorithm, enforcing the paraphrase-paraphrase (E_PP^MASK) and paraphrase-synset (E_PS^MASK) links from the graph in 2a, but allowing all synset-synset links (E_SS^ALL). Cell shading corresponds to the distributional similarity score between words/synsets.]

Just as we defined masking for paraphrase-paraphrase links (E_PP) to allow only positive values corresponding to paraphrase pairs found in PPDB, here we separately define masking for paraphrase-synset (E_PS) and synset-synset (E_SS) links based on WordNet synsets and relations. When applying the clustering algorithm, it is possible to elect to use the masked version for any or all of E_PP, E_PS, and E_SS. In our experiments we try all combinations.

For the synset-synset links, we define the masked version E_SS^MASK as including only nonzero edge weights where a hypernym, hyponym, entailment or similar-to relationship connects two synsets: E_SS^MASK = {(s_u, s_v) | s_u ∈ R(s_v) or s_v ∈ R(s_u)}. For the paraphrase-synset links, we define the masked version E_PS^MASK to include only nonzero edge weights where the paraphrase is a lemma in the synset, or is a paraphrase of a lemma in the synset (excluding the target word): E_PS^MASK = {(p_i, s_u) | p_i ∈ L(s_u) or |(P(p_i) − t) ∩ L(s_u)| > 0}. We need to exclude the target word when calculating the overlap because otherwise all words in PP(t) would connect to all synsets in S(t). Figure 2 depicts the graph, unmasked and masked affinity matrices for the co-clustering method; a short sketch of these two masking conditions is given after the datasets subsection below.

2.6 Clustering Experiments

2.6.1 Datasets
We run clustering experiments using targets and human-generated substitution data drawn from two lexical substitution datasets. The first is the "Concepts in Context" (CoInCo) corpus (Kremer et al., 2014), containing over 15K sentences corresponding to nearly 4K unique target words. We divide the CoInCo dataset into development and test sets by first finding all target words that have at least 10 sentences. For each of the 327 resulting targets, we randomly divide the corresponding sentences into 60% development instances and 40% test instances. The resulting development and test sets have 4061 and 2091 sentences respectively. We cluster the 327 target words in the resulting subset of CoInCo, performing all optimization using the development portion.

In order to evaluate how well our method generalizes to other data, we also create clusters for target words drawn from the SemEval 2007 English Lexical Substitution shared task dataset (McCarthy and Navigli, 2007). The entire test portion of the SemEval dataset contains 1700 annotated sentences for 170 target words. We filter this data to keep only sentences with one or more human-annotated substitutes that overlap our PPDB XXL paraphrase vocabulary. The resulting test set, which we use for evaluating SemEval targets, has 1178 sentences and 157 target words.
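As referenced above, the two masking conditions can be read directly off the set definitions for E_PS^MASK and E_SS^MASK. The sketch below is only an illustration: synset_lemmas (L(s)), paraphrases_of (P(p)) and related (R(s)) are hypothetical lookups, not the authors' data structures.

```python
def keep_ps_edge(p, synset_lemmas, target, paraphrases_of):
    """E_PS^MASK: keep edge (p, s) if p is a lemma of s, or if p paraphrases
    some lemma of s other than the target word."""
    if p in synset_lemmas:
        return True
    overlap = (set(paraphrases_of(p)) - {target}) & set(synset_lemmas)
    return len(overlap) > 0


def keep_ss_edge(s_u, s_v, related):
    """E_SS^MASK: keep edge (s_u, s_v) if the synsets are linked by a
    hypernym, hyponym, entailment or similar-to relation."""
    return s_u in related(s_v) or s_v in related(s_u)
```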

We cluster each of the 157 targets, using the CoInCo development data to optimize substitutability for the 32 SemEval targets that also appear in CoInCo. For the rest of the SemEval targets we choose a number of senses equal to its WordNet synset count.

2.6.2 Clustering Method Variations
We try all combinations of the following parameters in our clustering model:

• Clustering method: We try the basic clustering – clustering Paraphrases Only – and the WordNet Co-Clustering method.

• PPDB2.0 Score Threshold: We cluster paraphrases of each target having a PPDB2.0 Score above a threshold, ranging from 0-3.0.

• Masking: When clustering paraphrases only, we can either use the PP-PP mask or allow positive similarities between all words. When co-clustering, we try all combinations of the PP-PP, PP-SYN, and SYN-SYN masks.

For each combination, we evaluate the NMI substitutability of the resulting sense inventory over our CoInCo and SemEval test instances. The substitutability results are given in Tables 2 and 3.

3 Filtering Substitutes

3.1 A WSD oracle
We now question whether it is possible to improve the rankings of current state-of-the-art lexical substitution systems by using the optimized sense inventory as a filter. Our general approach is to take a set of ranked substitutes generated by a vector-based model. Then, we see whether filtering the ranked substitutes to bring words belonging to the correct sense of the target to the top of the rankings would improve the overall ranking results.

Assuming that we have a WSD oracle that is able to choose the most appropriate sense for a target word in context, this corresponds to nominating substitutes from the applicable sense cluster and elevating them in the list of ranked substitutes output by the state-of-the-art lexical substitution system. If sense filtering successfully improves the quality of ranked substitutes, we can say that the sense inventory captures substitutability well.

3.2 Ranking Models
Our approach requires a set of rankings produced by a high-quality lexical substitution model to start. We generate substitution rankings for each target/sentence pair in the test sets using a syntactic vector-space model (Thater et al., 2011; Apidianaki, 2016) and a state-of-the-art model based on word embeddings (Melamud et al., 2015).

The syntactic vector space model of Apidianaki (2016) (Syn.VSM) demonstrated an ability to correctly choose appropriate PPDB paraphrases for a target word in context. The vector features correspond to syntactic dependency triples extracted from the English Gigaword corpus[6] analyzed with Stanford dependencies (Marneffe et al., 2006). Syn.VSM produces a score for each (target, sentence, substitute) tuple based on the cosine similarity of the substitute's basic vector representation with the target's contextualized vector (Thater et al., 2011). The contextualized vector is derived from the basic meaning vector of the target word by reinforcing its dimensions that are licensed by the context of the specific instance under consideration. More specifically, the contextualized vector of a target is obtained through vector addition and contains information about the target's direct syntactic dependents.

The second set of rankings comes from the AddCos model of Melamud et al. (2015). AddCos quantifies the fit of substitute word s for target word t in context C by measuring the semantic similarity of the substitute to the target, and the similarity of the substitute to the context:

    AddCos(s,t,W) = ( |W| · cos(s,t) + Σ_{w∈W} cos(s,w) ) / ( 2 · |W| )    (1)

The vectors s and t are word embeddings of the substitute and target generated by the skip-gram with negative sampling model (Mikolov et al., 2013b; Mikolov et al., 2013a). The context W is the set of words appearing within a fixed-width window of the target t in a sentence (we use a window (cwin) of 1), and the embeddings c are context embeddings generated by skip-gram. In our implementation, we train 300-dimensional word and context embeddings over the 4B words in the Annotated Gigaword (AGiga) corpus (Napoles et al., 2012) using the gensim word2vec package (Mikolov et al., 2013b; Mikolov et al., 2013a; Řehůřek and Sojka, 2010).[7]

[6] http://catalog.ldc.upenn.edu/LDC2003T05
[7] The word2vec training parameters we use are a context window of size 3, learning rate alpha from 0.025 to 0.0001, minimum word count 100, sampling parameter 1e-4, 10 negative samples per target word, and 5 training epochs.
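Equation (1) above translates directly into code. The sketch below is only an illustration of the AddCos scoring rule, not the authors' implementation; word_vec and context_vec are hypothetical lookups for the skip-gram word and context embeddings.

```python
import numpy as np


def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def add_cos(substitute, target, window_words, word_vec, context_vec):
    """AddCos(s, t, W) = (|W| * cos(s, t) + sum_{w in W} cos(s, w)) / (2 * |W|),
    with the context words w represented by skip-gram context embeddings."""
    s, t = word_vec[substitute], word_vec[target]
    ctx = [context_vec[w] for w in window_words if w in context_vec]
    if not ctx:
        return _cos(s, t)  # degenerate case: no usable context words
    total = len(ctx) * _cos(s, t) + sum(_cos(s, c) for c in ctx)
    return total / (2.0 * len(ctx))
```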

3.3 Substitution metrics
Lexical substitution experiments are usually evaluated using generalized average precision (GAP) (Kishida, 2005). GAP compares a set of predicted rankings to a set of gold standard rankings. Scores range from 0 to 1; a perfect ranking, in which all high-scoring substitutes outrank low-scoring substitutes, has a score of 1. For each sentence in the CoInCo and SemEval test sets, we consider the PPDB paraphrases for the target word to be the candidates, and we set the test set annotator frequency to be the gold score. Words in PPDB that were not suggested by annotators receive a gold score of 0.001. Predicted scores are given by the two ranking models, Syn.VSM and AddCos.

3.4 Filtering Method
Sense filtering is intended to boost the rank of substitutes that belong to the most appropriate sense of the target given the context. We run this experiment as a two-step process.

First, given a target and sentence, we obtain the PPDB XXL paraphrases for the target word and rank them using the Syn.VSM and the AddCos models.[8] We calculate the overall unfiltered GAP score on the test set for each ranking model as the average GAP over sentences.

Next, we evaluate the ability of a sense inventory to improve the GAP score through filtering. We implement sense filtering by adding a large number (10000) to the ranking model's score for words belonging to a single sense. We assume an oracle that finds the cluster which maximally improves the GAP score using this sense filtering method. If the sense inventory corresponds well to substitutability, we should expect this filtering to improve the ranking by downgrading proposed substitutes that do not fall within the correct sense cluster.

We calculate the maximum sense-restricted GAP score for the inventories produced by each variation on our clustering model, and compare this to the unfiltered GAP score for each ranking model.

3.5 Baselines
We compare the extent to which our optimized sense inventories improve lexical substitution rankings to the results of two baseline sense inventories.

• WordNet+: a sense inventory formed from WordNet 3.0. For each CoInCo target word that appears in WordNet, we take its sense clusters to be its synsets, plus lemmas belonging to hypernyms and hyponyms of each synset.

• PPDBClus: a much larger, albeit noisier, sense inventory obtained by automatically clustering words in the PPDB XXL package. To obtain this sense inventory we clustered paraphrases for all targets in the CoInCo dataset using the method outlined in Cocos and Callison-Burch (2016), with PPDB2.0 Score serving as the similarity metric.

We assess the substitutability of these baseline sense inventories with respect to the human-annotated substitutes in the CoInCo and SemEval datasets, and also use them for sense filtering.

Finally, we wish to estimate the impact of the NMI-based optimization procedure (Section 2.4.3) on the quality of the senses used for filtering. We compare the performance of the optimized CoInCo sense inventory, where the number of clusters, k, for a target word is defined through NMI optimization (called 'Choose-K: Optimize NMI'), to an inventory induced from CoInCo where k equals the number of synsets available for the target word in WordNet (called 'Choose-K: #WN Synsets').

4 Results
We report substitutability, and the unfiltered and best sense-filtered GAP scores achieved using the paraphrase-only clustering method and the co-clustering method in Tables 2 and 3.

The average unfiltered GAP scores for the Syn.VSM rankings over the CoInCo and SemEval test sets are 0.528 and 0.673 respectively.[9]

[8] For the SemEval test set, we rank PPDB XXL paraphrases having a PPDB2.0 Score with the target of at least 2.54.
[9] The score reported by Apidianaki (2016) for the Syn.VSM model with XXL PPDB paraphrases on CoInCo was 0.56. The difference in scores is due to excluding from our clustering experiments target words that did not have at least 10 sentences in CoInCo.
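The sense-filtering step in Section 3.4 amounts to a score boost followed by an oracle choice among the target's sense clusters. The sketch below is illustrative only: gap stands for an assumed GAP implementation, and model_scores/gold_scores are hypothetical dictionaries mapping candidate substitutes to predicted and annotator-frequency scores.

```python
def filter_by_sense(model_scores, sense_cluster, boost=10000.0):
    """Add a large constant to every candidate that belongs to the chosen
    sense cluster, leaving the other candidates' scores unchanged."""
    return {w: s + boost if w in sense_cluster else s
            for w, s in model_scores.items()}


def oracle_filtered_gap(model_scores, gold_scores, sense_clusters, gap):
    """Oracle: try each sense cluster of the target as a filter and keep the
    GAP of the best one (falling back to the unfiltered ranking)."""
    best = gap(model_scores, gold_scores)
    for cluster in sense_clusters:
        filtered = filter_by_sense(model_scores, cluster)
        best = max(best, gap(filtered, gold_scores))
    return best
```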

Sense inventory                                   substCoInCo   Syn.VSM Oracle GAP   AddCos (cwin=1) Oracle GAP
PPDBClus                                          0.254         0.661                0.656
WordNet                                           0.252         0.655                0.651
Choose-K: # WN Synsets (avg)                      0.205         0.639                0.636
Choose-K: # WN Synsets (max, no co-clustering)    0.250*        0.695*               0.690*
Choose-K: # WN Synsets (max, co-clustering)       0.241**       0.690**              0.683**
Choose-K: Optimize NMI (avg)                      0.282         0.668                0.662
Choose-K: Optimize NMI (max, no co-clustering)    0.331*        0.719*               0.714***
Choose-K: Optimize NMI (max, co-clustering)       0.314**       0.718****            0.710**
Unfiltered GAP (identical for all inventories)    --            0.528                0.533

Table 2: Substitutability (NMI) of resulting sense inventories, and GAP scores of the unfiltered and best sense-filtered rankings produced by the Syn.VSM and AddCos models, for the CoInCo annotated dataset. Configurations for the best-performing sense inventories were: * Min PPDB Score 2.0, cluster PP's only, use PP-PP mask; ** Min PPDB Score 2.0, co-clustering, use PP-PP mask only; *** Min PPDB Score 1.5, cluster PP's only, use PP-PP mask; **** Min PPDB Score 2.0, co-clustering, use PP-PP and Syn-Syn masks only.

Sense inventory                                   substSemEval   Syn.VSM Oracle GAP   AddCos (cwin=1) Oracle GAP
PPDBClus                                          0.357          0.855                0.634
WordNet                                           0.291          0.774                0.595
Average of all clustered sense inventories        0.367          0.841                0.569
Max basic (no co-clustering) sense inventory      0.448*         0.917*               0.626*
Max co-clustered sense inventory                  0.449**        0.906***             0.612****
Unfiltered GAP (identical for all inventories)    --             0.673                0.410

Table 3: Substitutability (NMI) of resulting sense inventories, and GAP scores of the unfiltered and best sense-filtered rankings produced by the Syn.VSM and AddCos models, for the SemEval07 annotated dataset. Configurations for the best-performing sense inventories were: * Min PPDB Score 2.31, cluster PP's only, use PP-PP mask; ** Min PPDB Score 2.54, co-clustering, use PP-SYN mask only; *** Min PPDB Score 2.54, co-clustering, use PP-SYN mask only; **** Min PPDB Score 2.31, co-clustering, use PP-SYN mask only.

All baseline and cluster sense inventories are capable of improving these GAP scores when we use the best sense as a filter. Syntactic models generally give very good results with small paraphrase sets (Kremer et al., 2014) but their performance seems to degrade when they need to deal with larger and noisier substitute sets (Apidianaki, 2016). Our results suggest that finding the most appropriate sense of a target word in context can improve their lexical substitution results.

The trend in results is similar for the AddCos rankings. The average unfiltered GAP scores for the AddCos rankings over the CoInCo and SemEval test sets are 0.533 and 0.410 respectively. The GAP scores of the unfiltered AddCos rankings are much lower than after filtering with any baseline or cluster sense inventory, showing that lexical substitution rankings based on word embeddings can also be improved using senses.

To assess the impact of the NMI-based optimization procedure on the results, we compare the performance of two sense inventories on the CoInCo rankings: one where the number of clusters (k) for a target word is defined through NMI optimization and another one, where k is equal to the number of synsets available for the target word in WordNet. We find that for both the Syn.VSM and AddCos ranking models, filtering using the sense inventory with the NMI-optimized k outperforms the results obtained when the inventory with k equal to the number of synsets is used.

Furthermore, we find that the NMI substitutability score is a generally good indicator of how much improvement we see in GAP score due to oracle sense filtering. We calculated the Pearson correlation of a sense inventory's NMI with its oracle GAP score to be 0.644 (calculated over all target words in the CoInCo test set, including GAP results for both the Syn.VSM and AddCos ranking models). This suggests that NMI is a reasonable measure of substitutability.

We find that for all methods, applying a PPDB-Score Threshold prior to clustering is an effective way of removing noisy, non-substitutable paraphrases from the sense inventory. When we use the resulting sense inventory for filtering, this effectively elevates only high-quality paraphrases in the lexical substitution rankings. This supports the finding of Apidianaki (2016), who showed that the PPDB2.0 Score itself is an effective lexical substitution ranking metric when large and noisy paraphrase substitute sets are involved.
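The reported correlation of 0.644 corresponds to a computation along the following lines (the names are illustrative, not from the authors' scripts):

```python
from scipy.stats import pearsonr


def nmi_gap_correlation(per_target_results):
    """per_target_results: iterable of (nmi, oracle_gap) pairs, one per
    (target word, ranking model) combination on the CoInCo test set."""
    nmi_values, gap_values = zip(*per_target_results)
    r, _ = pearsonr(nmi_values, gap_values)
    return r
```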

Finally, we discover that co-clustering with WordNet did not produce any significant improvement in NMI or GAP score over clustering paraphrases alone. This could suggest that the added structure from WordNet did not improve overall substitutability of the resulting sense inventories, or that our co-clustering method did not effectively incorporate useful structural information from WordNet.

5 Conclusion
We have shown that despite the high performance of word-based vector-space models, lexical substitution can still benefit from word senses. We have defined a substitutability metric and proposed a method for automatically creating sense inventories optimized for substitutability. The number of sense clusters in an optimized inventory and their contents are aligned with lexical substitution annotations in a development set. Using the best fitting cluster in each context as a filter over the rankings produced by vector-space models boosts good substitutes and improves the models' scores in a lexical substitution task.

For choosing the cluster that best fits a context, we used an oracle experiment which finds the maximum GAP score achievable by a sense by boosting the ranking model's score for words belonging to a single sense. The cluster that achieved the highest GAP score in each case was selected. The task of finding the most appropriate sense in context still remains. But the improvement in lexical substitution results shows that word sense induction and disambiguation can still benefit state-of-the-art word-based models for lexical substitution.

Our sense filtering mechanism can be applied to the output of any vector-space substitution model at a post-processing step. In future work, we intend to experiment with models that account for senses during embedding learning. The models of Huang et al. (2012) and Li and Jurafsky (2015) learn multi-prototype, or sense-specific, embedding representations and are able to choose the best-fitted ones for words in context. These models have up to now been tested in several NLP tasks but have not yet been applied to lexical substitution. We will experiment with using the embeddings chosen by these models for specific word instances for ranking candidate substitutes in context. The comparison with the results presented in this paper will show whether it is preferable to account for senses before or after actual lexical substitution.

Acknowledgments
This material is based in part on research sponsored by DARPA under grant number FA8750-13-2-0017 (the DEFT program). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA and the U.S. Government.

This work has also been supported by the French National Research Agency under project ANR-16-CE33-0013.

We would like to thank our anonymous reviewers for their thoughtful and helpful comments.

References

Marianna Apidianaki. 2016. Vector-space models for PPDB paraphrase ranking in context. In Proceedings of EMNLP, pages 2028–2034, Austin, Texas.

José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. NASARI: a Novel Approach to a Semantically-Aware Representation of Items. In Proceedings of NAACL/HLT 2015, pages 567–577, Denver, Colorado.

Anne Cocos and Chris Callison-Burch. 2016. Clustering paraphrases by word sense. In Proceedings of NAACL/HLT 2016, pages 1463–1472, San Diego, California.

Katrin Erk, Diana McCarthy, and Nicholas Gaylord. 2013. Measuring word meaning in context. Computational Linguistics, 39(3):511–554, 9.

Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2015. Retrofitting Word Vectors to Semantic Lexicons. In Proceedings of NAACL/HLT, Denver, Colorado.

Christiane Fellbaum, editor. 1998. WordNet: an electronic lexical database. MIT Press.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of NAACL, Atlanta, Georgia.

Eric Huang, Richard Socher, Christopher Manning, and Andrew Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In Proceedings of the ACL (Volume 1: Long Papers), pages 873–882, Jeju Island, Korea.

Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning Sense Embeddings for Word and Relational Similarity. In Proceedings of the ACL/IJCNLP, pages 95–105, Beijing, China.

Adam Kilgarriff. 1997. I don't believe in word senses. Computers and the Humanities, 31(2):91–113.

Kazuaki Kishida. 2005. Property of average precision and its generalization: An examination of evaluation indicator for information retrieval experiments. National Institute of Informatics, Tokyo, Japan.

Gerhard Kremer, Katrin Erk, Sebastian Padó, and Stefan Thater. 2014. What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus. In Proceedings of EACL, pages 540–549, Gothenburg, Sweden.

Jiwei Li and Dan Jurafsky. 2015. Do Multi-Sense Embeddings Improve Natural Language Understanding? In Proceedings of the EMNLP, pages 1722–1732, Lisbon, Portugal.

M. Marneffe, B. Maccartney, and C. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-2006, Genoa, Italy.

Diana McCarthy and Roberto Navigli. 2007. SemEval-2007 Task 10: English Lexical Substitution Task. In Proceedings of SemEval, pages 48–53, Prague, Czech Republic.

Oren Melamud, Omer Levy, and Ido Dagan. 2015. A Simple Word Embedding Model for Lexical Substitution. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pages 1–7, Denver, Colorado.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX), pages 95–100, Montreal, Canada.

Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of ACL/IJCNLP, pages 425–430, Beijing, China.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta.

Joseph Reisinger and Raymond J. Mooney. 2010. Multi-Prototype Vector-Space Models of Word Meaning. In Proceedings of HLT/NAACL, pages 109–117, Los Angeles, California.

Alexander Strehl and Joydeep Ghosh. 2002. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of machine learning research, 3(Dec):583–617.

Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2011. Word Meaning in Context: A Simple and Effective Vector Model. In Proceedings of IJCNLP, pages 1134–1143, Chiang Mai, Thailand.

Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2009. Information theoretic measures for clusterings comparison: is a correction for chance necessary? In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1073–1080. ACM.

Stella X. Yu and Jianbo Shi. 2003. Multiclass spectral clustering. In Proceedings of International Conference on Computer Vision (ICCV 03), pages 313–319. IEEE.

Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambiguous Synonyms

Aleksander Wawer
Institute of Computer Science PAS
Jana Kazimierza 5
01-248 Warsaw, Poland
[email protected]

Agnieszka Mykowiecka
Institute of Computer Science PAS
Jana Kazimierza 5
01-248 Warsaw, Poland
[email protected]

Abstract

This paper compares two approaches to word sense disambiguation using word embeddings trained on unambiguous synonyms. The first one is an unsupervised method based on computing log probability from sequences of word embedding vectors, taking into account ambiguous word senses and guessing correct sense from context. The second method is supervised. We use a multilayer neural network model to learn a context-sensitive transformation that maps an input vector of an ambiguous word into an output vector representing its sense. We evaluate both methods on corpora with manual annotations of word senses from the Polish wordnet.

1 Introduction

Ambiguity is one of the fundamental features of natural language, so every attempt to understand NL utterances has to include a disambiguation step. People usually do not even notice ambiguity because of the clarifying role of the context. A word market is ambiguous, and it is still such in the phrase the fish market, while in a longer phrase like the global fish market it is unequivocal because of the word global, which cannot be used to describe a physical place. Thus, distributional semantics methods seem to be a natural way to solve the word sense discrimination/disambiguation task (WSD). One of the first approaches to WSD was context-group sense discrimination (Schütze, 1998) in which sense representations were computed as groups of similar contexts. Since then, distributional semantic methods were utilized in very many ways in supervised, weakly supervised and unsupervised approaches.

Unsupervised WSD algorithms aim at resolving word ambiguity without the use of annotated corpora. There are two popular categories of knowledge-based algorithms. The first one originates from the Lesk (1986) algorithm, and exploits the number of common words in two sense definitions (glosses) to select the proper meaning in a context. The Lesk algorithm relies on the set of dictionary entries and the information about the context in which the word occurs. In (Basile et al., 2014) the concept of overlap is replaced by similarity represented by a DSM model. The authors compute the overlap between the gloss of the meaning and the context as a similarity measure between their corresponding vector representations in a semantic space. A semantic space is a co-occurrence matrix M built by analysing the distribution of words in a large corpus, later reduced using Latent Semantic Analysis (Landauer and Dumais, 1997). The second group of algorithms comprises graph-based methods which use the structure of semantic nets in which different types of word sense relations are represented and linked (e.g. WordNet, BabelNet). They use various graph-induced information, e.g. the PageRank algorithm (Mihalcea et al., 2004).

In this paper we present a method of word sense disambiguation, i.e. inferring an appropriate word sense from those listed in the Polish wordnet, using word embeddings in both supervised and unsupervised approaches. The main tested idea is to calculate sense embeddings using unambiguous synonyms (elements of the same synsets) for a particular word sense. In section 2 we shortly present existing results for WSD for Polish as well as other works related to word embeddings for other languages, while section 3 presents the annotated data we use for evaluation and supervised model training. The next sections describe the chosen method of calculating word sense embeddings, our unsupervised and supervised WSD experiments and some comments on the results.

2 Existing Work

2.1 Polish WSD
There was very little research done in WSD for Polish. The first of the few more visible attempts comprises a small supervised experiment with WSD in which machine learning techniques and a set of a priori defined features were used (Kobyliński, 2012). Next, in (Kobyliński and Kopeć, 2012), an extended Lesk knowledge-based approach and corpus-based similarity functions were used to improve previous results. These experiments were conducted on corpora annotated with a specially designed set of senses. The first one contained general texts with 106 polysemous words manually annotated with 2.85 sense definitions per word on average. The second, smaller, WikiEcono corpus (http://zil.ipipan.waw.pl/plWikiEcono) was annotated with another set of senses for 52 polysemous words. It contains 3.62 sense definitions per word on average. The most recent work on WSD for Polish (Kędzia et al., 2015) utilizes the graph-based approaches of (Mihalcea et al., 2004) and (Agirre et al., 2014). This method uses both plWordnet and the SUMO ontology and was tested on the KPWr data set (Broda et al., 2012) annotated with plWordnet senses — the same data set which we use in our experiments. The highest precision of 0.58 was achieved for nouns. The results obtained by different WSD approaches are very hard to compare because of the different sets of senses and test data used and big differences in results obtained by the same system on different data. (Tripodi and Pelillo, 2017) reports the results obtained by the best systems for English at the level of 0.51-0.85% depending on the approach (supervised or unsupervised) and the data set. The only system for Polish to which to some extent we can compare our approach is (Kędzia et al., 2015).

2.2 WSD and Word Embeddings
The problem of WSD has been approached from various perspectives in the context of word embeddings.

A popular approach is to generate multiple embeddings per word type, often using unsupervised automatic methods. For example, (Reisinger and Mooney, 2010; Huang et al., 2012) cluster contexts of each word to learn senses for each word, then re-label them with the clustered sense for learning embeddings. (Neelakantan et al., 2014) introduce a flexible number of senses: they extend the sense cluster list when a new sense is encountered by a model.

(Iacobacci et al., 2015) use an existing WSD algorithm to automatically generate large sense-annotated corpora to train sense-level embeddings. (Taghipou and Ng, 2015) prepare POS-specific embeddings by applying a neural network with a trainable embedding layer. They use those embeddings to extend the feature space of a supervised WSD tool named IMS.

In (Bhingardive et al., 2015), the authors propose to exploit word embeddings in an unsupervised method for most frequent sense detection from untagged corpora. Like in our work, the paper explores creation of sense embeddings with the use of WordNet. As the authors put it, sense embeddings are obtained by taking the average of word embeddings of each word in the sense-bag. The sense-bag for each sense of a word is obtained by extracting the context words from the WordNet, such as synset members (S), content words in the gloss (G), content words in the example sentence (E), synset members of the hypernymy-hyponymy synsets (HS), and so on.

3 Word-Sense Annotated Treebank

The main obstacle in elaborating a WSD method for Polish is the lack of semantically annotated resources which can be applied for training and evaluation. In our experiment we used an existing one which uses wordnet senses – the semantic annotation (Hajnicz, 2014) of Składnica (Woliński et al., 2011). The set is a rather small but carefully prepared resource and contains constituency parse trees for Polish sentences. The adapted version of Składnica (0.5) contains 8241 manually validated trees. Sentence tokens are annotated with fine-grained semantic types represented by Polish wordnet synsets from plWordnet 2.0 (plWordnet, Piasecki et al., 2009, http://plwordnet.pwr.wroc.pl/wordnet/). The set contains lexical units of three open parts of speech: adjectives, nouns and verbs. Therefore, only tokens belonging to these POS are annotated (as well as abbreviations and acronyms). Składnica contains about 50K nouns, verbs and adjectives for annotation, and 17410 of them, belonging to 2785 (34%) sentences, have already been annotated.

For 2072 tokens (12%), the lexical unit appropriate in the context has not been found in plWordnet.

4 Obtaining Sense Embeddings

In this section we describe the method of obtaining sense-level word embeddings. Unlike most of the approaches described in Section 2.2, our method is applied to manually sense-labeled corpora.

In Wordnet, words either occur in multiple synsets (are therefore ambiguous and subject of WSD), or in one synset (are unambiguous). Our approach is to focus on synsets that contain both ambiguous and unambiguous words. In Składnica 2.0 (Polish WordNet) we found 28766 synsets matching these criteria and therefore potentially suitable for our experiments.

Let us consider a synset containing the following words: 'blemish', 'deface', 'disfigure'. The word 'blemish' appears also in other synsets (is ambiguous) while the words 'deface' and 'disfigure' are specific for this synset and do not appear in any other synset (are unambiguous).

We assume that embeddings specific to a sense or synset can be approximated by the unambiguous part of the synset. While some researchers such as (Bhingardive et al., 2015) take average embeddings of all synset-specific words, even using glosses and hyperonymy, we use unambiguous words to generate the word2vec embedding vector of a sense.

During training, each occurrence of an unambiguous word in the corpus is substituted for a synset identifier. As in the provided example, each occurrence of 'deface' and 'disfigure' would be replaced by its sense identifier, the same for both unambiguous words. We'll later use these sense vectors to distinguish between senses of ambiguous 'blemish' given their contexts.

We train word2vec vectors using the substitution mechanism described above on a dump of all Polish language Wikipedia and a 300-million subset of the National Corpus of Polish (Przepiórkowski et al., 2012). The embedding size is set to 100; all other word2vec parameters have the default value as in (Řehůřek and Sojka, 2010). The model is based on lemmatized text (base word forms), so only the occurrences of forms with identical lemmas are taken into account.

5 Unsupervised Word Sense Recognition

In this section we are proposing a simple unsupervised approach to WSD. The key idea is to use word embeddings in a probabilistic interpretation and an application comparable to language modeling, however without building any additional models or parameter-rich systems. The method is derived from (Taddy, 2015), where it was used with a bayesian classifier and vector embedding inversion to classify documents.

(Mikolov et al., 2013) describe two alternative methods of generating word embeddings: the skip-gram, which represents conditional probability for a word's context (surrounding words), and CBOW, which targets the conditional probability for each word given its context. None of these corresponds to a likelihood model, but as (Taddy, 2015) notes they can be interpreted as components in a composite likelihood approximation. Let w = [w_1 ... w_T] denote an ordered vector of words. The skip-gram in (Mikolov et al., 2013) yields the pairwise composite log likelihood:

    log p_V(w) = Σ_{j=1}^{T} Σ_{k=1}^{T} 1[1 ≤ |k − j| ≤ b] log p_V(w_k | w_j)    (1)

We use the above formula to compute the probability of a sentence. Unambiguous words are represented as their word2vec representations derived directly from the corpus. In the case of ambiguous words, we substitute them with each possible sense vector (generated from unambiguous parts of synsets, as has been previously described). Therefore, for an ambiguous word to be disambiguated, we generate as many variants of a sentence as there are its senses, and compute each variant's likelihood using formula 1. Ambiguous words which occur in the context are omitted (although we might also replace them with an averaged vector representing all their meanings). Finally, we select the most probable variant.

Because the method involves no model training, we evaluate it directly over the whole data set without dividing it into train and test sets for cross-validation.
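A sketch of the two steps described above, assuming gensim (version 4 parameter names): (i) replace unambiguous synonyms with their synset identifier before training, and (ii) score each sense-substituted variant of a test sentence. gensim's score() requires a hierarchical-softmax skip-gram model, whereas the paper keeps default word2vec settings, so this is only one possible realisation of formula (1); unambiguous_sense, senses_of and is_ambiguous are hypothetical lookups, not the authors' code.

```python
from gensim.models import Word2Vec


def substitute_senses(lemmatised_sentences, unambiguous_sense):
    """Replace every unambiguous lemma with its synset identifier so that
    the trained model contains one vector per sense."""
    return [[unambiguous_sense.get(tok, tok) for tok in sent]
            for sent in lemmatised_sentences]


def train_sense_model(lemmatised_sentences, unambiguous_sense):
    corpus = substitute_senses(lemmatised_sentences, unambiguous_sense)
    # sg=1, hs=1, negative=0 so that model.score() can return sentence
    # log-likelihoods (Taddy-style scoring); vector_size=100 as in the paper.
    return Word2Vec(corpus, vector_size=100, sg=1, hs=1, negative=0,
                    min_count=5)


def disambiguate(model, sentence, position, senses_of, is_ambiguous):
    """Score one sense-substituted variant per candidate sense of the word
    at `position`, omitting other ambiguous words, and return the winner."""
    kept, target_idx = [], None
    for i, tok in enumerate(sentence):
        if i == position:
            target_idx = len(kept)
            kept.append(tok)
        elif not is_ambiguous(tok):       # other ambiguous words are omitted
            kept.append(tok)
    senses = list(senses_of(sentence[position]))
    variants = [kept[:target_idx] + [s] + kept[target_idx + 1:] for s in senses]
    log_probs = model.score(variants, total_sentences=len(variants))
    return senses[int(log_probs.argmax())]
```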

6 Supervised Word Sense Recognition

In the supervised approach, we train neural network models to predict word senses. In our experiment, the neural network model acts as a regression function F transforming word embeddings provided at input into sense (synset identifier) vectors.

As the network architecture we selected LSTM (Hochreiter and Schmidhuber, 1997). The neural network model consists of one LSTM layer followed by a dense (perceptron) layer at the output. We train the network using a mean standard error loss function.

Input data consists of sequences of five word2vec embeddings: of the two words that make the left and right symmetric contexts of each input word to be disambiguated, and the word itself represented by the average vector of the vectors representing all its senses. Ambiguous words for which there are no embeddings are represented by zero vectors (padded). Zero vectors are also added if the context is too short. This data is used to train the LSTM model (Keras 1.0.1, https://keras.io/) linked with the subsequent dense layer with a sigmoid activation function.

At the final step, we transform the output into synsets rather than vectors. We select the most appropriate sense from the set of possible senses in the inventory, taking into account the continuous output structure. In this step, the neural network output layer (which is a vector of the same size as the input embeddings, but transformed) is compared with each possible sense vector. To compare vectors, we use the cosine similarity measure, defined between any two vectors.

We compute cosine similarity between the neural network output vector nnv and each sense from the possible sense inventory S, and select the sense with the maximum cosine similarity towards nnv.

To test each neural network set-up we use 30-fold cross-validation.
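A minimal Keras sketch of the supervised setup just described: a single LSTM over the five context embeddings, a dense sigmoid output of the same dimensionality as the embeddings, and cosine-based selection of the closest sense vector. The paper used Keras 1.0.1 and reports a 'mean standard error' loss; the sketch assumes a current Keras API and a mean-squared-error loss, and all helper names are illustrative rather than the authors' code.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

EMB_DIM = 100  # embedding size used in the paper


def build_model(seq_len=5, emb_dim=EMB_DIM):
    """LSTM layer followed by a dense (sigmoid) layer mapping the five-word
    context window onto a sense-vector-sized output."""
    model = Sequential()
    model.add(LSTM(emb_dim, input_shape=(seq_len, emb_dim)))
    model.add(Dense(emb_dim, activation="sigmoid"))
    model.compile(loss="mse", optimizer="adam")
    return model


def choose_sense(output_vector, sense_vectors):
    """Return the synset whose sense vector has maximum cosine similarity
    with the network output (sense_vectors: synset id -> vector)."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(sense_vectors, key=lambda s: cos(output_vector, sense_vectors[s]))
```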

7 Results

In this section we put a summary of the results obtained on our test set, as well as two baseline results. The corpus consisted of 2785 sentences and 303 occurrences of annotated ambiguous words which could be disambiguated by our algorithms, i.e. there were unambiguous equivalents of their senses and there were appropriate word embeddings for at least one of the other senses of the word. There were 5571 occurrences of words which occurred only in one sense.

Table 1 presents precision of both tested methods computed over the Składnica dataset. The set contains 344 occurrences of ambiguous words which were eligible for our method. For the unsupervised approach we tested a window of 5 and 10 words around the analyzed word.

The ambiguous words from the sentence other than the one being disambiguated at the moment are either omitted or represented as a vector representing all their occurrences. The uniq variant omits all other ambiguous words from the sentence, while in the all variant we use the non-disambiguated representation of these words.

Method          Settings        Precision
random baseline N/A             0.47
MFS baseline    N/A             0.73
pagerank        N/A             0.52
unsupervised    5 word, all     0.507
unsupervised    5 word, uniq    0.507
unsupervised    10 word, uniq   0.529
unsupervised    10 word, all    0.513
supervised      750 epochs      0.673
supervised      1000 epochs     0.680
supervised      2000 epochs     0.690
supervised      4000 epochs     0.667

Table 1: Precision of word-sense disambiguation methods for Polish.

In the supervised approach the best results were obtained for 2000 epochs but they did not differ much from those obtained after 1000 epochs.

For comparison, we include two baseline values:

• random baseline: select a random sense from a uniform random probability distribution,

• MFS baseline: use the most frequent sense as computed from the same corpus (there is no other available sense frequency data for Polish that could be obtained from manually annotated sources).

The table also includes results computed using the pagerank WSD algorithm developed at the PWR (Kędzia et al., 2015). These results were obtained for all the ambiguous words occurring within the sample, so they cannot be directly compared to our results.

As the results indicate, the unsupervised method performs at the level of random sense selection. Below there are two examples of the analyzed sentences.

• lęk przed nicością łączy się z doświadczeniem pustki 'fear of nothingness combines with the experience of emptiness': in this sentence, the Polish ambiguous words 'nothingness' and 'emptiness' were resolved correctly, while the ambiguous word 'experience' does not have unambiguous equivalents.

• na tym nie kończą się problemy 'that does not stop problems': in this example the ambiguous word 'problem' was not resolved correctly, but this case is difficult also for humans.

The low quality of the results might be the effect of the relatively short context available, as the analysed text is not continuous.

It might also point to the difficulty of the test set. Senses in plWordnet are very numerous and hard to differentiate even for humans. But the results of the supervised method falsify this assumption.

Our supervised approach gave much better results, although they are also not very good as the amount of annotated data is rather small. In this approach more epochs resulted in slight model over-fitting.

8 Conclusions

Our work introduced two methods of word sense disambiguation based on word embeddings, supervised and unsupervised. The first approach assumes a probabilistic interpretation of embeddings and computes log probability from sequences of word embedding vectors. In place of an ambiguous word we put embeddings specific for each possible sense and evaluate the likelihood of the thus obtained sentences. Finally we select the most probable sentence. The second, supervised method is based on a neural network trained to learn a context-sensitive transformation that maps an input vector of an ambiguous word into an output vector representing its sense. We compared the performance of both methods on corpora with manual annotations of word senses from the Polish wordnet (plWordnet). The results show the low quality of the unsupervised method and suggest the superiority of the supervised version in comparison to the pagerank method on the set of words which were eligible for our approach. Although the baseline in which just the most frequent sense is chosen is still a little better, this is probably due to the very limited training set available for Polish.

Acknowledgments
The paper is partially supported by the Polish National Science Centre project Compositional distributional semantic models for identification, discrimination and disambiguation of senses in Polish texts, number 2014/15/B/ST6/05186.

References

Eneko Agirre, Oier López de Lacalle, and Aitor Soroa. 2014. Random walks for knowledge-based word sense disambiguation. Comput. Linguist., 40(1):57–84, March.

Pierpaolo Basile, Annalina Caputo, and Giovanni Semeraro. 2014. An enhanced lesk word sense disambiguation algorithm through a distributional semantic model. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics, Dublin, Ireland. Association for Computational Linguistics.

Sudha Bhingardive, Dhirendra Singh, Rudra Murthy, Hanumant Redkar, and Pushpak Bhattacharyya. 2015. Unsupervised most frequent sense detection using word embeddings. In DENVER.

Bartosz Broda, Michał Marcińczuk, Marek Maziarz, Adam Radziszewski, and Adam Wardyński. 2012. KPWr: Towards a free corpus of Polish. Proceedings of LREC'12.

Elżbieta Hajnicz. 2014. Lexico-semantic annotation of Składnica treebank by means of PLWN lexical units. In Heili Orav, Christiane Fellbaum, and Piek Vossen, editors, Proceedings of the 7th International WordNet Conference (GWC 2014), pages 23–31, Tartu, Estonia. University of Tartu.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput., 9(8):1735–1780, November.

Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL '12, pages 873–882, Stroudsburg, PA, USA. Association for Computational Linguistics.

Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. Sensembed: Learning sense embeddings for word and relational similarity. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 95–105.

Łukasz Kobyliński and Mateusz Kopeć. 2012. Semantic similarity functions in word sense disambiguation. In Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors, Text, Speech and Dialogue: 15th International Conference, TSD 2012, Brno, Czech Republic, September 3-7, 2012. Proceedings, pages 31–38, Berlin, Heidelberg. Springer.

Łukasz Kobyliński. 2012. Mining class association rules for word sense disambiguation. In Pascal Bouvry, Mieczysław A. Kłopotek, Franck Leprevost, Małgorzata Marciniak, Agnieszka Mykowiecka, and Henryk Rybiński, editors, Security and Intelligent Information Systems: International Joint Conference, SIIS 2011, Warsaw, Poland, June 13-14, 2011, Revised Selected Papers, volume 7053 of Lecture Notes in Computer Science, pages 307–317, Berlin, Heidelberg. Springer.

Paweł Kędzia, Maciej Piasecki, and Marlena Orlińska. 2015. Word sense disambiguation based on large scale Polish CLARIN heterogeneous lexical resources. Cognitive Studies | Études cognitives, 15:269–292.

Rada Mihalcea, Paul Tarau, and Elizabeth Figa. 2004. Pagerank on semantic networks, with application to word sense disambiguation. In Proceedings of the 20th International Conference on Computational Linguistics, COLING '04, Stroudsburg, PA, USA. Association for Computational Linguistics.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pages 746–751.

Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2014. Efficient non-parametric estimation of multiple embeddings per word in vector space. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1059–1069.

Adam Przepiórkowski, Mirosław Bańko, Rafał L. Górski, and Barbara Lewandowska-Tomaszczyk, editors. 2012. Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA. http://is.muni.cz/publication/884893/en.

Joseph Reisinger and Raymond J Mooney. 2010. Multi-prototype vector-space models of word meaning. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 109–117. Association for Computational Linguistics.

Hinrich Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1):97–123.

Matt Taddy. 2015. Document classification by inversion of distributed language representations. CoRR, abs/1504.07295.

Kaveh Taghipou and Hwee Tou Ng. 2015. Semi-supervised word sense disambiguation using word embeddings in general and specific domains. In Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pages 314–323. Association for Computational Linguistics.

Rocco Tripodi and Marcello Pelillo. 2017. A game-theoretic approach to word sense disambiguation. Computational Linguistics.

Marcin Woliński, Katarzyna Głowińska, and Marek Świdziński. 2011. A preliminary version of Składnica—a treebank of Polish. In Zygmunt Vetulani, editor, Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 299–303, Poznań, Poland.


Author Index

An, Yuan, 31
Apidianaki, Marianna, 110
Arora, Sanjeev, 12
Biemann, Chris, 72
Callison-Burch, Chris, 110
Cocos, Anne, 110
Cook, Paul, 47
El-Haj, Mahmoud, 61
Ezeani, Ignatius, 53
Faralli, Stefano, 72
Fellbaum, Christiane, 12
Gamallo, Pablo, 1
Gregori, Lorenzo, 102
Hasan, Sadid, 31
Hayashi, Yoshihiko, 37
Hepple, Mark, 53
Jang, Kyoung-Rok, 91
Kanada, Kentaro, 37
Khodak, Mikhail, 12
King, Milton, 47
Kobayashi, Tetsunori, 37
Kober, Thomas, 79
Köper, Maximilian, 24
Lieto, Antonio, 96
Ling, Yuan, 31
Mensa, Enrico, 96
Myaeng, Sung-Hyon, 91
Mykowiecka, Agnieszka, 120
Onyenwe, Ikechukwu, 53
Panchenko, Alexander, 72
Panunzi, Alessandro, 102
Pereira-Fariña, Martín, 1
Piao, Scott, 61
Ponzetto, Simone Paolo, 72
Radicioni, Daniele P., 96
Rayson, Paul, 61
Reffin, Jeremy, 79
Risteski, Andrej, 12
Schulte im Walde, Sabine, 24
Wattam, Stephen, 61
Wawer, Aleksander, 120
Weeds, Julie, 79
Weir, David, 79
Wilkie, John, 79
