Linked Lexical Knowledge Bases: Foundations and Applications
Synthesis Lectures on Human Language Technologies

Editor: Graeme Hirst, University of Toronto

Synthesis Lectures on Human Language Technologies is edited by Graeme Hirst of the University of Toronto. The series consists of 50- to 150-page monographs on topics relating to natural language processing, computational linguistics, information retrieval, and spoken language understanding. Emphasis is on important new techniques, on new applications, and on topics that combine two or more HLT subfields.

Linked Lexical Knowledge Bases: Foundations and Applications
Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek, 2016
Syntax-based Statistical Machine Translation
Philip Williams, Rico Sennrich, Matt Post, and Philipp Koehn, 2016
Bayesian Analysis in Natural Language Processing
Shay Cohen, 2016
Metaphor: A Computational Perspective
Tony Veale, Ekaterina Shutova, and Beata Beigman Klebanov, 2016
Grammatical Inference for Computational Linguistics
Jeffrey Heinz, Colin de la Higuera, and Menno van Zaanen, 2015
Automatic Detection of Verbal Deception
Eileen Fitzpatrick, Joan Bachenko, and Tommaso Fornaciari, 2015
Natural Language Processing for Social Media
Atefeh Farzindar and Diana Inkpen, 2015
Semantic Similarity from Natural Language and Ontology Analysis
Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain, 2015
Learning to Rank for Information Retrieval and Natural Language Processing, Second Edition
Hang Li, 2014
Ontology-Based Interpretation of Natural Language
Philipp Cimiano, Christina Unger, and John McCrae, 2014
Automated Grammatical Error Detection for Language Learners, Second Edition
Claudia Leacock, Martin Chodorow, Michael Gamon, and Joel Tetreault, 2014
Web Corpus Construction
Roland Schäfer and Felix Bildhauer, 2013
Recognizing Textual Entailment: Models and Applications
Ido Dagan, Dan Roth, Mark Sammons, and Fabio Massimo Zanzotto, 2013
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
Emily M. Bender, 2013
Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
Anders Søgaard, 2013
Semantic Relations Between Nominals
Vivi Nastase, Preslav Nakov, Diarmuid Ó Séaghdha, and Stan Szpakowicz, 2013
Computational Modeling of Narrative
Inderjeet Mani, 2012
Natural Language Processing for Historical Texts
Michael Piotrowski, 2012
Sentiment Analysis and Opinion Mining
Bing Liu, 2012
Discourse Processing
Manfred Stede, 2011
Bitext Alignment
Jörg Tiedemann, 2011
Linguistic Structure Prediction
Noah A. Smith, 2011
Learning to Rank for Information Retrieval and Natural Language Processing
Hang Li, 2011
Computational Modeling of Human Language Acquisition
Afra Alishahi, 2010
Introduction to Arabic Natural Language Processing
Nizar Y. Habash, 2010
Cross-Language Information Retrieval
Jian-Yun Nie, 2010
Automated Grammatical Error Detection for Language Learners
Claudia Leacock, Martin Chodorow, Michael Gamon, and Joel Tetreault, 2010
Data-Intensive Text Processing with MapReduce
Jimmy Lin and Chris Dyer, 2010
Semantic Role Labeling
Martha Palmer, Daniel Gildea, and Nianwen Xue, 2010
Spoken Dialogue Systems
Kristiina Jokinen and Michael McTear, 2009
Introduction to Chinese Natural Language Processing
Kam-Fai Wong, Wenjie Li, Ruifeng Xu, and Zheng-sheng Zhang, 2009
Introduction to Linguistic Annotation and Text Analytics
Graham Wilcock, 2009
Dependency Parsing
Sandra Kübler, Ryan McDonald, and Joakim Nivre, 2009
Statistical Language Models for Information Retrieval
ChengXiang Zhai, 2008

Copyright © 2016 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in printed reviews, without the prior permission of the publisher.

Linked Lexical Knowledge Bases: Foundations and Applications
Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek
www.morganclaypool.com

ISBN: 9781627059749 (paperback)
ISBN: 9781627059046 (ebook)

DOI 10.2200/S00717ED1V01Y201605HLT034

A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON HUMAN LANGUAGE TECHNOLOGIES

Lecture #34
Series Editor: Graeme Hirst, University of Toronto
Series ISSN: Print 1947-4040, Electronic 1947-4059

Linked Lexical Knowledge Bases: Foundations and Applications

Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek Technische Universität Darmstadt, Germany

SYNTHESIS LECTURES ON HUMAN LANGUAGE TECHNOLOGIES #34

Morgan & Claypool Publishers

ABSTRACT
This book conveys the fundamentals of Linked Lexical Knowledge Bases (LLKB) and sheds light on their different aspects from various perspectives, focusing on their construction and use in natural language processing (NLP). It characterizes a wide range of both expert-based and collaboratively constructed lexical knowledge bases. Only basic familiarity with NLP is required and this book has been written for both students and researchers in NLP and related fields who are interested in knowledge-based approaches to language analysis and their applications.

Lexical Knowledge Bases (LKBs) are indispensable in many areas of natural language processing, as they encode human knowledge of language in machine-readable form, and as such, they are required as a reference when machines attempt to interpret natural language in accordance with human perception. In recent years, numerous research efforts have led to the insight that to make the best use of available knowledge, the orchestrated exploitation of different LKBs is necessary. This allows us to not only extend the range of covered words and senses, but also gives us the opportunity to obtain a richer knowledge representation when a particular meaning of a word is covered in more than one resource. Examples where such an orchestrated usage of LKBs proved beneficial include word sense disambiguation, semantic role labeling, semantic parsing, and text classification.

This book presents different kinds of automatic, manual, and collaborative linkings between LKBs. A special chapter is devoted to the linking algorithms employing text-based, graph-based, and joint modeling methods. Following this, it presents a set of higher-level NLP tasks and algorithms, effectively utilizing the knowledge in LLKBs. Among them, you will find advanced methods, e.g., distant supervision, or continuous vector space models of knowledge bases (KB), that have become widely used at the time of this book's writing. Finally, multilingual applications of LLKBs, such as cross-lingual semantic relatedness and computer-aided translation, are discussed, as well as tools and interfaces for exploring LLKBs, followed by conclusions and future research directions.

KEYWORDS
lexical knowledge bases, linked lexical knowledge bases, sense alignment, word sense disambiguation, graph-based methods, text similarity, distant supervision, automatic knowledge base construction, continuous vector space models, multilingual applications

Contents

Foreword ...... xiii

Preface ...... xvii

Acknowledgments ...... xxi

1 Lexical Knowledge Bases ...... 1
  1.1 Expert-built Lexical Knowledge Bases ...... 4
    1.1.1 Wordnets ...... 4
    1.1.2 Framenets ...... 6
    1.1.3 Valency Lexicons ...... 7
    1.1.4 Verbnets ...... 9
  1.2 Collaboratively Constructed Knowledge Bases ...... 11
    1.2.1 Wikipedia ...... 11
    1.2.2 Wiktionary ...... 13
    1.2.3 OmegaWiki ...... 15
  1.3 Standards ...... 16
    1.3.1 ISO ...... 16
    1.3.2 Semantic Web Standards ...... 19
  1.4 Chapter Conclusion ...... 19
2 Linked Lexical Knowledge Bases ...... 21
  2.1 Combining LKBs for Specific Tasks ...... 22
  2.2 Large-scale LLKBs ...... 23
  2.3 Automatic Linking Involving Wordnets ...... 25
  2.4 Manual and Collaborative Linking ...... 26
  2.5 Chapter Conclusion ...... 27
3 Linking Algorithms ...... 29
  3.1 Information Integration ...... 29
    3.1.1 Ontology Matching ...... 29
    3.1.2 Database Schema Matching ...... 31
    3.1.3 Graph Matching ...... 31
  3.2 Evaluation Metrics for WSL ...... 32
  3.3 Gloss Similarity-based WSL ...... 33
    3.3.1 Word Overlap ...... 33
    3.3.2 Vector Representations ...... 34
    3.3.3 Personalized PageRank ...... 34
    3.3.4 Additional Remarks ...... 35
  3.4 Graph Structure-based WSL ...... 37
    3.4.1 Wikipedia Category Alignment ...... 38
    3.4.2 Shortest Paths ...... 38
  3.5 Joint Modeling ...... 39
    3.5.1 Machine Learning Approaches ...... 40
    3.5.2 Unsupervised Approaches ...... 41
  3.6 Chapter Conclusion ...... 42
4 Fundamental Disambiguation Methods ...... 45
  4.1 Disambiguating Textual Units ...... 45
  4.2 Enhanced Disambiguation Using LLKBs ...... 46
    4.2.1 Approaches ...... 46
    4.2.2 Overview of Work in this Area ...... 48
  4.3 Robust Disambiguation Heuristics ...... 49
  4.4 Sense Clustering ...... 50
    4.4.1 Method ...... 50
    4.4.2 Overview of Work in this Area ...... 51
  4.5 Sense-annotated Corpora ...... 52
  4.6 Chapter Conclusion ...... 54
5 Advanced Disambiguation Methods ...... 55
  5.1 Automatic Knowledge Base Construction ...... 55
  5.2 Distant Supervision ...... 56
    5.2.1 Method ...... 56
    5.2.2 Overview of Work in this Area ...... 57
  5.3 Continuous Vector Space Models of KBs ...... 60
    5.3.1 Method ...... 61
    5.3.2 Overview of Work in this Area ...... 61
  5.4 Chapter Conclusion ...... 66
6 Multilingual Applications ...... 67
  6.1 Multilingual Semantic Relatedness ...... 67
  6.2 Computer-aided Translation ...... 68
    6.2.1 Overview of Work in this Area ...... 69
    6.2.2 Illustrative Example ...... 71
  6.3 Chapter Conclusion ...... 73
7 Interfaces and Tools ...... 75
  7.1 Exploration Interfaces ...... 75
  7.2 Curation Interfaces ...... 79
  7.3 Resource APIs for Text Processing ...... 81
  7.4 Chapter Conclusion ...... 83
8 Conclusion and Outlook ...... 85
  8.1 Summary ...... 85
  8.2 Outlook ...... 86

Acronyms ...... 89

Bibliography ...... 91

Authors’ Biographies ...... 123

Foreword

Lexical semantic knowledge is vital for most tasks in natural language processing (NLP). Such knowledge has been captured through two main approaches. The first is the knowledge-based approach, in which human linguistic knowledge is encoded directly in a structured form, resulting in various types of lexical knowledge bases. The second is the corpus-based approach, in which lexical semantic knowledge is learned from corpora and then represented in either explicit or implicit manners.

Historically, the knowledge-based approach preceded the corpus-based one, while the latter has been dominating the center-stage of NLP research in the last decades. Yet, the development and use of lexical knowledge bases (LKBs) continued to be a major thread. An illustration of this fact may be found in the number of citations for the fundamental 1998 WordNet book [Fellbaum, 1998a], over 12,000 at the time of writing (according to Google Scholar), which somewhat exceeds the number of citations for the primary text book on statistical NLP from about the same period [Manning and Schütze, 1999]. Despite the overwhelming success of corpus-based methods, whether supervised or unsupervised, their output may be quite noisy, particularly when it comes to modeling fine-grained lexical knowledge such as distinct word senses or concrete lexical semantic relationships. Human encoding, on the other hand, provides more precise knowledge at the fine-grained level. The ongoing popular use of LKBs, and particularly of WordNet, seems to indicate that they still provide substantial complementary information relative to corpus-based methods (see Shwartz et al. [2015] for a concrete evaluation showing the complementary behavior of corpus-based word embeddings and information from multiple LKBs).

While WordNet has been by far the most widely used lexical resource, it does not provide the full spectrum of needed lexical knowledge, which brings us to the theme of the current book. As reviewed in Chapter 2, additional lexical information has been encoded in quite a few LKBs, either by experts or by web communities through collaborative efforts. In particular, collaborative resources provide the opportunity to obtain much larger and more frequently updated resources than is possible with expert work. Knowledge resources like Wikipedia¹ or Wikidata² include vast lexical information about individual entities and domain specific terminology across many domains, which falls beyond the scope of WordNet. Hence, it would be ideal for NLP technology to utilize in an integrated manner the union of information available in a multitude of lexical resources. As an illustrating example, consider an application scenario which requires knowing that Deep Purple was a group of people. We may

find in Wikipedia that it was a “band,” map this term to its right sense in WordNet and then follow a hypernymy chain to “organization,” whose definition includes “a group of people.”

¹https://www.wikipedia.org
²https://www.wikidata.org

As hinted in the above example, to allow such resource integration we need effective methods for linking, or aligning, the word senses or concepts encoded in various resources. Accordingly, the main technical focus of this book is about existing resource integration efforts, resource linking algorithms, and the utility of such algorithms within disambiguation tasks. Hence, this book would first be of high value for researchers interested in creating or linking LKBs, as well as for developers of NLP algorithms and applications who would like to leverage linked lexical resources. An important aspect is the development and use of linked lexical resources in multiple languages, addressed in Chapter 6.

Looking forward, maybe the most interesting research prospect for linked lexical knowledge bases is their integration with corpus-based machine learning approaches. A relatively simple form of combining the information in LKBs with corpus-based information is to use the former, via distant supervision, to create training data for the latter (discussed in Section 5.2). A more fundamental research direction is to create a unified knowledge representation framework, which integrates directly the human-encoded information in LKBs with information obtained by corpus-based methods. A promising framework for such integrated representation has emerged recently, under the “embedding” paradigm, where dense continuous vectors are used to represent linguistic objects, as reviewed in Section 5.3. Such representations, i.e., embeddings, have initially been created separately from corpus data, based on corpus co-occurrences, as well as from knowledge bases, based on and leveraging their rich internal structure. Further research suggested methods for creating unified representations, based on hybrid objective functions that consider both corpus and knowledge base structure. While this research line is still in initial phases, it has the potential to truly integrate corpus-based and human-encoded knowledge, and thus unify these two research endeavors which have been pursued mostly separately in the past. From this perspective, and assuming that human-encoded lexical knowledge can provide useful additional information on top of corpus-based information, the current book should be useful for any researcher who aims to advance the state of the art in lexical semantics.

While considering the integration of implicit corpus-based and explicit human-encoded information, we may notice that the joint embedding approach goes the “implicit way.” While joint embeddings do encode information coming from both types of resources, this information is encoded in opaque continuous vectors, which are not immediately interpretable, thus losing the transparency of the original symbolically-encoded human knowledge. Indeed, developing methods for interpreting embedding-based representations is an actively pursued theme, but it is yet to be seen whether such attempts will succeed to preserve the interpretability of LKB information. Alternatively, one might imagine developing integrated corpus-based and knowledge-based representations that would inherently involve explicit symbolic representations, even though, currently, this might be seen as wishful thinking.
Finally, one would hope that the current book, and work on new lexical representations in general, would encourage researchers to better connect the development of knowledge resources with generic aspects of their utility for NLP tasks. Consider for example the common use of the lexical semantic relationships in WordNet for lexical inference. Typically, WordNet relations are utilized in an application to infer the meaning of one word from another in order to bridge lexical gaps, such as when different words are used in a question and in an answer passage. While this type of inference has been applied in numerous works, surprisingly there are no well-defined methods that indicate how to optimally exploit WordNet for lexical inference. Instead, each work applies its own heuristics, with respect to the types of WordNet links that should be followed, the length of link chains, the senses to be considered, etc. In this state of affairs, it is hard for LKB developers to assess which components of the knowledge and representations that they create are truly useful. Similar challenges are faced when trying to assess the utility of vector-based representations.³

Eventually, one might expect that generic methods for utilizing and assessing lexical knowledge representations would guide their development and reveal their optimal form, based on either implicit or explicit representations, or both.

Ido Dagan
Department of Computer Science
Bar-Ilan University, Israel

³One effort to address these challenges is the ACL 2016 workshop on Evaluating Vector Space Representations for NLP, whose mission statement is “To develop new and improved ways of measuring the quality or understanding the properties of vector-space representations in NLP.” https://sites.google.com/site/repevalacl16/.

Preface

MOTIVATION

Lexical Knowledge Bases (LKBs) are indispensable in many areas of natural language processing (NLP). They strive to encode the human knowledge of language in machine-readable form, and as such they are required as a reference when machines are supposed to interpret natural language in accordance with the human perception. Examples of such tasks are word sense disambiguation (WSD) and information retrieval (IR). The aim of WSD is to determine the correct meaning of ambiguous words in context, and in order to formalize this task, a so-called sense inventory is required, i.e., a resource encoding the different meanings a word can express. In IR, the goal is to retrieve, given a user query formulating a specific information need, the documents from a collection which fulfill this need best. Here, knowledge is also necessary to correctly interpret short and often ambiguous queries, and to relate them to the set of documents.

Nowadays, LKBs exist in many variations. For instance, the META-SHARE repository⁴ lists over 1,000 different lexical resources, and the LRE Map⁵ contains more than 3,900 resources which have been proposed as a knowledge source for natural language processing systems. A main distinction, which is also made in this book, is between expert-built and collaboratively constructed resources. While the distinction is not always clean-cut, the former are generally resources which are created by a limited set of expert editors or professionals using their personal introspection, corpus evidence, or other means to obtain the knowledge. Collaboratively constructed resources, on the other hand, are open for every volunteer to edit, with no or only few restrictions such as registration for a website. Intuitively, the quality of the entries should be lower when laypeople are involved in the creation of a resource, but it has been shown that the collaborative process of correcting errors and extending articles (also known as the “wisdom of the crowds”; Surowiecki [2005]) can lead to results of remarkable quality [Giles, 2005]. The most prominent example is Wikipedia, the largest encyclopedia and one of the largest knowledge sources known. Although originally not meant for that purpose, it has also become a major source of knowledge for all kinds of NLP applications, many of which we will discuss in this book [Medelyan et al., 2009].

⁴http://www.meta-share.eu
⁵http://www.resourcebook.eu

Apart from the basic distinction with regard to the production process, LKBs exist in many flavors. Some are focusing on encyclopedic knowledge (Wikipedia), others resemble language dictionaries (Wiktionary) or aim to describe the concepts used in human language and the relationships between them from a psycholinguistic (Princeton WordNet [Fellbaum, 1998a]) or a semantic (FrameNet [Ruppenhofer et al., 2010]) perspective. Another important distinction is between monolingual resources, i.e., those covering only one language, and multilingual ones, which not only feature entries in different languages but usually also provide translations. However, despite the large number of existing LKBs, the growing demand for large-scale LKBs in different languages is still not met. While Princeton WordNet has emerged as a de facto standard for English NLP, for most languages corresponding resources are either considerably smaller or missing altogether. For instance, the Open Multilingual Wordnet project lists only 25 wordnets in languages other than English, and only a few of them (like the Finnish or Polish versions) match or surpass Princeton WordNet's size [Bond and Foster, 2013]. Multilingual efforts such as Wiktionary or OmegaWiki provide a viable option for such cases and seem especially suitable for smaller languages due to their open construction paradigm and low entry requirements [Matuschek et al., 2013], but there are still considerable gaps in coverage which the corresponding language communities are struggling to fill.

A closely related problem is that, even if comprehensive resources are available for a specific language, there usually does not exist a single resource which works best for all application scenarios or purposes, as different LKBs cover not only different words and senses, but sometimes even completely different information types. For instance, the knowledge about verb classes (i.e., groups of verbs which share certain properties) contained in VerbNet is not covered by WordNet, although it might be useful depending on the task, for example to provide subcategorization information when parsing low frequency verbs. These considerations have led to the insight that, to make the best possible use of the available knowledge, the orchestrated exploitation of different LKBs is necessary. This lets us not only extend the range of covered words and senses, but more importantly, gives us the opportunity to obtain a richer knowledge representation when a particular meaning of a word is covered in more than one resource. Examples where such a joint usage of LKBs proved beneficial include WSD using aligned WordNet and Wikipedia in BabelNet [Navigli and Ponzetto, 2012a], semantic role labeling (SRL) using a mapping between PropBank, VerbNet, and FrameNet [Palmer, 2009], and the construction of a semantic parser using a combination of FrameNet, WordNet, and VerbNet [Shi and Mihalcea, 2005]. These combined resources, known as Linked Lexical Knowledge Bases (LLKB), are the focus of this book, and we shed light on their different aspects from various angles.

TARGET AUDIENCE AND FOCUS

This book is intended to convey a fundamental understanding of Linked Lexical Knowledge Bases, in particular their construction and use, in the context of NLP. Our target audience are students and researchers from NLP and related fields who are interested in knowledge-based approaches. We assume only basic familiarity with NLP methods and thus this book can be used both for self-study and for teaching at an introductory level.

Note that the focus of this book is mostly on sense linking between general-purpose LKBs, which are most commonly used in NLP. While we acknowledge that there are many efforts of linking LKBs, for instance, to ontologies or domain-specific resources, we only discuss them briefly where appropriate and provide references for readers interested in these more specific linking scenarios. The same is true for the recent efforts in creating ontologies from LKBs and formalizing the relationships between them—while we give an introduction to this topic in Section 1.3, we realize that this diverse area of research deserves a book of its own, which indeed has been published recently [Chiarcos et al., 2012]. Our attention is rather on the actual algorithmic linking process, and the benefits it brings for applications. Furthermore, we put an emphasis on monolingual linking efforts (i.e., between resources in the same language), as the vast majority of algorithms have covered this scenario in the past and cross-lingual approaches were mostly direct derivatives thereof, for instance by introducing machine translation as an intermediate component (cf. Chapter 3). Nevertheless, we recognize the increasing importance of multilingual NLP and thus provide a dedicated chapter covering applications in this area (Chapter 6).

OUTLINE

After providing a brief description of the typographic conventions which we applied throughout this book, we start by introducing and comparatively analyzing a selection of LKBs which have been widely used in NLP (Chapter 1). Our description of these LKBs provides a foundation for the main part of this book, where their integration into LLKBs is considered from various different angles. We include expert-built LKBs, such as WordNet, as well as collaboratively constructed resources, such as Wikipedia and Wiktionary, and also cover established standards and representation formats which are relevant in this context.

Then, in Chapter 2, we give a more formal definition of LLKBs, and also of word sense linking, which is crucial for combining different resources semantically, and thus is of utmost importance. We go on by describing various LLKBs which have been suggested, putting a focus on current large-scale projects which dominate the field, but also considering smaller, more specialized initiatives which have yielded important insights and paved the way for large-scale resource integration.

In Chapter 3, we approach the core issue of automatic word sense linking. While the notion of similar or even equivalent word senses in different resources is intuitively understandable and often (but not always) quite easily grasped by humans, it poses a complex challenge for automatic processing due to word ambiguities, different sense granularities and information types [Navigli, 2006]. First, to contextualize the challenge, we describe some related tasks in NLP and other fields, and outline how word sense linking relates to them. Then, we discuss in detail different ways to automatically create sense links between LKBs, based on textual descriptions of senses (i.e., glosses), the structure of the resources, or a combination thereof. The broader context of LLKBs lies of course not in the mere linking of resources for its own sake, but in the potential it holds for NLP applications.

Thus, in the following chapters, we present a selection of methods and applications where the use of LLKBs leads to particular benefits for NLP. In Chapter 4, we describe how the disambiguation of textual units benefits from the richer structure and combined knowledge, and also how the clustering of fine-grained word senses by exploiting 1:n links improves WSD accuracy. Building on that, we present more advanced disambiguation techniques in Chapter 5, including a discussion of using LLKBs for distant supervision and in neural vector space models, which are two recent and especially promising topics in machine learning for NLP. In Chapter 6 we briefly present multilingual applications, and computer-aided translation in particular, and show how they benefit from linked multilingual resources. Finally, in Chapter 7, we supplement our considerations of LLKB applications by discussing the enabling technologies, i.e., how LLKBs can be accessed via user interfaces and application programming interfaces. Based on the discussion of access paths for single resources, we describe how interfaces for current complex linked resources have evolved to cater to the needs of researchers and end users. Chapter 8 concludes this book and points out directions for future work.

TYPOGRAPHIC CONVENTIONS

• Newly introduced terms and example lemmas are typed in italics.
• Synsets (groups of synonymous words) are enclosed by curly brackets, e.g., {car, automobile}.
• Concepts are typed in small caps, e.g., CAR.
• Relations between senses are written as pairs in parentheses, e.g., (car, vehicle).
• Classes of the Lexical Markup Framework (LMF) standard are printed in a monospace font starting with an upper case letter (e.g., LexicalEntry).
• LMF data categories are printed in a monospace font starting with a lower case letter (e.g., partOfSpeech).

We acknowledge support by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806, by the German Institute for Educational Research (DIPF), and by the German Research Foundation under grant No. GU 798/17-1. We also thank our colleagues and students for their contributions to this book.

Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek
July 2016

Acknowledgments

…Mentors matter! The authors of the book are very grateful to each and every one who generously offered their guidance, support, advice, strategic feedback and valuable insights of all kinds during our professional careers. This helped us grow, learn, identify and accomplish the right goals, including this very book.

Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek
July 2016

CHAPTER 1
Lexical Knowledge Bases

In this chapter we give an overview of different types of lexical knowledge bases that are used in natural language processing (NLP). We cover widely known expert-built Lexical Knowledge Bases (LKBs), and also collaborative LKBs, i.e., those created by a large community of layman collaborators. First we define our terminology, then we give a broad overview of various kinds of LKBs that play an important role in NLP. For particular resource-specific details, we refer the reader to the respective reference publications.

Definition Lexical Knowledge Base: Lexical knowledge bases (LKBs) are digital knowledge bases that provide lexical information on words (including multi-word expressions) of a particular language.¹ By word, we mean word form, or more specifically, the canonical base word form which is called lemma. For example, write is the lemma of wrote. Most LKBs provide lexical information for lemmas. A lexeme is a word in combination with a part of speech (POS), such as noun, verb or adjective. The majority of LKBs specify the part of speech of the lemmas listed, i.e., provide lexical information on lexemes. The pairings of lemma and meaning are called word senses or just senses. We use the terms meaning and concept synonymously in this book to refer to the possibly language-independent part of a sense. Each sense is typically identified by a unique sense identifier. For example, there are two meanings of the verb write which give rise to two different senses:² (write, “to communicate with someone in writing”) and (write, “to produce a literary work”). Accordingly, a LKB might use identifiers, such as write01 and write02, to distinguish between the former and the latter sense. The set of all senses listed in a LKB is called its sense inventory.

Depending on their particular focus, LKBs can contain a variety of lexical information, including morphological, phonetic, syntactic, semantic, and pragmatic information. This book focuses on LKBs that provide lexical information on the word sense level, i.e., information that is sensitive to the meaning of a word and is therefore attached to a pairing of lemma and meaning rather than to the lemma itself. Not included in our definition are LKBs that only provide morphological information about the inflectional and derivational properties of words. The following list provides an overview of the main lexical information types distinguished at the level of word senses.

¹It is important to note that LKBs provide lexical information on word types rather than word tokens.
²It should be noted that in our example, the meaning is defined in natural language. Alternatively, the meaning of a word can be defined more formally using, e.g., first-order logic.

• Sense definition—A definition of the sense in natural language (also called gloss) meant for human interpretation; for example, “to communicate with someone in writing” is a sense definition for the sense write01 given above.

• Sense examples—Example sentences which illustrate the sense in context; for example, “He wrote her an email.” is a sense example of the sense write01.

• Sense relations—Lexical-semantic relations to other senses. We list the most salient ones.
  – Synonymy connects senses which are lexically different but share the same meaning. Synonymy is reflexive, symmetrical, and transitive. For example, the verbs change and modify are synonyms³ as they share the meaning “cause to change.” Some resources such as WordNet subsume synonymous senses into synsets. However, for the linking algorithms presented in this book, we will usually not distinguish between sense and synset, as for most discussions and experiments in this particular context they can be used interchangeably.
  – Antonymy is a relation in which the source and target sense have opposite meanings (e.g., tall and small).
  – Hyponymy denotes a semantic relation where the target sense has a more specific meaning than the source sense (e.g., from limb to arm).
  – Hypernymy is the inverse relation of hyponymy and thus denotes a semantic relation in which the target sense has a more general meaning than the source sense.

• Syntactic behavior—Lexical-syntactic properties, such as the valency of verbs, i.e., the number and type of syntactic arguments a verb takes; for example, the verb change (“cause to change”) can take a noun phrase subject and a noun phrase object as syntactic arguments, as in: She[subject] changed the rules[object]. In LKBs, valency is represented by subcategorization frames (short: subcat frames). They specify syntactic arguments of verbs, but also of other predicate-like lexemes that can take syntactic arguments, e.g., nouns able to take a that-clause (announcement, fact) or adjectives taking a prepositional argument (proud of, happy about). For syntactic arguments, subcat frames typically specify the syntactic category (e.g., noun phrase, verb phrase) and grammatical function (e.g., subject, object).

• Predicate argument structure information—For predicate-like words, such as verbs, this refers to a definition of the semantic predicate and information on the semantic arguments, including:
  – their semantic role according to an inventory of semantic roles given in the context of a particular linguistic theory. There is no standard inventory of semantic roles, i.e., there are linguistic theories assuming small sets of about 40 roles, and others specifying very large sets of several hundred roles. Examples of typical semantic roles are Agent or Patient; and
  – selectional preference information, which specifies the preferred semantic category of an argument, e.g., whether it is a human or an artifact.
  For example, the sense change (“cause to change”) corresponds to a semantic predicate which can be described in natural language as “an Agent causes an Entity to change;” Agent and Entity are semantic roles of this predicate: She[Agent] changed the rules[Entity]; the preferred semantic category of Agent is human.

• Related forms—Word forms that are morphologically related, such as compounds or verbs derived from nouns; for example, the verb buy (“purchase”) is derivationally related to the noun buy, while on the other hand buy (“accept as true,” e.g., I can't buy this story) is not derivationally related to the noun buy.

• Equivalents—Translations of the sense in other languages; for example, kaufen is the German translation of buy (“purchase”), while abkaufen is the German translation of buy (“accept as true”).

• Sense links—Mappings of senses to equivalent senses in other LKBs; for example, the sense change (Cause_change) in FrameNet can be linked to the equivalent sense change (“cause to change”) in WordNet.

³For brevity, we might use lemmas to denote senses.

There are different ways to organize a LKB, for example, by grouping synonymous senses, or by grouping senses with the same lemma. The latter organization is the traditional head-word based organization used in dictionaries [Atkins and Rundell, 2008] where a LKB consists of lexical entries which group senses under a common headword (the lemma).

There is a large number of so-called Machine-readable Dictionaries (MRD), mostly digitized versions of traditional print dictionaries [Lew, 2011, Soanes and Stevenson, 2003], but some MRDs are only available in digitized form, such as DANTE [Kilgarriff, 2010] or DWDS⁴ for German [Klein and Geyken, 2010]. We will not include them in our overview for the following reasons: MRDs have traditionally been built by lexicographers and are targeted toward human use, rather than toward use by automatic processing components in NLP. While MRDs provide information useful in NLP, such as sense definitions, sense examples, as well as grammatical information (e.g., about syntactic behavior), the representation of this information in MRDs usually lacks a strict, formal structure, and thus the information usually suffers from ambiguities. Although such ambiguities can easily be resolved by humans, they are a source of noise when the dictionary entries are processed fully automatically.

⁴www.dwds.de

Our definition of LKBs also covers domain-specific terminology resources (e.g., the Unified Medical Language System (UMLS) metathesaurus of medical terms [Bodenreider, 2004]) that provide domain-specific terms and sense relations between them. However, we do not include these domain-specific resources in our overview, because we used general language LKBs to develop and evaluate the linking algorithms presented in Chapter 3.
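The information types above suggest a simple data model. The following minimal sketch (our illustration, not the schema of any actual resource) shows how senses and a sense inventory could be represented in Python, reusing the write01/write02 example from the definition; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Sense:
    sense_id: str                      # unique sense identifier, e.g., "write01"
    lemma: str                         # canonical base form, e.g., "write"
    pos: str                           # part of speech of the lexeme
    definition: str = ""               # gloss in natural language
    examples: List[str] = field(default_factory=list)
    # lexical-semantic relations as (relation type, target sense id) pairs
    relations: List[Tuple[str, str]] = field(default_factory=list)

# The two senses of "write" from the running example:
write01 = Sense("write01", "write", "verb",
                "to communicate with someone in writing",
                examples=["He wrote her an email."])
write02 = Sense("write02", "write", "verb", "to produce a literary work")

# The sense inventory is the set of all senses listed in the LKB,
# here indexed by sense identifier:
sense_inventory = {s.sense_id: s for s in (write01, write02)}
```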

1.1 EXPERT-BUILT LEXICAL KNOWLEDGE BASES

Expert-built LKBs, in our definition of this term, are resources which are designed, created, and edited by a group of designated experts, e.g., (computational) lexicographers, (computational) linguists, or psycholinguists. While it is possible that there is influence on the editorial process from the outside (e.g., via suggestions provided by users or readers), there is usually no direct means of public participation. This form of resource creation has been predominant since the earliest days of lexicography (or, more broadly, creation of language resources), and while the reliance on expert knowledge produces high quality resources, an obvious disadvantage is the slow production cycle—for all of the resources discussed in this section, it usually takes months (if not years) until a new version is published, while at the same time most of the information remains unchanged. This is due to the extensive effort needed for the creation of a resource of considerable size, in most cases provided by a very small group of people. Nevertheless, these resources play a major role in NLP. One reason is that up until recent years there were no real alternatives available, and some of these LKBs also cover aspects of language which are rather specific and not easily accessible for layman editors. We will present the most pertinent examples in this section.

1.1.1 WORDNETS

Wordnets define senses primarily by their relations to other senses, most notably the synonymy relation that is used to group synonymous senses into so-called synsets. Accordingly, synsets are the main organizational units in wordnets. In addition to synonymy, wordnets provide a large variety of additional sense relations. Most of the sense relations are defined on the synset level, i.e., between synsets, such as hypernymy or meronymy. Other sense relations, such as antonymy, are defined between individual senses, rather than between synsets. For example, while evil and unworthy are synonymous (“morally reprehensible” according to WordNet), their antonyms are different; good is the antonym of evil and worthy is the antonym of unworthy.

The Princeton WordNet for English [Fellbaum, 1998a] was the first such wordnet. It became the most popular wordnet and the most widely used LKB today. The creation of the Princeton WordNet is psycholinguistically motivated, i.e., it aims to represent real-world concepts and relations between them as they are commonly perceived. Version 3.0 contains 117,659 synsets. Apart from its richness in sense relations, WordNet also contains coarse information about the syntactic behavior of verbs in the form of sentence frames (e.g., Somebody ----s something).

There are various works based on the Princeton WordNet, such as the eXtended WordNet [Mihalcea and Moldovan, 2001a], where all open class words in the sense definitions have been annotated with their WordNet sense to capture further relations between senses, WordNet Domains [Bentivogli et al., 2004] which includes domain labels for senses, or SentiWordNet [Baccianella et al., 2010] which assigns sentiment scores to each synset of WordNet.

Wordnets in Other Languages
The Princeton WordNet for English inspired the creation of wordnets in many other languages worldwide and many of them also provide a linking of their senses to the Princeton WordNet. Examples include the Italian wordnet [Toral et al., 2010a], the Japanese wordnet [Isahara et al.], or the German wordnet GermaNet [Hamp and Feldweg, 1997].⁵ Often, wordnets in other languages have particular characteristics that distinguish them from the Princeton WordNet. GermaNet, for example, containing around 70,000 synsets in version 7.0, originally contained very few sense definitions, but unlike most other wordnets, provides detailed information on the syntactic behavior of verbs. For each verb sense, it lists possible subcat frames, distinguishing more than 200 different types.

⁵A comprehensive overview is provided by the global wordnet association under http://globalwordnet.org/wordnets-in-the-world/.

It is important to point out, however, that in general, the Princeton WordNet provides richer information than the other wordnets. For example, it includes not only derivational morphological information, but also inflectional morphology analysis within its associated tools. It also provides an ordering of the senses based on the frequency information from the sense-annotated SemCor corpus—which is very useful for word sense disambiguation as many systems using WordNet rely on the sense ordering; see also examples in Chapter 4.

Information Types
The lexical information types prevailing in wordnets can be summarized as follows.

• Sense definition—Wordnets provide sense definitions at the synset level, i.e., all senses in a synset share the same sense definition.

• Sense examples—These are provided for individual senses.
• Sense relations—Most sense relations in wordnets are given at the synset level, i.e., all senses in a synset participate in such a relation.
  – A special case in wordnets is synonymy, because it is represented via synsets, rather than via relations between senses.
  – Most other sense relations are given on the synset level, e.g., hyponymy.
  – Few sense relations are defined between senses, e.g., antonymy, which does not always generalize to all members of a synset.

• Syntactic behavior—The degree of detail regarding the syntactic behavior varies from wordnet to wordnet. While the Princeton WordNet only distinguishes between few subcat frames, the German wordnet GermaNet distinguishes between about 200 very detailed subcat frames.

• Related forms—The Princeton WordNet is rich in information about senses that are related via morphological derivation. Not all wordnets provide this information type.
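For readers who want to inspect these information types directly, the Princeton WordNet can be queried from Python through NLTK's corpus interface. A minimal sketch follows; the exact synset and lemma names returned depend on the WordNet version shipped with NLTK, so the values in the comments are indicative only.

```python
import nltk
nltk.download("wordnet", quiet=True)  # fetch the WordNet data once

from nltk.corpus import wordnet as wn

# Sense inventory: all verb senses of the lemma "write"
for synset in wn.synsets("write", pos=wn.VERB):
    print(synset.name(), "-", synset.definition())

# Synset-level relations, e.g., hypernyms of one "write" sense
write = wn.synset("write.v.01")
print(write.hypernyms())   # e.g., [Synset('create_verbally.v.01')]
print(write.examples())    # sense examples attached to this synset

# Antonymy is defined between individual senses (lemmas), not synsets
evil = wn.lemma("evil.a.01.evil")
print(evil.antonyms())     # e.g., [Lemma('good.a.01.good')]
```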

1.1.2 FRAMENETS

LKBs modeled according to the theory of frame semantics [Fillmore, 1982] focus on word senses that evoke certain scenes or situations, so-called frames, which are schematic representations of these. For instance, the “Killing” frame specifies a scene where “A Killer or Cause causes the death of the Victim.” It can be evoked by verbs such as assassinate, behead, terminate or nouns such as liquidation or massacre. The participants of these scenes (e.g., “Killer” and “Victim” in the “Killing” frame example), as well as other important elements (e.g., “Instrument” as “The device used by the Killer to bring about the death of the Victim” or “Place” as “The location where the death took place”) constitute the semantic roles of the frame (called frame elements in frame semantics), and are typically realized in a sentence along with the frame-evoking element, as in: Someone[Killer] tried to KILL him[Victim] with a parcel bomb[Instrument].

The inventory of semantic roles used in FrameNet is very large and subject to further extension as FrameNet grows. Many semantic roles have frame-specific names, such as the “Killer” semantic role defined in the “Killing” frame. Frames are the main organizational unit in framenets: they contain senses (represented by their lemma) that evoke the same frame. The majority of the frame-evoking words are verbs and other predicate-like lexemes: they can naturally be represented by frames, since predicates take arguments which can be characterized both syntactically (e.g., subject, direct object) and semantically via their semantic role. There are semantic relations between frames (e.g., the “Is_Causative_of” relation between “Killing” and “Death” or the “Precedes” relation between “Being_born” and “Death” or “Dying”), and also between frame elements.

The English FrameNet [Baker et al., 1998, Ruppenhofer et al., 2010] was the first frame-semantic LKB and it is the most well-known one. Version 1.6 of FrameNet contains 1,205 frames. In FrameNet, senses are called lexical units. FrameNet does not provide explicit information about the syntactic behavior of word senses. However, the sense examples are annotated with syntactic information (FrameNet annotation sets) and from these annotations, subcat frames can be induced.

FrameNet is particularly rich in sense examples, which are selected based on lexicographic criteria, i.e., the sense examples are chosen to illustrate typical syntactic realizations of the frame elements. The sense examples are enriched with annotations of the frame and its elements, and thus provide information about the relative frequencies of the syntactic realizations of a particular frame element. For example, for the verb kill, a noun phrase with the grammatical function object is the most frequently used syntactic realization of the “Victim” role.

Framenets in Other Languages
The English FrameNet has spawned the construction of framenets in multiple other languages. For example, there are framenets for Spanish⁶ [Subirats and Sato, 2004], Swedish⁷ [Friberg Heppin and Toporowska Gronostaj, 2012], and Japanese⁸ [Ohara, 2012]. For Danish, there is an ongoing effort to build a framenet based on a large-scale valency LKB that is manually being extended by frame-semantic information [Bick, 2011]. For German, there is a corpus annotated with FrameNet frames called SALSA [Burchardt et al., 2006].

⁶http://spanishfn.org
⁷http://spraakbanken.gu.se/eng/swefn
⁸http://jfn.st.hc.keio.ac.jp

Information Types e following information types in the English FrameNet are most salient.

• Sense definition—For individual senses, FrameNet provides sense definitions, either taken from the Concise Oxford Dictionary or created by lexicographers. Furthermore, there is a sense definition for each frame, which is given by a textual description and shared by all senses in a frame.

• Sense examples—FrameNet is particularly rich in sense examples which are selected based on lexicographic criteria.

• Sense relations—FrameNet specifies sense relations on the frame level, i.e., all senses in a frame participate in the relation.

• Predicate argument structure information—Semantic roles often have frame-specific names and are specified via a textual description. Some frame elements are further characterized via their semantic type, thus selectional preference information is provided as well.
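NLTK also bundles a reader for the Berkeley FrameNet data, which makes the structures described above easy to explore. A minimal sketch using the “Killing” frame from the running example (frame, frame element, and lexical unit names follow the FrameNet 1.7 release as packaged for NLTK):

```python
import nltk
nltk.download("framenet_v17", quiet=True)  # fetch the FrameNet data once

from nltk.corpus import framenet as fn

killing = fn.frame("Killing")  # look up a frame by name
print(killing.definition)      # textual definition of the frame

# Frame elements (semantic roles), e.g., Killer, Victim, Instrument, Place
print(sorted(killing.FE.keys()))

# Lexical units (senses) evoking the frame, e.g., 'assassinate.v', 'massacre.n'
print(sorted(killing.lexUnit.keys()))

# Frame-to-frame relations, e.g., "Killing" is causative of "Death"
for relation in killing.frameRelations:
    print(relation)
```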

1.1.3 VALENCY LEXICONS

Most of the early work on LKBs for NLP considered valency as a central information type, because it was essential for deep syntactic and semantic parsing with broad-coverage hand-written grammars (e.g., Head-Driven Phrase Structure Grammar [Copestake and Flickinger], or Lexical Functional Grammar as in the ParGram project [Sulger et al., 2013]). Valency is a lexical property of a word to require certain syntactic arguments in order to be used in well-formed phrases or clauses. For example, the verb assassinate requires not only a subject, but also an object: *He assassinated. vs. He assassinated his colleague. Valency information is also included in MRDs, but often represented ambiguously and thus is hard to process automatically. Therefore, a number of valency LKBs have been built specifically for NLP applications. These LKBs use subcat frames to represent valency information.

It is important to note that subcat frames are a lexical property of senses, rather than words. Consider the following example of the two senses of see and their sense-specific subcat frames (1)

and (2): subcat frame (1) is only valid for the see—“interpret in a particular way” sense, but not for the see—“perceive with the eyes” sense:

see—“interpret in a particular way:”
subcat frame (1): (arg1:subject(nounPhrase), arg2:prepositionalObject(asPhrase))
sense example: Some historians see his usurpation as a panic response to growing insecurity.

see—“perceive with the eyes:”
subcat frame (2): (arg1:subject(nounPhrase), arg2:object(nounPhrase))
sense example: Can you see the bird in that tree?

Subcat frames contain language-specific elements, even though some of their elements may be valid cross-lingually. For example, there are certain properties of syntactic arguments in English and German that correspond (both English and German are Germanic languages and hence closely related), while other properties, mainly morphosyntactic ones, diverge [Eckle-Kohler and Gurevych, 2012]. Examples of such divergences include the overt case marking in German (e.g., for the dative case) or the fact that the ing-form in English verb phrase complements is sometimes realized as zu-infinitive in German.

According to many researchers in linguistics, different subcat frames of a lexeme are associated with different but related meanings, an analysis which is called the “multiple meaning approach” by Hovav and Levin [2008].⁹ The multiple meaning approach gives rise to different senses, i.e., pairs of lexeme and subcat frame. Hence, valency LKBs provide an implicit characterization of senses via subcat frames, which can be considered as abstractions of sense examples. Sense examples illustrating a lexeme in a particular subcat frame (e.g., extracted from corpora) might be provided in addition. However, valency LKBs do not necessarily assign unique identifiers to senses, or group (nearly) synonymous senses into entries (as MRDs do).

⁹In contrast, the “single meaning approach” assumes that both subcat frames are associated with the same meaning, with this meaning allowing two syntactic realization options [Hovav and Levin, 2008].
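The bracketed frame notation used in the see example can be encoded directly as data. The sketch below is our own illustrative encoding (not a standard exchange format such as LMF); it makes explicit that subcat frames attach to senses, not to lemmas.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Argument:
    function: str   # grammatical function, e.g., "subject"
    category: str   # syntactic category, e.g., "nounPhrase"

@dataclass(frozen=True)
class SubcatFrame:
    arguments: Tuple[Argument, ...]

# subcat frame (1): see -- "interpret in a particular way"
frame1 = SubcatFrame((Argument("subject", "nounPhrase"),
                      Argument("prepositionalObject", "asPhrase")))

# subcat frame (2): see -- "perceive with the eyes"
frame2 = SubcatFrame((Argument("subject", "nounPhrase"),
                      Argument("object", "nounPhrase")))

# A valency LKB keys frames by sense, not by lemma:
valency = {
    ("see", "interpret in a particular way"): {frame1},
    ("see", "perceive with the eyes"): {frame2},
}
```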

Examples of Valency Lexicons
COMLEX Syntax is an English valency LKB providing detailed subcat frames for about 38,000 headwords [Grishman et al., 1994]. Another well-known valency LKB is CELEX, which covers English, as well as Dutch and German. The PAROLE project (Preparatory Action for Linguistic Resources Organization for Language Engineering) initiated the creation of valency LKBs in 12 European languages (Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, and Swedish), which have all been built on the basis of corpora. However, the resulting LKBs are much smaller. For example, the Spanish PAROLE lexicon contains syntactic information for only about 325 verbs [Villegas and Bel, 2015].

There are many valency LKBs in languages other than English. For German, an example of a large-scale valency LKB is IMSLex-Subcat, a broad-coverage subcategorization lexicon for German verbs, nouns, and adjectives, covering about 10,000 verbs, 4,000 nouns, and 200 adjectives [Eckle-Kohler, 1999, Fitschen, 2004]. For verbs, about 350 different subcat frames are distinguished. IMSLex-Subcat was semi-automatically created: the subcat frames were automatically extracted from large newspaper corpora, and manually filtered afterward.

Information Types
In summary, the following lexical information types are salient for valency LKBs.

• Syntactic behavior—Valency LKBs provide lexical-syntactic information on predicate-like words by specifying their syntactic behavior via subcat frames.

• Sense examples—For individual pairs of lexeme and subcat frame, sense examples might be given as well.

1.1.4 VERBNETS

According to Levin [1993], verbs that share common syntactic argument alternation patterns also have particular meaning components in common, thus they can be grouped into semantic verb classes. Consider as an example verbs participating in the dative alternation, e.g., give and sell. These verbs can realize one of their arguments syntactically either as a noun phrase or as a prepositional phrase with to, i.e., they can be used with two different subcat frames:

Martha gives (sells) an apple to Myrna.
Martha gives (sells) Myrna an apple.

Verbs having this alternation behavior in common can be grouped into a semantic class of verbs sharing the particular meaning component “change of possession,” thus this shared meaning component characterizes the semantic class. The most well-known verb classification based on the correspondence between verb syntax and verb meaning is Levin's classification of English verbs [Levin, 1993]. Recent work on verb semantics provides additional evidence for this correspondence of verb syntax and meaning [Hartshorne et al., 2014, Levin, 2015].

The English VerbNet [Kipper et al., 2008] is a broad-coverage verb lexicon based on Levin's classification covering about 3,800 verb lemmas. VerbNet is organized in about 270 verb classes based on syntactic alternations. Verbs with common subcat frames and syntactic alternation behavior that also share common semantic roles are grouped into VerbNet classes, which are hierarchically structured to represent information about related subcat frames. VerbNet not only includes the verbs from the original verb classification by Levin, but also more than 50 additional verb classes [Kipper et al., 2006] automatically acquired from corpora [Korhonen and Briscoe, 2004]. These classes cover many verbs taking non-finite verb phrases and subordinate clauses as complements, which were not included in Levin's original classification. VerbNet (version 3.1) lists 568 subcat frames specifying syntactic types and semantic roles of the arguments, as well as selectional preferences, and syntactic and morpho-syntactic restrictions on the arguments.

Although it might often be hard to pin down what the shared meaning components of VerbNet classes really are, VerbNet has successfully been used in various NLP tasks, many of them including the subtask of mapping syntactic chunks of a sentence to semantic roles [Pradet et al., 2014]; see also Section 6.1 for an example.
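The English VerbNet can likewise be queried through NLTK, which is a convenient way to see classes, thematic roles, and frames side by side. A minimal sketch for the verb give from the dative-alternation example (class identifiers such as 'give-13.1' reflect the VerbNet version bundled with NLTK):

```python
import nltk
nltk.download("verbnet", quiet=True)  # fetch the VerbNet data once

from nltk.corpus import verbnet as vn

# VerbNet classes containing the verb "give", e.g., ['give-13.1']
print(vn.classids("give"))

# Pretty-print one class: its members, thematic roles, and frames
# (syntactic patterns plus semi-formal semantic predicates)
print(vn.pprint(vn.vnclass("give-13.1")))
```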

Verbnets in Other Languages
While the importance of having a VerbNet-like LKB in less-resourced languages has been widely recognized, there have rarely been any efforts to build such high-quality verbnets as the English one. Most previous work explored fully automatic approaches to transfer the English VerbNet to another language, thus introducing noise. Semi-automatic approaches are also often based on translating the English VerbNet into another language. Most importantly, many of the detailed subcat frames available for English, as well as the syntactic alternations, cannot be carried over to other languages, since valency is largely language-specific (e.g., [Scarton and Aluísio, 2012]). Therefore, the development of high-quality verbnets in languages other than English requires the existence of a broad-coverage valency lexicon as a prerequisite. For this reason, valency lexicons, especially tools for their (semi-)automatic construction, are still receiving considerable attention.

A recent example of a high-quality verbnet in another language is the French verbnet (covering about 2,000 verb lemmas) [Pradet et al., 2014] which has been built semi-automatically from existing French resources (thus also including subcat frames) combined with a translation of the English VerbNet verbs.

Information Types We summarize the main lexical information types for senses present in the English VerbNet.

• Sense definition—Verbnets do not provide textual sense definitions. A verb sense is defined extensionally by the set of verbs forming a VerbNet class; the verbs share common subcat frames, as well as semantic roles and selectional preferences of their arguments.

• Sense relations—e verb classes in verbnets are organized hierarchically and the subclass relation is therefore defined on the verb class level.

• Syntactic behavior—VerbNet lists detailed subcat frames for verb senses.

• Predicate argument structure information—In the English VerbNet, each individual verb sense is characterized by a semi-formal semantic predicate based on the event decomposition of Moens and Steedman [1988]. Furthermore, the semantic arguments of a verb are characterized by their semantic role and linked to their syntactic counterparts in the subcat frame. Most semantic arguments are additionally characterized by their semantic type (i.e., selectional preference information).

1.2 COLLABORATIVELY CONSTRUCTED KNOWLEDGE BASES

More recently, the rapid development of Web technologies and especially collaborative participation channels (often labeled "Web 2.0") has offered new possibilities for the construction of language resources. The basic idea is that, instead of a small group of experts, a community of users ("crowd") collaboratively gathers and edits the lexical information in an open and equitable process. The resulting knowledge is in turn also free for everyone to use, adapt, and extend. This open approach has turned out to be very promising for handling the enormous effort of building language resources, as a large community can quickly adapt to new language phenomena like neologisms while at the same time maintaining a high quality by continuous revision—a phenomenon which has become known as the "wisdom of crowds" [Surowiecki, 2005]. The approach also seems to be suitable for multilingual resources, as users speaking any language and from any culture can easily contribute. This is very helpful for minor, usually resource-poor languages, for which expert-built resources are small or not available at all.

1.2.1 WIKIPEDIA

Wikipedia¹⁰ is a collaboratively constructed online encyclopedia and one of the largest freely available knowledge sources. It has long surpassed traditional printed encyclopedias in size, while maintaining comparable quality [Giles, 2005]. The current English version contains around 4,700,000 articles and is by far the largest one, but many other language editions are of significant size: some, like the German or French editions, also contain more than 1,000,000 articles, each of which usually describes a particular concept.

Although Wikipedia has not been designed as a sense inventory, we can interpret the pairing of an article title and the concept described in the article text as a sense. This interpretation is in accordance with the disambiguation provided in Wikipedia, either as part of the title or on separate disambiguation pages. An example of the former are some articles for Java, where its different meanings are marked by "bracketed disambiguations" in the article title, such as Java (programming language) and Java (town). An example of the latter is the dedicated disambiguation page for Java, which explicitly lists all Java senses contained in Wikipedia. Due to its focus on encyclopedic knowledge, Wikipedia almost exclusively contains nouns.

¹⁰http://www.wikipedia.org
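A minimal sketch of the sense interpretation just described: an article title with a bracketed disambiguation is treated as a (lemma, disambiguation) pairing. The function and regular expression are our own illustration, not part of any Wikipedia API.

```python
import re

# Matches titles such as "Java (programming language)"; the bracketed part
# serves as a human-readable sense discriminator.
TITLE = re.compile(r"^(?P<lemma>.+?) \((?P<gloss>[^)]+)\)$")

def split_title(title: str):
    """Split a bracketed disambiguation off a Wikipedia article title."""
    m = TITLE.match(title)
    return (m.group("lemma"), m.group("gloss")) if m else (title, None)

print(split_title("Java (programming language)"))  # ('Java', 'programming language')
print(split_title("Java"))                          # ('Java', None)
```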

As with word senses, the interpretation of Wikipedia as a LKB gives rise to the induction of further lexical information types, such as sense relations or translations. Since the original purpose of Wikipedia is not to serve as a LKB, this induction process might also lead to inaccurate lexical information. For instance, the links to corresponding articles in other languages provided for Wikipedia articles can be used to derive translations (i.e., equivalents) of an article "sense" into other languages. An example where this leads to an inaccurate translation is the English article Vanilla extract, which links to a subsection titled Vanilleextrakt within the German article Vanille (Gewürz); according to our lexical interpretation of Wikipedia, this leads to the inaccurate German equivalent Vanille (Gewürz) for Vanilla extract.

Nevertheless, Wikipedia is commonly used as a lexical resource in computational linguistics, where it was introduced as such by Zesch et al. [2007], and it has subsequently been used for knowledge mining [Erdmann et al., 2009, Medelyan et al., 2009] and various other tasks [Gurevych and Kim, 2012].

Information Types We can derive the following lexical information types from Wikipedia.

• Sense definition—While by design one article describes one particular concept, the first paragraph of an article usually gives a concise summary of the concept, which can therefore fulfill the role of a sense definition for NLP purposes.

• Sense examples—While usage examples are not explicitly encoded in Wikipedia, they can be inferred from the Wikipedia link structure: if a term is linked within an article, the surrounding sentence can be considered a usage example for the target concept of the link.

• Sense relations—Related articles, i.e., senses, are connected via hyperlinks within the article text. However, since the type of the relation is usually missing, these hyperlinks cannot be considered full-fledged sense relations. Nevertheless, they express a certain degree of semantic relatedness. The same observation holds for the Wikipedia category structure, which links articles belonging to particular domains.

• Equivalents—The different language editions of Wikipedia are interlinked at the article level—the article titles in other languages can thus be used as translation equivalents.

Related Projects As Wikipedia has nowadays become one of the largest and most widely used knowledge sources, there have been numerous efforts to make it better accessible for automatic processing. These include projects such as YAGO [Suchanek et al., 2007], DBPedia [Bizer et al., 2009], WikiNet [Nastase et al., 2010], and MENTA [de Melo and Weikum, 2010]. Most of them aim at deriving a concept network from Wikipedia ("ontologizing" it) and making it available for Semantic Web applications. WikiData¹¹—a project directly rooted in Wikimedia—has similar goals, but within the framework given by Wikipedia. The goal here is to provide a language-independent repository of structured world knowledge which all language editions can easily integrate.

These related projects basically contain the same knowledge as Wikipedia, only in a different representation format (e.g., one suitable for Semantic Web applications); hence we will not discuss them further in this chapter. However, some of the Wikipedia derivatives have reached a wide audience in different communities, including NLP (e.g., DBPedia), and have also been used in different linking efforts, especially in the domain of ontology construction. We will describe corresponding efforts in Chapter 2.

¹¹http://www.wikidata.org

1.2.2 WIKTIONARY

Wiktionary¹² is a dictionary "side project" of Wikipedia that was created to better cater for the need to represent specific lexicographic knowledge which is not well suited for an encyclopedia, e.g., lexical knowledge about verbs and adjectives.
Wiktionary is available in over 500 languages; currently, the English edition contains almost 4,000,000 lexical entry pages, while many other language editions reach a considerable size of over 100,000 entries. Meyer and Gurevych [2012b] found that the collaborative construction approach of Wiktionary yields language versions covering the majority of language families and regions of the world, and that it especially covers a vast amount of domain-specific descriptions not found in wordnets for these languages.

For each lexeme, multiple senses can be encoded, and these are usually described by glosses. Wiktionary contains hyperlinks which lead to semantically related lexemes, such as synonyms, hypernyms, or meronyms, and provides a variety of other information types such as etymology or translations into other languages. However, the link targets are not disambiguated in all language editions; in the English edition, for example, the links merely lead to pages for the lexical entries, which is problematic for NLP applications, as we will see later on.

The ambiguity of the links is due to the fact that Wiktionary has been primarily designed to be used by humans rather than machines. The entries are thus formatted for easy perception using appropriate font sizes and bold, italic, or colored text styles. In contrast, for machines, data needs to be available in a structured and unambiguous manner in order to become directly accessible. For instance, an easily accessible data structure for machines would be a list of all translations of a given sense, and encoding the translations by their corresponding sense identifiers in the target-language LKBs would make the representation unambiguous. This kind of explicit and unambiguous structure does not exist in Wiktionary, but needs to be inferred from the wiki markup.¹³

Although there are guidelines on how to properly structure a Wiktionary entry, Wiktionary editors are permitted to choose from multiple variants or to deviate from the standards if this can enhance the entry. This presents a major challenge for the automatic processing of Wiktionary data. Another hurdle is the openness of Wiktionary—that is, the possibility to perform structural changes at any time—which raises the need for constant revision of the extraction software.

Wiktionary as a resource for NLP was introduced by Zesch et al. [2008b], and has been considered in many different contexts in subsequent work [Gurevych and Wolf, 2010, Krizhanovsky, 2012, Meyer, 2013, Meyer and Gurevych, 2010, 2012b]. While much work on Wiktionary focuses on a few selected language editions, the multilingual LKB DBnary by Sérasset and Tchechmedjiev [2014] has taken a much broader approach and derived a LKB from Wiktionary editions in 12 languages. A major goal of DBnary is to make Wiktionary easily accessible for automatic processing, especially in Semantic Web applications [Sérasset, 2015].

Particularly interesting for this book are the recent efforts to ontologize Wiktionary and transform it into a standard-compliant, machine-readable format [Meyer and Gurevych, 2012a]. These efforts address issues which are also relevant for the construction of Linked Lexical Knowledge Bases (LLKBs), which we will discuss later on. We refer the interested reader to Meyer [2013] for an in-depth survey of Wiktionary from a lexicographic perspective and as a resource for NLP.

¹²http://www.wiktionary.org
¹³Wiki markup is an annotation language consisting of a set of special characters and keywords that can be used to mark headlines, bold and italic text styles, tables, hyperlinks, etc. within the article. The four equality signs in "====Translations====" denote, for example, a small headline that usually precedes the list of a word's translations. This markup can be used by a software tool to identify the beginning of the translation section, which supposedly looks similar on each article page.
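As a rough illustration of the extraction problem described in footnote 13, the following sketch locates a translations section in raw wiki markup. It assumes the conventional "====Translations====" headline; real entries deviate from this convention, which is exactly why extraction software needs constant revision.

```python
import re

def extract_section(wikitext: str, heading: str = "Translations"):
    """Return the text between a '====Heading====' line and the next headline."""
    pattern = re.compile(
        r"====\s*" + re.escape(heading) + r"\s*====\n(.*?)(?=\n==|\Z)",
        re.DOTALL,
    )
    match = pattern.search(wikitext)
    return match.group(1).strip() if match else None

sample = "====Translations====\n* German: {{t|de|Brief}}\n\n====Synonyms====\n..."
print(extract_section(sample))  # * German: {{t|de|Brief}}
```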

Information Types In summary, the main information types contained in Wiktionary are as follows.

• Sense definition—Glosses are given for the majority of senses, but due to the open editing approach, gaps or "stub" definitions are explicitly allowed. This is especially the case for smaller language editions.

• Sense examples—Example sentences which illustrate the usage of a sense are given for a subset of senses.

• Sense relations—As mentioned above, semantic relations are generally available, but depending on the language edition, these might be ambiguously encoded. Moreover, language editions vary greatly in the number of relations relative to the number of senses: for instance, the German edition is six times more densely linked than the English one.

• Syntactic behavior—Lexical-syntactic properties are given for a small set of senses. These include subcat frame labels, such as "transitive" or "intransitive."

• Related forms—Related forms are available via links.

• Equivalents—As for Wikipedia, translations of senses into other languages are available via links to other language editions. An interesting peculiarity of Wiktionary is that distinct language editions may also contain entries for foreign-language words; for instance, the English edition also contains German lexemes, complete with definitions etc. in English. This is meant as an aid for language learners and is frequently used.

• Sense links—Many Wiktionary entries contain links to the corresponding Wikipedia page, thus providing an easy means to supply additional knowledge about a particular concept without overburdening Wiktionary with non-essential (i.e., encyclopedic) information.

In general, it has to be noted that the flexibility of Wiktionary enables the encoding of all kinds of linguistic knowledge, at least in theory. In practice, the information types listed here are those which are commonly used, and thus interesting for our subsequent considerations.

1.2.3 OMEGAWIKI

OmegaWiki,¹⁴ like Wiktionary, is freely editable via its web frontend. The current version of OmegaWiki contains over 46,000 concepts and lexicalizations in almost 500 languages. One of OmegaWiki's discriminating features, in comparison to other collaboratively constructed resources, is that it is based on a fixed database structure with which users have to comply [Matuschek and Gurevych, 2011]. It was initiated in 2006 and explicitly designed with the goal of offering structured and consistent access to lexical information, i.e., avoiding the shortcomings of Wiktionary described above.

To this end, the creators of OmegaWiki decided to limit the degrees of freedom for contributors by providing a "scaffold" of elements which interact in well-defined ways. The central elements of OmegaWiki's organizational structure are language-independent concepts (so-called defined meanings) to which lexicalizations of the concepts are attached. Defined meanings can thus be considered as multilingual synsets, comparable to resources such as WordNet (cf. Section 1.1.1). Consequently, no specific language editions exist for OmegaWiki as they do for Wiktionary. Rather, all multilingual information is encoded in a single resource. As an example, defined meaning no. 5616 (representing the concept hand) carries the lexicalizations hand, main, mano, etc., and also definitions in different languages which describe this concept, for example, "That part of the fore limb below the forearm or wrist." The multilingual synsets directly yield correct translations, as these are merely different lexicalizations of the same concept. It is also possible to have multiple lexicalizations in the same language, i.e., synonyms.

An interesting consequence of this design, especially for multilingual applications, is that semantic relations are defined between concepts regardless of existing lexicalizations. Consider, for example, the Spanish noun dedo: it is marked as a hypernym of finger and toe, although there exists no corresponding English lexicalization for the defined meaning it expresses. This is, for instance, immediately helpful in translation tasks, since concepts for which no lexicalization in the target language exists can be described or replaced by closely related concepts. Using this kind of information is not as straightforward in other multilingual resources like Wiktionary, because there the links are not necessarily unambiguous.

The fixed structure of OmegaWiki ensures easy extraction of the information due to the consistency enforced by the definition of database tables and the relations between them. However, it has the drawback of limited expressiveness; for instance, the coding of grammatical properties is only possible to a small extent. In OmegaWiki, users are not allowed to extend this structure and are thus tied to what has already been defined. Consequently, OmegaWiki's lack of flexibility and extensibility, in combination with the fact that Wiktionary was already quite popular at its creation time, has caused the OmegaWiki community to remain rather small.
While OmegaWiki had 6,746 users at the time of writing, only 19 of them had been actively editing in the past month, i.e., the community is considerably smaller than that of Wikipedia or Wiktionary [Meyer, 2013]. Despite the above-mentioned issues, we still believe that OmegaWiki is not only interesting for

usage in NLP applications (and thereby for integration into LLKBs), but also as a case study, since it exemplifies how the process of collaboratively creating a large-scale lexical-semantic resource can be guided by means of a structural "skeleton."

¹⁴http://www.omegawiki.org
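The following sketch mirrors this organizational structure in a few lines of Python. It is a schematic model for illustration (all identifiers other than defined meaning no. 5616 are hypothetical), not OmegaWiki's actual database schema.

```python
from dataclasses import dataclass, field

@dataclass
class DefinedMeaning:
    """A language-independent concept; translations are just its lexicalizations."""
    dm_id: int
    definitions: dict = field(default_factory=dict)      # language -> gloss
    lexicalizations: dict = field(default_factory=dict)  # language -> lemmas
    hypernym_of: list = field(default_factory=list)      # relations hold between concepts

hand = DefinedMeaning(
    dm_id=5616,
    definitions={"en": "That part of the fore limb below the forearm or wrist."},
    lexicalizations={"en": ["hand"], "fr": ["main"], "es": ["mano"]},
)

# The 'dedo' case: a concept may lack an English lexicalization entirely,
# yet still stand in hypernymy relations to lexicalized concepts.
dedo_concept = DefinedMeaning(
    dm_id=99999,  # hypothetical id
    lexicalizations={"es": ["dedo"]},
    hypernym_of=[11111, 22222],  # hypothetical ids of the 'finger' and 'toe' concepts
)
```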

Information Types e most salient information types in OmegaWiki, i.e., those encoded in a relevant portion of entries are as follows.

• Sense definitions—Glosses are provided on the concept level, usually in multiple languages.

• Sense examples—Examples are given for individual lexicalizations, but only for a few of them.

• Sense relations—Semantic as well as ontological relations (e.g., “Germany” borders on “France”) are given, and these are entirely disambiguated.

• Equivalents—Translations are encoded by the multilingual synsets which group lexicaliza- tions of a concept in different languages.

• Sense links—As for Wiktionary, mostly links to related Wikipedia articles are given to provide more background knowledge about particular concepts.

1.3 STANDARDS

Since LKBs play an important role in many NLP tasks and are expensive to build, the capability to exchange, reuse, and also merge them has become a major requirement. Standardization of LKBs plays an important role in this context, because it allows uniform APIs to be built, and thus facilitates the exchange and reuse, as well as the integration and merging, of LKBs. Moreover, applications can easily switch between different standardized LKBs.

1.3.1 ISO LEXICAL MARKUP FRAMEWORK

The ISO standard Lexical Markup Framework (LMF) [Calzolari et al., 2013, Francopoulo and George, 2013, ISO24613, 2008] was developed to address these issues. LMF is an abstract standard: it defines a meta-model of lexical resources, covering both NLP lexicons and machine-readable dictionaries. The standard specifies this meta-model in the Unified Modeling Language (UML) by providing a set of UML diagrams. UML packages are used to organize the meta-model, and each diagram given in the standard corresponds to a UML package. LMF defines a mandatory core package and a number of extension packages for different types of resources, e.g., morphological resources or wordnets. The core package models a lexicon in the traditional headword-based fashion, i.e., organized by lexical entries. Each lexical entry is defined as the pairing of one to many forms and zero to many senses.

The abstract meta-model given by the LMF standard is not immediately usable as a format for encoding (i.e., converting) an existing LKB [Tokunaga et al., 2009]. It has to be instantiated first, i.e., a full-fledged lexicon model has to be developed by choosing LMF classes and by specifying suitable attributes for these LMF classes. According to the standard, developing a lexicon model involves:

1. selecting LMF extension packages (the usage of the core package is mandatory),

2. defining attributes for the classes in the core package and in the extension packages (as they are not prescribed by the standard), and

3. explicating the linguistic terminology, i.e., linking the attributes and other linguistic terms introduced (e.g., attribute values) to standardized descriptions of their meaning (see the sketch after this list).
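A minimal sketch of step 3, assuming plain Python dictionaries as the carrier: each model-specific attribute is linked to a persistent identifier (PID) in a Data Category Registry. The PID below is the ISOcat part-of-speech entry cited later in this section; the model and attribute names are illustrative.

```python
# Hypothetical data category selection: (lexicon model, attribute) -> PID.
POS_PID = "http://www.isocat.org/datcat/DC-396"

selection = {
    ("ModelA", "partOfSpeech"): POS_PID,
    ("ModelB", "pos"): POS_PID,
}

def semantically_interoperable(attr_a, attr_b, selection):
    """Two attributes are interoperable if they resolve to the same PID."""
    pid_a, pid_b = selection.get(attr_a), selection.get(attr_b)
    return pid_a is not None and pid_a == pid_b

print(semantically_interoperable(("ModelA", "partOfSpeech"),
                                 ("ModelB", "pos"), selection))  # True
```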

Selecting a combination of LMF classes and their relationships from the LMF core package and from the extension packages establishes the structure of a lexicon model. While the LMF core package models a lexicon in terms of lexical entries, the LMF extensions provide classes for different types of lexicon organization, e.g., covering the synset-based organization of wordnets or the semantic frame-based organization of FrameNet.

Fixing the structure of a lexicon model by choosing a set of classes contributes to the interoperability of LKBs, as it determines the high-level organization of lexical knowledge in a resource, e.g., whether synonymy is encoded by grouping senses into synsets (using the Synset class) or by specifying sense relations (using the SenseRelation class) which connect synonymous senses (i.e., synonyms). Defining attributes for the LMF classes and specifying the attribute values is far more challenging than choosing from a given set of classes, because the standard gives only a few examples of attributes and leaves the specification of attributes to the user in order to allow maximum flexibility.

Finally, the attributes and values have to be linked to a description of their meaning in an ISO-compliant Data Category Registry [ISO12620, 2009, Windhouwer and Wright, 2013]. For example, ISOcat¹⁵ was the first implementation of the ISO Data Category Registry standard [ISO12620, 2009].¹⁶ The data model defined by the Data Category Registry specifies some mandatory information types for its entries, including a unique administrative identifier (e.g., partOfSpeech) and a unique and persistent identifier (PID, e.g., http://www.isocat.org/datcat/DC-396) which can be used in automatic processing and annotation in order to link to the entries. From a practical point of view, a Data Category Registry can be considered a repository of mostly linguistic terminology which provides human-readable descriptions of the meaning of terms used in language resources. For instance, the meaning of many terms used for linguistic annotation, such as grammaticalNumber, gender, and case, is given in ISOcat. Accordingly, a Data Category Registry can be used as a glossary: users can look up the meaning of a term occurring in a language resource by consulting its entry in the Data Category Registry.

¹⁵www.isocat.org
¹⁶ISOcat has been shut down and currently only a static dump of ISOcat is accessible at www.isocat.org. A successor of ISOcat is the OpenSKOS-based CLARIN Concept Registry (https://openskos.meertens.knaw.nl/ccr/browser). In addition, a relaunch of ISOcat is planned by the ISO TC37 community.

Data Category Registries, such as ISOcat, play an important role in making language resources semantically interoperable [Ide and Pustejovsky, 2010]. Semantically interoperable language resources share a common definition of their linguistic vocabulary, for instance, the linguistic terms used in a LKB. LKBs can be made semantically interoperable by connecting these terms with their meaning defined externally in a Data Category Registry. Consider as an example the LexicalEntry class of two different lexicon models A and B. Lexicon model A may have an attribute partOfSpeech (POS), while lexicon model B may have an attribute pos. Linking both attributes to the ISOcat entry with the meaning "A category assigned to a word based on its grammatical and semantic properties." (see http://www.isocat.org/datcat/DC-396) makes the two lexicon models semantically interoperable with respect to the POS attribute. Thus, a human can look up the meaning of a term occurring in a lexicon model by following the link to the ISOcat entry and consulting its description. Linking the attributes and their values in an LMF lexicon model to ISOcat entries results in a so-called Data Category Selection.

It is important to stress that the notion of "semantic interoperability" in the context of LMF has a limited scope: it only refers to the meaning of the linguistic vocabulary used in an LMF lexicon model—not to the meaning of the lexemes listed in a LKB.

Instantiations of LMF Various LMF lexicon models have been developed and populated with data from LKBs, mostly for a single type of LKB, such as wordnets [Henrich and Hinrichs, 2010, Lee et al., Soria et al., 2009, Toral et al., 2010a, Vossen et al., 2013] or machine-readable dictionaries [Attia et al., 2010, Khemakhem et al., 2013].

Considering the fact that only a fleshed-out LMF lexicon model, i.e., an instantiation of the LMF standard, can be used for actually standardizing LKBs, it is obvious that independently created LMF-compliant LKBs are not necessarily interoperable. This issue is addressed by UBY-LMF [Eckle-Kohler et al., 2012, 2013], a large-scale instantiation of ISO LMF which can be applied to the whole range of LKB types introduced in the previous sections. UBY-LMF has been designed as a uniform format for standardizing both expert-constructed resources—wordnets, FrameNet, VerbNet—and collaboratively constructed resources—Wikipedia, Wiktionary, OmegaWiki. The full UBY-LMF model consists of 39 classes and 129 attributes. UBY-LMF provides a fine-grained instantiation of the LMF Syntax extension classes in order to cover detailed verb subcategorization frames present, e.g., in VerbNet. It also provides a harmonized subcategorization frame format across two languages, English and German. This format enables a modular specification of subcategorization frames by a number of attributes that are uniform across English and German. All syntactic arguments are specified by the attributes grammaticalFunction and syntacticCategory. A number of morphosyntactic attributes allow a fine-grained specification of different phrase types.
While most of the attribute values are uniform across English and German, there are four morphosyntactic attributes that can take language-specific values. Details on this uniform format for subcategorization frames in English and German can be found in Eckle-Kohler and Gurevych [2012].

1.3.2 SEMANTIC WEB STANDARDS

The Semantic Web [Berners-Lee et al., 2001] can be considered a huge data integration platform, since the use of the Resource Description Framework (RDF) supports data integration and offers a large body of tools for accessing this data. There has been significant work toward integrating LKBs using RDF and linked data principles [Chiarcos et al., 2013]. Most notably, the use of publicly accessible unique identifiers (URIs) for information types represented in RDF allows different and distributed LKBs to link to each other.

Many LKBs have been made available in this way (e.g., WordNet, Wikipedia [Bizer et al., 2009], and Wiktionary). While representing LKBs in RDF makes them syntactically interoperable, due to the data structures given by RDF, it does not per se make them semantically interoperable. Consider, for instance, existing conversions of WordNet and FrameNet [Narayanan et al., 2003, Van Assem et al., 2006], where a simple mapping to RDF is augmented with OWL semantics. The formats chosen for these RDF versions of WordNet and FrameNet are different: they are specific to the underlying data models of WordNet and FrameNet—two LKBs which have been characterized as complementary regarding their structure and lexical information types [Baker and Fellbaum, 2009]. Therefore, it is difficult to use the RDF versions of WordNet and FrameNet as interchangeable modules in NLP applications.

In order to overcome this difficulty, the lemon lexicon model [McCrae et al., 2011, 2012a] was proposed as a common interchange format for lexical resources on the Semantic Web. lemon realizes a separation of lexicon and ontology layers, so that lemon lexica can be linked to existing ontologies in the linked data cloud.¹⁷ lemon has its historical roots in LMF and thus allows easy conversion from LKBs standardized according to LMF. Like LMF lexicon models, lemon refers to data categories in linguistic terminology repositories (such as the ISO Data Category Registry). lemon has been used to represent various LKBs, e.g., Wiktionary [Sérasset, 2015] and several LKBs rich both in subcategorization frames and semantic information types [Del Gratta et al., 2015, Villegas and Bel, 2015]. It has also been used as a basis for integrating the data of the English Wiktionary with the RDF version of WordNet [McCrae et al., 2012b].

¹⁷More detail on the model can be found at http://lemon-model.net
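To illustrate the flavor of such RDF representations, here is a minimal lemon-style lexical entry built with the Python rdflib library. The lemon namespace and property names follow the published model, but the entry itself (and the example.org namespace) is our own illustration, not an excerpt from an actual lemon lexicon.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

LEMON = Namespace("http://lemon-model.net/lemon#")
EX = Namespace("http://example.org/lexicon/")  # hypothetical lexicon namespace

g = Graph()
entry = EX["letter"]
g.add((entry, RDF.type, LEMON.LexicalEntry))
g.add((entry, LEMON.canonicalForm, EX["letter_form"]))
g.add((EX["letter_form"], LEMON.writtenRep, Literal("letter", lang="en")))

# The lexicon/ontology separation: the sense points to an ontology concept
# in the linked data cloud rather than embedding the meaning in the lexicon.
g.add((entry, LEMON.sense, EX["letter_sense"]))
g.add((EX["letter_sense"], LEMON.reference,
       URIRef("http://dbpedia.org/resource/Letter_(alphabet)")))

print(g.serialize(format="turtle"))
```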

1.4 CHAPTER CONCLUSION

This chapter set out a definition of LKB which we will use in all subsequent chapters of this book. Building on this definition, we introduced seven major "kinds" of LKBs frequently used in NLP:

• wordnets (i.e., the Princeton WordNet and its spin-offs in other languages);

• framenets (i.e., the Berkeley FrameNet and its spin-offs in other languages);

• valency lexicons;

• verbnets (i.e., the English VerbNet and its spin-offs in other languages);

• Wikipedia and Wiktionary as collaborative LKBs with editions in many languages; and

• the multilingual wordnet OmegaWiki, a collaborative LKB as well.

All these kinds of LKBs are structured and organized differently, and cover different information types. Table 1.1 provides an overview of the major information types¹⁸ covered by the seven LKB types.

Table 1.1: Overview of information types covered by different kinds of LKBs: wordnets (WN), framenets (FN), valency lexicons (VL), verbnets (VN), Wikipedia (WP), Wiktionary (WKT), and OmegaWiki (OW)

Information Type                           WN   FN   VL   VN   WP   WKT   OW
Sense Definition                            x    x    -    -    x    x     x
Sense Examples                              x    x    -    -    -    x     -
Sense Relations                             x    x    -    -    x    x     x
Syntactic Behavior                          x    -    x    x    -    x     -
Predicate Argument Structure Information    -    x    -    x    -    -     -
Related Forms                               x    -    -    -    -    x     -
Equivalents                                 -    -    -    -    x    x     x
Sense Links                                 -    -    -    -    -    x     x

This overview might convey a first impression of the complexity of linking different LKBs at the sense level.

Our concluding summary of major standards for LKBs plays a subordinate role in the context of this book. We included it because many of the above-listed LKBs have been standardized, some of them especially in the context of linking.

¹⁸By "major" we mean those information types which are attached to a substantial number of senses in a LKB.

CHAPTER 2

Linked Lexical Knowledge Bases

In this chapter, we move closer to the core of this book: the linking of LKBs. To this end, we first have to formally define what this linking means, and especially at what level it takes place.

Definition Word Sense Linking: We define one instance of a Word Sense Linking (WSL), also called a Word Sense Alignment (WSA),¹ as a list of pairs of senses (or, more generally, concepts) from two LKBs, where the members of each pair represent an equivalent meaning.

As an example for this definition, the two senses of the noun letter, "The conventional characters of the alphabet used to represent speech" and "A symbol in an alphabet, bookstave" (taken from WordNet and Wiktionary, respectively), are clearly equivalent and should thus be aligned. Note that our definition is not restricted to 1:1 alignments, i.e., a sense may participate in more than one pair, so it is possible that one sense s is assigned to several other senses t₁, …, tₙ in case the sense distinctions have different granularities in different LKBs.
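In code, one instance of a WSL is simply a list of sense-ID pairs. The identifiers below are illustrative, not real WordNet or Wiktionary sense keys; note how one sense may appear in several pairs when sense granularities differ.

```python
# One WSL instance: pairs of equivalent senses from two LKBs.
wsl = [
    ("wordnet:letter.n.02", "wiktionary:letter.n.1"),
    # 1:n alignment: the same WordNet sense aligned to a second,
    # finer-grained Wiktionary sense.
    ("wordnet:letter.n.02", "wiktionary:letter.n.3"),
]

def aligned_to(sense, linking):
    """All senses from the other LKB aligned to the given sense."""
    return [t for s, t in linking if s == sense]

print(aligned_to("wordnet:letter.n.02", wsl))
```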

Building on this sense-level definition, we can move on to the second definition.

Definition Linked Lexical Knowledge Base: A linked lexical knowledge base (LLKB) is a set consisting of at least two LKBs, where for each LKB there exists a non-empty subset of its senses participating in a word sense linking with another LKB.

Less formally, two LKBs form a LLKB if there exists at least one sense linking between them. If more than two LKBs participate, it is not strictly necessary to have a pairwise linking between each of them—though if this is the case, we can speak of a fully linked LLKB. It is important to note at this point that this "full linking" only makes a statement at the resource level, not at the sense level. Consider again the example of two LKBs which share at least one sense linking: these can be considered a fully linked LLKB by definition, but this by no means implies that all of their senses participate in the linking. As a matter of fact, due to the different coverage of concepts across resources, it is most unlikely for a resource pair that a (correct) sense linking can be found which encompasses all senses.
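The distinction between a LLKB and a fully linked LLKB can be stated as a short check over resource pairs. The helper below is a sketch under the definitions above, with invented resource names.

```python
from itertools import combinations

def fully_linked(lkbs, linkings):
    """True if every pair of LKBs shares at least one sense linking."""
    linked_pairs = {frozenset(pair) for pair in linkings}
    return all(frozenset(pair) in linked_pairs
               for pair in combinations(lkbs, 2))

lkbs = ["WordNet", "Wiktionary", "OmegaWiki"]
linkings = [("WordNet", "Wiktionary"), ("WordNet", "OmegaWiki")]
print(fully_linked(lkbs, linkings))  # False: Wiktionary-OmegaWiki is missing
```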

¹Note that in related work the terms sense mapping and sense matching are also used. Sense alignment should, however, not be confused with word alignment, which takes place at the lexical level and is a preprocessing step in machine translation.