
NAACL 2018 Tutorial – The Interplay between Lexical Resources and Natural Language Processing

Jose Camacho-Collados, Luis Espinosa-Anke
School of Computer Science and Informatics, Cardiff University
[email protected], [email protected]

Mohammad Taher Pilehvar
Language Technology Lab, University of Cambridge
[email protected]

Abstract

Incorporating linguistic, world and common sense knowledge into AI/NLP systems is currently an important research area, with several open problems and challenges. At the same time, processing and storing this knowledge in lexical resources is not a straightforward task. This tutorial proposes to address these complementary goals from two methodological perspectives: the use of NLP methods to help the process of constructing and enriching lexical resources, and the use of lexical resources for improving NLP applications. Two main types of audience can benefit from this tutorial: those working on language resources who are interested in becoming acquainted with automatic NLP techniques, with the end goal of speeding up and/or easing the process of resource curation; and, on the other hand, researchers in NLP who would like to benefit from the knowledge of lexical resources to improve their systems and models. The slides of the tutorial are available at https://bitbucket.org/luisespinosa/lr-nlp/.

1 Description

The manual construction of lexical resources is a prohibitively time-consuming process, and even in the most restricted knowledge domains and less-resourced languages, the use of language technologies to ease this process is becoming standard practice. NLP techniques can be effectively leveraged to reduce creation and maintenance efforts. In this tutorial we will present open problems and research challenges concerning the interplay between lexical resources and NLP. Additionally, we will summarize existing attempts in this direction, such as modeling linguistic phenomena like terminology, definitions and glosses, examples and relations, phraseological units, or clustering techniques for senses and topics, as well as the integration of resources of different nature.

As far as the integration of lexical resources in NLP applications is concerned, we will explain some of the current challenges in Word Sense Disambiguation and Entity Linking, as key tasks in natural language understanding which also enable a direct integration of knowledge from lexical resources. We will describe knowledge-based and supervised methods for these tasks, which play a decisive role in connecting lexical resources and text data. Moreover, we will present the field of knowledge-based representations, in particular word sense embeddings, as flexible techniques which act as a bridge between lexical resources and applications. Finally, we will briefly present some recent work on the integration of this encoded knowledge from lexical resources into neural architectures for improving downstream NLP applications.

2 Outline

2.1 Introduction and Motivation

Adding explicit knowledge into AI/NLP systems is currently an important challenge due to the gains that can be obtained in many downstream applications. At the same time, these resources can be further enriched and better exploited by making use of NLP techniques. In this context, the main motivation of this tutorial is to show how Natural Language Processing and Lexical Resources have interacted so far, and to offer a view of potential scenarios in the near future.

As an introduction we first present an overview of current lexical resources, starting from the de facto standard lexical resource for English, i.e., WordNet (Fellbaum, 1998). We provide a concise overview of WordNet, showing what synsets are and how the resource can be viewed as a semantic network. We then briefly discuss some of the limitations of WordNet and how these can be alleviated to some extent with the help of collaboratively-constructed resources, such as Freebase (Bollacker et al., 2008), Wikidata (Vrandečić, 2012) and BabelNet (Navigli and Ponzetto, 2012). As the main building block of these resources, we show how collaboratively-constructed projects such as Wikipedia (https://www.wikipedia.org/) and Wiktionary (https://www.wiktionary.org/) can serve as massive multilingual sources of lexical information. The lexical resources session is concluded by a short introduction to the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015) and to a domain-specific lexical resource, SNOMED (https://www.snomed.org/), one of the major ontologies for the medical domain.
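To make the notions above concrete, the following minimal sketch queries WordNet through NLTK's corpus reader, printing the synsets of a word together with a few of the lexical-semantic relations that give the resource its semantic-network structure. The use of NLTK and the example word "bank" are illustrative assumptions; the tutorial materials themselves are not tied to any particular toolkit.

# Minimal illustration of WordNet synsets and the semantic-network view.
# Assumes NLTK is installed and the WordNet data has been downloaded, e.g.:
#   import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

# A word maps to several synsets (concepts), each carrying a gloss (definition).
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

# Synsets are the nodes of a semantic network whose edges are lexical-semantic
# relations such as hypernymy ("is-a") and hyponymy.
river_bank = wn.synset("bank.n.01")  # the "sloping land" sense in WordNet 3.0
print("hypernyms:", river_bank.hypernyms())
print("hyponyms: ", river_bank.hyponyms()[:3])
print("lemmas:   ", river_bank.lemma_names())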
The tutorial is then divided into two main blocks. First, we delve into NLP for the creation and enrichment of lexical resources, addressing a range of NLP problems aimed specifically at improving repositories of linguistically expressible knowledge. Second, we cover different use cases in which lexical resources have been successfully leveraged for NLP. The last part of the tutorial focuses on lessons learned from work in which we tried to reconcile both worlds, as well as our own view of what the future holds for knowledge-based approaches to NLP.

2.2 NLP for Lexical Resources

The application of language technologies to the automatic construction and extension of lexical resources has proven successful, providing various tools for optimizing this often prohibitively costly process. NLP techniques provide end-to-end technologies that can tackle challenges across the language resource creation and maintenance pipeline. In this tutorial we summarize existing efforts in this direction, including the extraction from text of linguistic phenomena like terminology, definitions and glosses, examples and relations, as well as clustering techniques for senses and topics.

1. Terminology extraction. Measures for terminology extraction, from the simple conventional tf-idf (Sparck Jones, 1972) and lexical specificity (Lafon, 1980) to more recent approaches exploiting linguistic knowledge (Hulth, 2003); a small illustrative sketch follows this list.

2. Definition extraction. Techniques for extracting definitional text snippets from corpora (Navigli and Velardi, 2010; Boella and Di Caro, 2013; Espinosa-Anke et al., 2015; Li et al., 2016; Espinosa-Anke and Schockaert, 2018).

3. Automatic extraction of examples. Description of example extraction techniques and designs in this direction, e.g., the GDEX criteria and their implementation (Kilgarriff et al., 2008).

4. Information extraction. Recent approaches for extracting semantic relations from text: NELL (Carlson et al., 2010), ReVerb (Fader et al., 2011), PATTY (Nakashole et al., 2012) and KB-Unify (Delli Bovi et al., 2015).

5. Hypernym discovery and taxonomy learning. Insights from recent SemEval tasks (Bordea et al., 2015, 2016) and related efforts on the automatic extraction of hypernymy relations from text corpora (Velardi et al., 2013; Alfarone and Davis, 2015; Flati et al., 2016; Shwartz et al., 2016; Espinosa-Anke et al., 2016; Gupta et al., 2016).

6. Topic/domain clustering techniques. Relevant techniques for filtering general-domain resources via topic grouping (Roget, 1911; Navigli and Velardi, 2004; Camacho-Collados and Navigli, 2017).

7. Alignment of lexical resources. Alignment of heterogeneous lexical resources, contributing to the creation of large resources that combine different sources of knowledge. We will present approaches for the construction of such resources, such as Yago (Suchanek et al., 2007), UBY (Gurevych et al., 2012), BabelNet (Navigli and Ponzetto, 2012) or ConceptNet (Speer et al., 2017), as well as other automatic procedures to align lexical resources (Matuschek and Gurevych, 2013; Pilehvar and Navigli, 2014).

8. Ontology enrichment. Enriching lexical ontologies with novel concepts or with additional relations (Jurgens and Pilehvar, 2016).

(Due to time constraints, items 7 and 8 were not presented during the tutorial.)
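As a concrete companion to item 1, the sketch below ranks candidate terms in a toy corpus with a plain tf-idf score, where a word w in document d receives tf(w, d) · log(N / df(w)). The toy sentences, the whitespace tokenizer and the single-word candidates are simplifying assumptions made only for this illustration; the measures discussed in the tutorial (including lexical specificity) are more elaborate.

# Toy tf-idf term ranking: tfidf(w, d) = tf(w, d) * log(N / df(w)),
# with tf the in-document frequency, df the number of documents containing w,
# and N the number of documents (Sparck Jones, 1972).
import math
from collections import Counter

# Hypothetical toy corpus; a realistic setting would contrast a domain corpus
# with a general-domain reference corpus.
corpus = [
    "the patient was diagnosed with chronic obstructive pulmonary disease",
    "pulmonary function tests confirmed the obstructive pattern",
    "the weather today is mild with a light breeze",
]
docs = [doc.split() for doc in corpus]          # naive whitespace tokenization
n_docs = len(docs)
df = Counter(word for doc in docs for word in set(doc))

def tfidf(doc):
    tf = Counter(doc)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

# Rank the candidate terms of the first (clinical) sentence: words occurring in
# every document (e.g. "the") score zero, while words confined to the clinical
# sentences are promoted.
scores = tfidf(docs[0])
for word, score in sorted(scores.items(), key=lambda item: -item[1])[:5]:
    print(f"{word}\t{score:.3f}")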
2.3 Lexical Resources for NLP

In addition to the (semi)automatic efforts for easing the task of constructing and enriching lexical resources presented in the previous section, we present NLP tasks in which lexical resources have shown an important contribution. Effectively leveraging linguistically expressible cues together with their associated knowledge remains a difficult task. Knowledge may be extracted from roughly three types of resource (Hovy et al., 2013): unstructured, e.g. text corpora; semi-structured, such as encyclopedic collaborative repositories like Wikipedia; or structured, which include lexicographic resources like WordNet. In this part we also cover knowledge-based sense representations, in particular word sense embeddings (Camacho-Collados et al., 2016; Mancini et al., 2017).

Finally, we briefly present a few successful approaches integrating knowledge-based representations into downstream tasks such as sentiment analysis (Flekova and Gurevych, 2016), lexical substitution (Cocos et al., 2017) or visual object discovery (Young et al., 2017). As a case study, we present an analysis of the integration of knowledge-based embeddings into neural architectures via WSD for text classification (Pilehvar et al., 2017), discussing its potential and current open challenges.

2.4 Open problems and challenges

In this last section we introduce some of the open problems and challenges in automating the resource creation and enrichment process, as well as in the integration of knowledge from lexical resources into NLP applications.
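As a small closing illustration of the knowledge-based disambiguation methods touched upon in Sections 1 and 2.3, the sketch below runs the simplified Lesk algorithm shipped with NLTK, which picks the WordNet synset whose gloss overlaps most with the surrounding context. The example sentences and the choice of NLTK's implementation are assumptions made for this illustration only; the tutorial covers a considerably broader range of knowledge-based and supervised methods.

# Gloss-overlap (Lesk-style) knowledge-based WSD against WordNet, using the
# simplified Lesk implementation available in NLTK.
# Assumes NLTK with the WordNet data: import nltk; nltk.download("wordnet")
from nltk.wsd import lesk

sentences = [
    "i deposited the cheque at the bank this morning",
    "we had a picnic on the grassy bank of the river",
]
for sentence in sentences:
    tokens = sentence.split()               # naive tokenization, enough for a sketch
    synset = lesk(tokens, "bank", pos="n")  # synset whose gloss overlaps the context most
    print(sentence)
    print("  ->", synset.name(), "-", synset.definition())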