The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Improving Hypernymy Prediction via Taxonomy Enhanced Adversarial Learning

Chengyu Wang,1 Xiaofeng He,1∗ Aoying Zhou2
1School of Computer Science and Software Engineering, East China Normal University
2School of Data Science and Engineering, East China Normal University
[email protected], [email protected], [email protected]
∗Corresponding author.

Abstract

Hypernymy is a basic semantic relation in computational linguistics that expresses the "is-a" relation between a generic concept and its specific instances, serving as the backbone of taxonomies and ontologies. Although several NLP tasks related to hypernymy prediction have been extensively addressed, few methods have fully exploited the large number of hypernymy relations in Web-scale taxonomies.

In this paper, we introduce Taxonomy Enhanced Adversarial Learning (TEAL) for hypernymy prediction. We first propose an unsupervised measure, U-TEAL, to distinguish hypernymy from other semantic relations. It is implemented based on a word embedding projection network distantly trained over a taxonomy. To address supervised hypernymy detection tasks, the supervised model S-TEAL and its improved version, the adversarial supervised model AS-TEAL, are further presented. Specifically, AS-TEAL employs a coupled adversarial training algorithm to transfer hierarchical knowledge in taxonomies to hypernymy prediction models. We conduct extensive experiments to confirm the effectiveness of TEAL over three standard NLP tasks: unsupervised hypernymy classification, supervised hypernymy detection and graded lexical entailment. We also show that TEAL can be applied to non-English languages and can detect missing hypernymy relations in taxonomies.

Introduction

Hypernymy ("is-a") is a basic semantic relation in computational linguistics, expressing the "is-a" relation between a generic concept (hypernym) and its specific instances (hyponyms), such as state-Hawaii and country-United States. The accurate prediction of hypernymy is vital for various downstream NLP applications, such as taxonomy construction (Wu et al. 2012), textual entailment (Vulic et al. 2017) and knowledge base construction (Mahdisoltani, Biega, and Suchanek 2015).

In the literature, pattern based methods and distributional approaches are the two major paradigms for harvesting hypernymy relations from texts (Wang, He, and Zhou 2017). Unlike pattern based methods, which rely more on lexical pattern matching (such as Hearst patterns (Roller, Kiela, and Nickel 2018)), distributional approaches leverage the distributional representations of terms to predict hypernymy. For example, several unsupervised hypernymy measures predict whether there exists a hypernymy relation between two terms (Santus et al. 2014; Kiela et al. 2015). Other works employ supervised algorithms to predict hypernymy (Roller, Erk, and Boleda 2014; Weeds et al. 2014). In these methods, each pair of terms is represented as an embedding vector related to the two terms, which is further fed into an SVM or logistic regression classifier to make the prediction. Additionally, Shwartz et al. (2016) combine pattern based and distributional methods to improve performance by using an integrated network. Because hypernymy relations are regarded as asymmetric and transitive in most studies, several recent approaches aim at learning hypernymy embeddings of terms to capture such properties (Yu et al. 2015; Luu et al. 2016; Nguyen et al. 2017). These term embeddings are more task-oriented for predicting hypernymy.
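As a rough illustration of the supervised distributional baseline sketched above, the following Python snippet represents a term pair by concatenating two pre-trained word vectors and trains a logistic regression classifier on labeled pairs. The embeddings, pairs and feature choice here are purely hypothetical toy data, not the setup of any of the cited systems.

```python
# Minimal sketch of a supervised distributional hypernymy classifier:
# a term pair is encoded as the concatenation of its word vectors and
# fed to logistic regression. All data below is toy/hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
vocab = ["animal", "dog", "fruit", "apple", "banana", "car"]
emb = {w: rng.normal(size=dim) for w in vocab}   # stand-in for word2vec/GloVe

# Toy training pairs: (hyponym, hypernym, label), 1 = hypernymy holds.
pairs = [("dog", "animal", 1), ("apple", "fruit", 1), ("banana", "fruit", 1),
         ("animal", "dog", 0), ("car", "fruit", 0), ("apple", "banana", 0)]

def pair_features(hypo, hyper):
    # Concatenation of the two term vectors; the difference or the
    # element-wise product are other commonly used pair features.
    return np.concatenate([emb[hypo], emb[hyper]])

X = np.stack([pair_features(h, g) for h, g, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(pair_features("dog", "animal").reshape(1, -1)))
```

Because such a classifier sees only generic features of the two words, it can latch onto properties of the individual terms rather than the relation between them, which is one source of the lexical memorization problem discussed below.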
We observe that these methods may have three potential limitations. (i) Few methods have fully exploited knowledge in both Web-scale taxonomies and text corpora. For example, Luu et al. (2016) and Nguyen et al. (2017) use the limited number of hypernymy relations in the WordNet concept hierarchy. Yu et al. (2015) leverage hypernymy relations in Probase (Wu et al. 2012) but do not exploit word embeddings learned from text corpora, which contain rich contextual knowledge of words. (ii) Hypernymy and other semantic relations (e.g., meronymy, co-hyponymy) are usually difficult to distinguish for distributional methods (Weeds et al. 2014). The process of how a term is mapped to its hypernyms or non-hypernyms in the embedding space is not directly modeled. Hence, distributional methods are likely to suffer from the lexical memorization problem (Levy et al. 2015). (iii) Tasks related to hypernymy prediction can be either supervised or unsupervised. It is unclear how large taxonomies can benefit these tasks in a unified framework.

In this paper, we propose the Taxonomy Enhanced Adversarial Learning (TEAL) framework for hypernymy prediction. The basic idea is to employ large taxonomies to learn how a term projects to its hypernyms and non-hypernyms in the embedding space. We first propose an unsupervised measure, U-TEAL, to distinguish hypernymy from other semantic relations based on a word embedding projection network distantly trained over the taxonomy. To address supervised hypernymy detection tasks, the supervised model S-TEAL is presented, which leverages word embeddings trained from a Web corpus and human-labeled hypernymy training sets. Because adversarial training is beneficial for feature imitation by imposing implicit regularization, we further propose a coupled adversarial training algorithm, AS-TEAL, to transfer hierarchical knowledge in taxonomies to hypernymy prediction models. It employs two adversarial classifiers to enable an imitation scheme through competition between the S-TEAL model and a knowledge-rich taxonomy enhanced projection network, further increasing the performance.
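The concrete architectures, loss functions and training procedures of U-TEAL, S-TEAL and AS-TEAL are specified later in the paper; this section only states the projection idea they share. As a purely illustrative sketch of that idea, under assumed data and hypothetical names, the snippet below trains a single linear projection on taxonomy-derived (hyponym, hypernym) embedding pairs and scores a candidate pair by the similarity between the projected hyponym vector and the candidate hypernym vector.

```python
# Illustrative sketch of embedding projection for hypernymy (not the
# exact TEAL networks): learn Phi such that Phi * x_hyponym is close to
# x_hypernym, then score new pairs with the projected similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 100
proj = nn.Linear(dim, dim, bias=False)           # projection matrix Phi
opt = torch.optim.Adam(proj.parameters(), lr=1e-3)

# Hypothetical taxonomy-derived training pairs (random stand-ins here).
hypo_vecs = torch.randn(1024, dim)               # hyponym embeddings
hyper_vecs = torch.randn(1024, dim)              # hypernym embeddings

for epoch in range(50):
    opt.zero_grad()
    loss = F.mse_loss(proj(hypo_vecs), hyper_vecs)   # ||Phi x - y||^2
    loss.backward()
    opt.step()

def hypernymy_score(x: torch.Tensor, y: torch.Tensor) -> float:
    """Higher when y looks like a hypernym of x under the learned projection."""
    with torch.no_grad():
        return F.cosine_similarity(proj(x), y, dim=0).item()

print(hypernymy_score(torch.randn(dim), torch.randn(dim)))
```

An unsupervised measure can threshold such a score directly, while a supervised model can additionally be trained on labeled pairs; the paper develops both directions with more elaborate networks than this sketch.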
In experiments, we take the Microsoft Concept Graph as the taxonomy and evaluate TEAL over three standard NLP tasks: unsupervised hypernymy classification (Kiela et al. 2015), supervised hypernymy detection (Luu et al. 2016) and graded lexical entailment (Vulic et al. 2017). Because large taxonomies may not be available for lower-resourced languages, we also show how TEAL can be applied to non-English languages with no large taxonomies available. Additionally, we develop an application to detect missing hypernymy relations in the Microsoft Concept Graph.

Related Work

In this section, we summarize the related work in two major aspects: hypernymy prediction and how adversarial learning can benefit related tasks.

Hypernymy prediction. To discriminate hypernymy relations from other semantic relations (e.g., co-hyponymy, meronymy, synonymy), both unsupervised and supervised hypernymy detection tasks have been proposed by the NLP community. Methods to address these tasks can be divided into two categories: unsupervised and supervised.

Unsupervised approaches are based on hypernymy measures, which model the degree to which hypernymy holds within a term pair. In the literature, typical measures are designed based on the distributional inclusion hypothesis (Zhitomirsky-Geffet and Dagan 2009; Lenci and Benotto 2012). Recently, this hypothesis has been refined by the distributional informativeness hypothesis (Santus et al. 2014).

To overcome the prototypical hypernym problem (Levy et al. 2015), projection based approaches have been proposed to learn how the representation of a term is mapped to that of its hypernym in the embedding space. The piecewise linear projection model (Fu et al. 2014) is a pioneering work in this field. This model is improved by Wang et al. (2016), who leverage an iterative learning strategy and pattern-based validation techniques. The representations of hypernymy and non-hypernymy can be learned jointly by transductive learning, as shown in (Wang et al. 2017). By learning the representations of both hypernymy and non-hypernymy relations, it becomes easier to train a binary classification model to decide whether a term pair is hypernymy or non-hypernymy.

Another research direction is hypernymy embedding learning. Yu et al. (2015) introduce a supervised model to learn term embeddings for hypernymy identification. Luu et al. (2016) propose a dynamic weighting neural network based on Wikipedia data. OrderEmb (Vendrov et al. 2015) models the partial ordering of terms in the WordNet hypernymy hierarchy. More recently, HyperVec (Nguyen et al. 2017) has been proposed, which is suitable for discriminating hypernymy from other relations and for distinguishing hypernyms from hyponyms within a hypernymy pair.

Adversarial training. Adversarial learning is frequently applied in image generation (Goodfellow et al. 2014), sequence modeling (Yan et al. 2018; Xiao et al. 2018), etc. In NLP, adversarial learning has made less progress. Recently, several researchers have aimed at generating texts with neural networks, e.g., SeqGAN (Yu et al. 2017). However, these approaches cannot be applied to our task because our goal is to predict relations between words rather than to generate sequences. Some other works employ adversarial training in a multitask learning framework to improve the performance of NLP tasks, such as text classification (Liu, Qiu, and Huang 2017) and bilingual lexicon induction (Zhang et al. 2017). In our work, we employ adversarial learning with a multitask learning objective that uses both training sets and existing taxonomies. The knowledge in both sources can be automatically fused, without defining explicit fusion functions.
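To make the adversarial fusion idea more concrete, the sketch below shows one generic way such a scheme can be wired: a discriminator tries to tell whether a projected representation comes from a supervised projector or from a taxonomy-trained projector, and the supervised projector is additionally trained to fool it, which implicitly pulls its representations toward the taxonomy-enhanced ones. This is a minimal, hypothetical illustration of adversarial feature imitation in a multitask setting, not the coupled AS-TEAL algorithm itself; all modules, losses and weights below are assumptions.

```python
# Generic sketch of adversarial feature imitation (NOT the exact AS-TEAL
# procedure): a discriminator D separates the outputs of a supervised
# projector from those of a taxonomy-trained projector, while the
# supervised projector learns its task and tries to fool D. Data is random.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 100
sup_proj = nn.Linear(dim, dim)        # trained on labeled hypernymy pairs
tax_proj = nn.Linear(dim, dim)        # assumed pre-trained on a taxonomy
disc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_sup = torch.optim.Adam(sup_proj.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

hypo = torch.randn(256, dim)          # hypothetical hyponym embeddings
hyper = torch.randn(256, dim)         # hypothetical hypernym embeddings

for step in range(100):
    # 1) Discriminator: taxonomy-projected vectors -> 1, supervised -> 0.
    opt_disc.zero_grad()
    d_loss = (bce(disc(tax_proj(hypo).detach()), torch.ones(256, 1))
              + bce(disc(sup_proj(hypo).detach()), torch.zeros(256, 1)))
    d_loss.backward()
    opt_disc.step()

    # 2) Supervised projector: fit the labeled pairs and fool the
    #    discriminator, imitating the taxonomy-enhanced projections.
    opt_sup.zero_grad()
    task_loss = F.mse_loss(sup_proj(hypo), hyper)
    adv_loss = bce(disc(sup_proj(hypo)), torch.ones(256, 1))
    (task_loss + 0.1 * adv_loss).backward()
    opt_sup.step()
```

In this style of training, the adversarial term acts as an implicit regularizer: no explicit fusion function between corpus-derived and taxonomy-derived knowledge has to be specified, which matches the motivation stated above.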