Mapping the Paraphrase Database to WordNet

Anne Cocos*, Marianna Apidianaki*† and Chris Callison-Burch*
* Computer and Information Science Department, University of Pennsylvania
† LIMSI, CNRS, Université Paris-Saclay, 91403 Orsay
{acocos,marapi,ccb}@seas.upenn.edu

Abstract

WordNet has facilitated important research in natural language processing, but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet's relations into PPDB so as to increase its utility for semantic reasoning in applications.

RULE-PRESCRIPT: imperative*, demand*, duty*, request, gun, decree, ranking
RULE-REGULATION: constraint*, limit*, derogation*, notion
RULE-FORMULA: method*, standard*, plan*, proceeding
RULE-LINGUISTIC RULE: notion

Table 1: Example of our model's top-ranked paraphrases for four WordNet synsets for rule (n). Starred paraphrases have a predicted likelihood of attachment of at least 95%; others have a predicted likelihood of at least 50%. Bold text indicates paraphrases that match the correct sense of rule.

1 Introduction

WordNet (Miller, 1995; Fellbaum, 1998) is one of the most important resources for natural language processing research. Despite its utility, WordNet¹ is manually compiled and therefore relatively small. It contains roughly 155k words, which does not approach web scale, and very few informal or colloquial words, domain-specific terms, new word uses, or named entities. Researchers have compiled several larger, automatically generated thesaurus-like resources (Lin and Pantel, 2001; Dolan and Brockett, 2005; Navigli and Ponzetto, 2012; Vila et al., 2015). One of these is the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015b). With over 100 million paraphrase pairs, PPDB dwarfs WordNet in size, but it lacks WordNet's semantic structure: paraphrases for a given word are indistinguishable by sense, and PPDB's only inherent semantic relational information is a set of predicted entailment relations between word types (Pavlick et al., 2015a). Several earlier studies attempted to incorporate semantic awareness into PPDB, either by clustering its paraphrases by word sense (Apidianaki et al., 2014; Cocos and Callison-Burch, 2016) or by choosing appropriate PPDB paraphrases for a given context (Apidianaki, 2016; Cocos et al., 2017). In this work, we aim to marry the rich semantic knowledge in WordNet with the massive scale of PPDB by predicting WordNet synset membership for PPDB paraphrases that do not appear in WordNet. Our goal is to increase the lexical coverage of WordNet and to incorporate some of WordNet's rich relational information into PPDB. Table 1 shows our model's top-ranked outputs mapping PPDB paraphrases for the word rule onto its WordNet synsets.

Our overall objective in this work is to map PPDB paraphrases for a target word to the WordNet synsets of the target. The work has two parts. In the first part (Section 4), we train and evaluate a binary lemma-synset membership classifier; the training and evaluation data come from lemma-synset pairs with known class (member/non-member) in WordNet. In the second part (Section 5), we use the model trained in part one to predict membership for lemma-synset pairs where the lemma appears in PPDB but not in WordNet.

¹ In this work we refer specifically to WordNet version 3.0.

2 Related Work

There has been considerable research directed at expanding WordNet's coverage, either by integrating WordNet with additional semantic resources, as in Navigli and Ponzetto (2012), or by automatically adding new words and senses. In the latter case, several efforts have focused specifically on hyponym/hypernym detection and attachment (Snow et al., 2006; Shwartz et al., 2016). There is also previous work aimed at adding semantic structure to PPDB. Cocos and Callison-Burch (2016) clustered paraphrases by word sense, effectively forming synsets within PPDB. By mapping individual paraphrases to WordNet synsets, our work could be used in coordination with these previous results to extend WordNet relations to the automatically induced PPDB sense clusters.

3 WordNet and PPDB Structure

The core concept in WordNet is the synonym set, or synset – a set of words meaning the same thing. Since words can be polysemous, a given lemma may belong to multiple synsets corresponding to its different senses. WordNet also defines relationships between synsets, such as hypernymy, hyponymy, and meronymy. In the rest of the paper, we use S(w_p) to denote the set of WordNet synsets containing word w_p, where the subscript p denotes the part of speech. Each synset s_p^i ∈ S(w_p) is a set containing w_p as well as its synonyms for the corresponding sense. PPDB also has a graph structure, where nodes are words and edges connect mutual paraphrases. We use PPDB(w_p) to denote the set of PPDB paraphrases connected to target word w_p.

4 Predicting Synset Membership

Our objective is to map paraphrases for a target word, t, to the WordNet synsets of the target. For a given target word in a vocabulary, we make a binary synset-attachment prediction between each of t's paraphrases, w_p ∈ PPDB(t), and each of t's synsets, s_p^i ∈ S(t). We predict the likelihood of a word w_p belonging to synset s_p^i on the basis of multiple features describing their relationship. We construct features from four primary types of information.

PPDB 2.0 Score. The PPDB 2.0 Score is a supervised metric trained to estimate the strength of the paraphrase relationship between pairs of words connected in PPDB (Pavlick et al., 2015b). Scores range roughly from 0 to 5, with 5 indicating a strong paraphrase relationship. We compute several features for predicting whether a word w_p belongs to synset s_p^i as follows. We call the set of all lemmas belonging to s_p^i and any of its hypernym or hyponym synsets the extended synset s_p^{+i}. We calculate features that correspond to the maximum and average PPDB scores between w_p and the lemmas in s_p^{+i}:

    x_{ppdb.max} = \max_{w' \in s_p^{+i}} \mathrm{PPDBScore}(w_p, w')

    x_{ppdb.avg} = \frac{\sum_{w' \in s_p^{+i}} \mathrm{PPDBScore}(w_p, w')}{|s_p^{+i}|}

Distributional Similarity. Our distributional similarity feature encodes the extent to which the word and the lemmas from the synset tend to appear within similar contexts. Word embeddings are real-valued vector representations of words that capture contextual information from a large corpus, and comparing the embeddings of two words is a common method for estimating their semantic similarity and relatedness. Embeddings can also be constructed to represent word senses (Iacobacci et al., 2015; Flekova and Gurevych, 2016; Jauhar et al., 2015; Ettinger et al., 2016). Camacho-Collados et al. (2016) developed compositional vector representations of WordNet noun senses – called NASARI embedded vectors – that are computed as the weighted average of the embeddings of the words in each synset. They share the same embedding space as a publicly available² set of 300-dimensional word2vec embeddings covering 300 million words (hereafter referred to as the word2vec embeddings) (Mikolov et al., 2013a,b). We calculate a distributional similarity feature for each word-synset pair by simply taking the cosine similarity between the word's word2vec vector and the synset's NASARI vector:

    x_{distrib} = \cos(v_{NASARI}(s_p^i), v_{word2vec}(w_p))

where v_{NASARI} and v_{word2vec} denote the target synset and word embeddings, respectively. Since NASARI covers only nouns, and only 80% of the noun synsets for our target vocabulary are in NASARI, we construct weighted vector representations for the remaining 20% of noun synsets and for all non-noun synsets as follows. We take the vector representation of each synset not in NASARI to be the weighted average of the word2vec embeddings of the synset's lemmas, where each weight is the PPDB 2.0 Score between the lemma and the target word, if it exists, or 1.0 if it does not:

    v(s_p^i) = \frac{\sum_{l \in s_p^i} \mathrm{PPDBScore}(t, l) \cdot v_{word2vec}(l)}{\sum_{l \in s_p^i} \mathrm{PPDBScore}(t, l)}

² https://code.google.com/archive/p/word2vec/

Lesk Similarity. Among the information contained in WordNet for each synset is its definition, or gloss. The simplified Lesk algorithm (Vasilescu et al., 2004) identifies the most likely sense of a target word in context by measuring the overlap between the given context and the definition of each target sense. We use a slightly modified version of the algorithm to compute features that measure the overlap between the PPDB paraphrases for the target and the gloss of a synset.

AddCos. The AddCos method (Melamud et al., 2015) quantifies the fit of a substitute word s for a target word t in context C by measuring the semantic similarity of the substitute to the target, and the similarity of the substitute to the context:

    \mathrm{AddCos}(s, t, C) = \frac{|C| \cdot \cos(s, t) + \sum_{c \in C} \cos(s, c)}{2 \cdot |C|}

The vectors s and t are word embeddings of the substitute and target generated by the skip-gram with negative sampling model (Mikolov et al., 2013a,b). The context C is the set of words appearing within a fixed-width window of the target t in a sentence (we use a window of 2), and the embeddings c are context embeddings generated by skip-gram.
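The two PPDB-score features can be sketched in a few lines. This is a minimal illustration, not the authors' code: `ppdb_score` is a hypothetical lookup standing in for the real PPDB 2.0 Score table, and the score values below are invented.

```python
# Sketch of x_ppdb.max and x_ppdb.avg for a word against an extended synset
# (the synset's lemmas plus the lemmas of its hypernym/hyponym synsets).

def ppdb_features(word, extended_synset, ppdb_score):
    """Return (x_ppdb_max, x_ppdb_avg) over the extended synset s+."""
    scores = [ppdb_score(word, lemma) for lemma in extended_synset]
    return max(scores), sum(scores) / len(scores)

# Toy score table; pairs absent from PPDB score 0.0.
TABLE = {("rule", "regulation"): 4.1, ("rule", "constraint"): 3.3}

def lookup(a, b):
    return TABLE.get((a, b), 0.0)

x_max, x_avg = ppdb_features("rule", ["regulation", "constraint", "limit"], lookup)
```

Because the extended synset pools in hypernym and hyponym lemmas, the average is diluted by lemmas with no PPDB connection to the word, while the max stays robust to them.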
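The distributional feature and the weighted-average fallback can be sketched as follows. The 3-dimensional vectors and PPDB scores here are invented toys (the paper uses 300-dimensional word2vec and NASARI embeddings in a shared space); this is an illustration of the arithmetic, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def synset_vector(lemmas, word_vecs, score_to_target):
    """Fallback synset vector: weighted average of the lemmas' embeddings,
    each weight being the PPDB 2.0 Score between lemma and target
    (1.0 when the pair is not in PPDB)."""
    weights = [score_to_target.get(l, 1.0) for l in lemmas]
    total = sum(weights)
    out = [0.0] * len(word_vecs[lemmas[0]])
    for wgt, lemma in zip(weights, lemmas):
        for i, x in enumerate(word_vecs[lemma]):
            out[i] += wgt * x / total
    return out

toy_vecs = {"law": [1.0, 0.0, 0.0], "rule": [0.8, 0.6, 0.0]}
x_distrib = cosine(toy_vecs["law"], toy_vecs["rule"])
v_fallback = synset_vector(["law", "rule"], toy_vecs, {"law": 4.0})
```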
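The gloss-overlap idea behind the modified Lesk feature can be sketched as below. The stopword list and example gloss are invented, and the paper's exact preprocessing is not specified in this excerpt; only the core overlap computation is shown.

```python
# Sketch of a Lesk-style feature: instead of overlapping a sentence context
# with the gloss, we overlap the target's PPDB paraphrases with the gloss.

STOPWORDS = {"a", "an", "the", "of", "or", "to", "in", "on"}

def lesk_overlap(paraphrases, gloss):
    """Number of content words shared by the paraphrase set and the gloss."""
    gloss_words = {w for w in gloss.lower().split() if w not in STOPWORDS}
    return len(set(paraphrases) & gloss_words)

# Hypothetical gloss for illustration only.
x_lesk = lesk_overlap({"constraint", "limit", "derogation"},
                      "a constraint or limit on conduct")
```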
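The AddCos formula itself is straightforward to implement. The 2-dimensional embedding tables below are invented stand-ins; in the paper, word and context vectors come from a skip-gram-with-negative-sampling model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def add_cos(s, t, context, word_vecs, ctx_vecs):
    """AddCos(s, t, C) = (|C| * cos(s, t) + sum_c cos(s, c)) / (2 * |C|)."""
    target_sim = cosine(word_vecs[s], word_vecs[t])
    ctx_sim = sum(cosine(word_vecs[s], ctx_vecs[c]) for c in context)
    return (len(context) * target_sim + ctx_sim) / (2 * len(context))

word_vecs = {"govern": [1.0, 0.0], "rule": [1.0, 0.0]}
ctx_vecs = {"the": [0.0, 1.0], "country": [1.0, 1.0]}
score = add_cos("govern", "rule", ["the", "country"], word_vecs, ctx_vecs)
```

Weighting the target term by |C| balances fit-to-target against fit-to-context, so a substitute must match both to score well.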
