
Mapping the Paraphrase Database to WordNet

Anne Cocos∗, Marianna Apidianaki∗♠ and Chris Callison-Burch∗
∗ Computer and Information Science Department, University of Pennsylvania
♠ LIMSI, CNRS, Université Paris-Saclay, 91403 Orsay
{acocos,marapi,ccb}@seas.upenn.edu

Abstract

WordNet has facilitated important research in natural language processing, but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet's relations into PPDB so as to increase its utility for semantic reasoning in applications.

RULE-PRESCRIPT:       imperative*, demand*, duty*, request, gun, decree, ranking
RULE-REGULATION:      constraint*, limit*, derogation*, notion
RULE-FORMULA:         method*, standard*, plan*, proceeding
RULE-LINGUISTICRULE:  notion

Table 1: Example of our model's top-ranked paraphrases for four WordNet synsets for rule (n). Starred paraphrases have a predicted likelihood of attachment of at least 95%; others have a predicted likelihood of at least 50%. Bold text indicates paraphrases that match the correct sense of rule.

1 Introduction

WordNet (Miller, 1995; Fellbaum, 1998) is one of the most important resources for natural language processing research. Despite its utility, WordNet[1] is manually compiled and therefore relatively small. It contains roughly 155k words, which does not approach web scale, and it includes very few informal or colloquial words, domain-specific terms, new word uses, or named entities. Researchers have compiled several larger, automatically-generated WordNet-like resources (Lin and Pantel, 2001; Dolan and Brockett, 2005; Navigli and Ponzetto, 2012; Vila et al., 2015). One of these is the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015b). With over 100 million paraphrase pairs, PPDB dwarfs WordNet in size, but it lacks WordNet's semantic structure. Paraphrases for a given word are indistinguishable by sense, and PPDB's only inherent semantic relational information is predicted entailment relations between word types (Pavlick et al., 2015a). Several earlier studies attempted to incorporate semantic awareness into PPDB, either by clustering its paraphrases by word sense (Apidianaki et al., 2014; Cocos and Callison-Burch, 2016) or by choosing appropriate PPDB paraphrases for a given context (Apidianaki, 2016; Cocos et al., 2017). In this work, we aim to marry the rich semantic knowledge in WordNet with the massive scale of PPDB by predicting WordNet synset membership for PPDB paraphrases that do not appear in WordNet. Our goal is to increase the lexical coverage of WordNet and to incorporate some of the rich relational information from WordNet into PPDB. Table 1 shows our model's top-ranked outputs mapping PPDB paraphrases for the word rule onto their corresponding WordNet synsets.

Our overall objective in this work is to map PPDB paraphrases for a target word to the WordNet synsets of the target. The work has two parts. In the first part (Section 4), we train and evaluate a binary lemma-synset membership classifier. The training and evaluation data comes from lemma-synset pairs with known class (member/non-member) from WordNet. In the second part (Section 5), we predict membership for lemma-synset pairs where the lemma appears in PPDB, but not in WordNet, using the model trained in part one.

[1] In this work we refer specifically to WordNet version 3.0.


2 Related Work

There has been considerable research directed at expanding WordNet's coverage, either by integrating WordNet with additional semantic resources, as in Navigli and Ponzetto (2012), or by automatically adding new words and senses. In the second case, there have been several efforts specifically focused on hyponym/hypernym detection and attachment (Snow et al., 2006; Shwartz et al., 2016). There is also previous work aimed at adding semantic structure to PPDB. Cocos and Callison-Burch (2016) clustered paraphrases by word sense, effectively forming synsets within PPDB. By mapping individual paraphrases to WordNet synsets, our work could be used in coordination with these previous results in order to extend WordNet relations to the automatically-induced PPDB sense clusters.

3 WordNet and PPDB Structure

The core concept in WordNet is the synonym set, or synset – a set of words meaning the same thing. Since words can be polysemous, a given lemma may belong to multiple synsets corresponding to its different senses. WordNet also defines relationships between synsets, such as hypernymy, hyponymy, and meronymy. In the rest of the paper, we will use S(w_p) to denote the set of WordNet synsets containing word w_p, where the subscript p denotes the part of speech. Each synset s_p^i ∈ S(w_p) is a set containing w_p as well as its synonyms for the corresponding sense. PPDB also has a graph structure, where nodes are words and edges connect mutual paraphrases. We will use PPDB(w_p) to denote the set of PPDB paraphrases connected to target word w_p.

4 Predicting Synset Membership

Our objective is to map paraphrases for a target word, t, to the WordNet synsets of the target. For a given target word in a given part of speech, we make a binary synset-attachment prediction between each of t's paraphrases, w_p ∈ PPDB(t), and each of t's synsets, s_p^i ∈ S(t). We predict the likelihood of a word w_p belonging to synset s_p^i on the basis of multiple features describing their relationship. We construct features from four primary types of information.

PPDB 2.0 Score.  The PPDB 2.0 Score is a supervised metric trained to estimate the strength of the paraphrase relationship between pairs of words connected in PPDB (Pavlick et al., 2015b). Scores range roughly from 0 to 5, with 5 indicating a strong paraphrase relationship. We compute several features for predicting whether a word w_p belongs to synset s_p^i as follows. We call the set of all lemmas belonging to s_p^i and any of its hypernym or hyponym synsets the extended synset s_p^{+i}. We calculate features that correspond to the average and maximum PPDB scores between w_p and the lemmas in s_p^{+i}:

    x_{ppdb.max} = \max_{w' \in s_p^{+i}} PPDBScore(w_p, w')

    x_{ppdb.avg} = \frac{\sum_{w' \in s_p^{+i}} PPDBScore(w_p, w')}{|s_p^{+i}|}

Distributional Similarity.  Our distributional similarity feature encodes the extent to which the word and the lemmas from the synset tend to appear within similar contexts. Word embeddings are real-valued vector representations of words that capture contextual information from a large corpus. Comparing the embeddings of two words is a common method for estimating their semantic similarity and relatedness. Embeddings can also be constructed to represent word senses (Iacobacci et al., 2015; Flekova and Gurevych, 2016; Jauhar et al., 2015; Ettinger et al., 2016). Camacho-Collados et al. (2016) developed compositional vector representations of WordNet senses – called NASARI embedded vectors – that are computed as the weighted average of the embeddings for the words in each synset. They share the same embedding space as a publicly available[2] set of 300-dimensional word2vec embeddings covering 300 million words (hereafter referred to as the word2vec embeddings) (Mikolov et al., 2013a,b). We calculate a distributional similarity feature for each word-synset pair by simply taking the cosine similarity between the word's word2vec vector and the synset's NASARI vector:

    x_{distrib} = cos(v_{NASARI}(s_p^i), v_{word2vec}(w_p))

where v_{NASARI} and v_{word2vec} denote the synset and target word embeddings, respectively.

[2] https://code.google.com/archive/p/word2vec/
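The two PPDB-score features above reduce to a maximum and a mean over the extended synset. The following is a minimal sketch, using NLTK's WordNet interface; the ppdb_score(w1, w2) lookup is a hypothetical stand-in for a PPDB 2.0 score table that returns 0.0 for unconnected word pairs.

```python
from nltk.corpus import wordnet as wn

def extended_synset_lemmas(synset):
    """Lemmas of the synset and of its direct hypernyms/hyponyms (the extended synset s_p^{+i})."""
    related = [synset] + synset.hypernyms() + synset.hyponyms()
    return {lemma.name().replace('_', ' ') for s in related for lemma in s.lemmas()}

def ppdb_features(word, synset, ppdb_score):
    """x_ppdb.max and x_ppdb.avg between `word` and the extended synset."""
    lemmas = extended_synset_lemmas(synset)
    scores = [ppdb_score(word, lemma) for lemma in lemmas]
    return max(scores), sum(scores) / len(scores)

# Example (hypothetical scores): ppdb_features('decree', wn.synset('rule.n.01'), ppdb_score)
```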

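The distributional feature itself is a single cosine between pre-trained vectors. A small sketch follows, where w2v_vec and nasari_vec are assumed dictionary-style lookups into the word2vec and NASARI embedding spaces rather than the authors' code.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def distrib_feature(word, synset_id, w2v_vec, nasari_vec):
    """x_distrib: cosine between the word's word2vec vector and the synset's NASARI vector."""
    return cosine(nasari_vec[synset_id], w2v_vec[word])
```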
Since NASARI covers only nouns, and only 80% of the noun synsets for our target vocabulary are in NASARI, we construct weighted vector representations for the remaining 20% of noun synsets and all non-noun synsets as follows. We take the vector representation for each synset not in NASARI to be the weighted average of the word2vec embeddings of the synset's lemmas, where the weights are determined by the PPDB 2.0 Score between the lemma and the target word, if it exists, or 1.0 if it does not:

    v(s_p^i) = \frac{\sum_{l \in s_p^i} PPDBScore(t, l) \cdot v_{word2vec}(l)}{\sum_{l \in s_p^i} PPDBScore(t, l)}

Lesk Similarity.  Among the information contained in WordNet for each synset is its definition, or gloss. The simplified Lesk algorithm (Vasilescu et al., 2004) identifies the most likely sense of a target word in context by measuring the overlap between the given context and the definition of each target sense. We use a slightly modified version of the algorithm to compute features that measure the overlap between the PPDB paraphrases for the target and the gloss of a synset. For calculating these Lesk-based features, we find synset glosses from WordNet 3.0 and from BabelNet v3.0 (Navigli and Ponzetto, 2012). First, we find D, the set of content words of the gloss for synset s_p^i, by taking all nouns, verbs, adjectives, and adverbs that appear within the gloss. In cases where more than one gloss is available, we take D to be the set of all content words in all glosses. We also calculate an extended version of each feature, in which we take D to be the set of content words plus the PPDB paraphrases for each content word. Next, we calculate features that measure the relationship between the paraphrase w_p and the words in D in terms of PPDB 2.0 Scores. These features include the maximum PPDB score between the paraphrase and any word in D, the average score over all words in D, the percent of words in D that are connected to the paraphrase in PPDB, and the count of words in D that are connected to the paraphrase in PPDB:

    x_{lesk.max} = \max_{d \in D} PPDBScore(w_p, d)

    x_{lesk.avg} = \frac{\sum_{d \in D} PPDBScore(w_p, d)}{|D|}

    x_{lesk.cnt} = |\{d \in D : PPDBScore(w_p, d) > 0\}|

    x_{lesk.pct} = \frac{|\{d \in D : PPDBScore(w_p, d) > 0\}|}{|D|}

Lexical Substitutability.  The fourth feature type that we compute to predict whether word w_p belongs to synset s_p^i is based on the substitutability of w_p for instances of s_p^i in context. To compute this feature we measure lexical substitutability using a simple but high-performing vector space model, AddCos (Melamud et al., 2015). The AddCos method quantifies the fit of substitute word s for the target word t in context C by measuring the semantic similarity of the substitute to the target, and the similarity of the substitute to the context:

    AddCos(s, t, C) = \frac{|C| \cdot cos(s, t) + \sum_{c \in C} cos(s, c)}{2 \cdot |C|}

The vectors s and t are word embeddings of the substitute and target generated by the skip-gram with negative sampling model (Mikolov et al., 2013b,a). The context C is the set of words appearing within a fixed-width window of the target t in a sentence (we use a window of 2), and the embeddings c are context embeddings generated by skip-gram. In our implementation, we train 300-dimensional word and context embeddings over the 4B words in the Annotated Gigaword (AGiga) corpus (Napoles et al., 2012) using the gensim word2vec package (Mikolov et al., 2013b,a; Řehůřek and Sojka, 2010).[3]

To compute the lexical substitutability score between a word w_p and synset s_p^i, we first retrieve example sentences e ∈ E containing t in sense s_p^i from BabelNet v3.0 (Navigli and Ponzetto, 2012). Then, for each example e, we compute the AddCos lexical substitutability between w_p and the target word in context C_e. We compute two types of this feature: the maximum AddCos score over all synset examples, and the average AddCos score over all synset examples:

    x_{addcos.max} = \max_{e \in E} AddCos(w_p, t, C_e)

    x_{addcos.avg} = avg_{e \in E} AddCos(w_p, t, C_e)

Derived Features.  For each paraphrase, we also compute a set of derived features using the softmax and logodds functions over all synsets with which that paraphrase is paired. This is to encode the relative strength of association with each synset as compared to the others. For a given feature x_* calculated between lemma w_p and synset s_p^i, the derived versions of the feature are calculated as:

    x_*^{softmax}(w_p, s_p^i) = \frac{e^{x_*(w_p, s_p^i)}}{\sum_{s_p^j \in S(w_p)} e^{x_*(w_p, s_p^j)}}

    x_*^{logodds}(w_p, s_p^i) = \ln \frac{x_*(w_p, s_p^i) + \alpha}{\sum_{s_p^j \in S(w_p), j \neq i} (x_*(w_p, s_p^j) + \alpha)}

[3] The word2vec training parameters we use are a context window of size 3, learning rate alpha from 0.025 to 0.0001, minimum word count 100, sampling parameter 1e-4, 10 negative samples per target word, and 5 training epochs.
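A minimal sketch of the AddCos computation and its per-synset aggregation is shown below. The word and context embedding lookups (word_vecs, ctx_vecs) and the list of tokenized BabelNet example sentences are assumptions standing in for the trained skip-gram model, not the authors' implementation.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def addcos(sub, target, context, word_vecs, ctx_vecs):
    """AddCos(s, t, C) as defined by Melamud et al. (2015)."""
    s, t = word_vecs[sub], word_vecs[target]
    ctx = [ctx_vecs[c] for c in context if c in ctx_vecs]
    if not ctx:                       # no usable context words: fall back to target similarity
        return cos(s, t)
    return (len(ctx) * cos(s, t) + sum(cos(s, c) for c in ctx)) / (2 * len(ctx))

def addcos_features(paraphrase, target, examples, word_vecs, ctx_vecs, window=2):
    """x_addcos.max / x_addcos.avg over example sentences (each a tokenized list)."""
    scores = []
    for tokens in examples:
        if target not in tokens:
            continue
        i = tokens.index(target)
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        scores.append(addcos(paraphrase, target, context, word_vecs, ctx_vecs))
    return (max(scores), sum(scores) / len(scores)) if scores else (0.0, 0.0)
```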

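The derived features are simple normalizations of a raw feature over a paraphrase's candidate synsets. A sketch follows; the smoothing constant alpha is not specified in the text, so its default value here is an assumption.

```python
import math

def derived_features(scores, alpha=0.01):
    """Softmax and log-odds versions of a feature x_* computed for one paraphrase
    against each of its candidate synsets; `scores` maps synset id -> x_* value.
    Assumes at least two candidate synsets (otherwise the log-odds denominator is empty)."""
    z = sum(math.exp(x) for x in scores.values())
    softmax = {s: math.exp(x) / z for s, x in scores.items()}
    logodds = {s: math.log((x + alpha) /
                           sum(scores[o] + alpha for o in scores if o != s))
               for s, x in scores.items()}
    return softmax, logodds
```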
                                     Cross-Validation            Cross-Validation-LexSplit   Test
                                     Prec.  Rec.   F1     Acc.   Prec.  Rec.   F1     Acc.   Prec.  Rec.   F1     Acc.
Baseline: All negative attachments   0      0.854  0      0.854  0      0.854  0      0.854  0      0.858  0      0.858
Baseline: PPDB2.0 Score Match        0.536  0.419  0.471  0.862  0.536  0.419  0.471  0.862  0.268  0.718  0.390  0.681
Gaussian Naive Bayes (All Feat.)     0.528  0.369  0.372  0.800  0.527  0.310  0.352  0.825  0.496  0.282  0.359  0.858
Gaussian Naive Bayes (Sel. Feat.)    0.605  0.600  0.600  0.883  0.606  0.572  0.581  0.882  0.622  0.558  0.588  0.889

Table 2: Precision, recall, F1, and accuracy results over the training set (standard 10-fold Cross-Validation and lexical-split 20-fold Cross-Validation-LexSplit) and the test set for predicting paraphrase-synset attachment.

Model Training.  We train a binary classification model that takes lemma-synset pairs as input and predicts whether the lemma belongs in the synset. We train the model by generating features for a set of lemma-synset pairs from WordNet for which we know the correct classification. We then evaluate whether the resulting model correctly finds lemma-synset pairs that belong together.

Our target vocabulary comes from the SensEval-3 English Lexical Sample Task data (Mihalcea et al., 2004), which contains sentences corresponding to 57 noun, verb, and adjective lemmas. Each sentence may contain a different form of the lemma (i.e., different in number or tense), and the PPDB paraphrases vary depending on the form. So we take the set of all forms of all lemmas (251 word types in total) as our target vocabulary. To generate pairs for training and evaluation, for each of the 251 targets w_p, we find the lemmas in the intersection of w_p's synsets – S(w_p) – and its paraphrases – PPDB(w_p). We call the set of lemmas in the intersection L(w_p). Then, we take the lemma-synset pairs in L(w_p) × S(w_p) as instances for training and evaluation. The total number of resulting lemma-synset pairs is 7459. We randomly divide these into 80% training and 20% test pairs.

We then generate all variations of each of the four feature types for the lemma-synset pairs in our training and test sets. In the case of positive synset-lemma pairs – i.e., those pairs for which the lemma actually belongs to the WordNet synset – we exclude the lemma from the synset before calculating the PPDB Score and distributional features.

Finally, we train a Gaussian Naive Bayes (GNB) classification model over the training data. GNB is advantageous for our setting, as 10% of our instances have missing data (e.g., in the case where a synset does not have an example). For feature selection, we use two versions of cross-validation. The first is standard 10-fold Cross-Validation. In order to estimate how well our model will generalize to unseen lemmas, we also experiment with a lexical split technique described in Roller and Erk (2016) (Cross-Validation-LexSplit). This method ensures that, for each cross-validation fold, none of the lemmas in that fold's validation lemma-synset pairs are seen in the training split. Specifically, for each split we randomly select 5% of training pairs for validation and take the remainder of the training set that does not share a lemma with the validation set as that fold's training instances. As a result, the validation set size remains constant for each fold, but the training set sizes may vary between folds.

We train two versions of the model. The first (All Features) uses all computed features. The second (Selected Features) includes features selected using cross-validation (the selected features were the same using standard and lexical-split cross-validation). We select one feature of each type (PPDB Score, distributional, Lesk, and lexical substitutability) whose combination maximizes cross-validation F1 score. The selected features are x_lesk.cnt (non-extended), x_distrib, x_addcos.max, and softmax(x_ppdb.max).
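As a rough sketch of this training setup, the pairs in L(w_p) × S(w_p) can be featurized and fed to a Gaussian Naive Bayes classifier, e.g. scikit-learn's GaussianNB. The featurize function below is a hypothetical stand-in for the feature extraction above, and the handling of missing values (column-mean imputation) is an assumption, since the paper only notes that GNB tolerates missing data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

def build_dataset(pairs, featurize):
    """`pairs` is a list of (lemma, synset, label); `featurize` returns a fixed-length
    feature vector, possibly containing NaN where a feature could not be computed."""
    X = np.array([featurize(lemma, synset) for lemma, synset, _ in pairs])
    y = np.array([label for _, _, label in pairs])
    col_means = np.nanmean(X, axis=0)          # simple imputation (an assumption)
    X = np.where(np.isnan(X), col_means, X)
    return X, y

# X, y = build_dataset(pairs, featurize)
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# clf = GaussianNB().fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```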
Model Evaluation.  We report results of the model using all features, and results of the best model achieved after feature selection (Table 2). In each case we give both the Cross-Validation and Cross-Validation-LexSplit performance, and performance on the held-out test set. We compare our model to two simple baselines. The first predicts all negative attachments, which yields an accuracy of 85.8% on the test set (with F1 of 0). The second baseline maps each paraphrase to the synset of t with which it has the highest-scoring PPDB feature (x_ppdb.max), and yields an accuracy of 68.1% on the test set. In comparison, our GNB model with selected features yields an accuracy of over 88% on the cross-validation and test sets. Both cross-validation and test accuracies are significantly higher than the baselines (based on McNemar's test, p < .001).

In order to interpret the importance of each feature type, we also run an ablation experiment in which we train our GNB model with all features except those from a particular type (Table 3). We find that removing the PPDB features leads to the greatest drop in cross-validation F1, indicating that these are the most important for our classifier. Ablating all Lesk features improved F1, but on further analysis we found that ablating only the derived Lesk log-odds features led to a decrease in F1. This suggests that the Lesk features in general are useful for classification, but the derived Lesk log-odds features are not.

Ablated Feature Type     Change in Cross-Val. F1
PPDB                     0.042
Lexical Substitution     0.031
Distributional           0.009
Lesk                     -0.092

Table 3: Absolute decrease in mean cross-validation F1 with different feature types ablated. Higher numbers indicate greater feature importance.

[Figure 1: Precision-Recall curve for our paraphrase-synset attachment classifier. Cross-Validation AUC = 0.61, Cross-Validation-LexSplit AUC = 0.61, Test AUC = 0.58.]

5 Mapping PPDB to WordNet

Using our trained lemma-synset attachment classifier, we can now augment the lexical coverage of WordNet with PPDB paraphrases. For the 251 targets in our original dataset, we retrieve the PPDB paraphrases (with PPDB score greater than 2.5, to ensure high-quality paraphrases) that do not belong to any synset of the target or any of their direct hypernyms or hyponyms. We then make an attachment prediction between each remaining paraphrase and each of the target's WordNet synsets. In total, we make 160,813 unique paraphrase-synset attachment predictions for the 4821 unique paraphrase lemmas and 458 unique synsets associated with the targets in our dataset.

When we map PPDB to WordNet, we can estimate the expected precision and recall of the attachment decisions based on the results of our model evaluation on the test set. If we would like to emphasize precision over recall in the predicted attachments, we can adjust a threshold for attachment corresponding to the predicted likelihood of our model (as shown in Figure 1). At a threshold of 50% predicted likelihood, our classifier predicts attachment for 7032 (4.4%) of the paraphrase-synset pairs, with an estimated precision of 62.2%. If we increase the threshold to 95% predicted likelihood, the number of predicted attachments is 3690 (2.3%), with an estimated precision of 66.3%. With the publication of this paper we release our PPDB to WordNet mapping results.
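In practice, trading recall for precision amounts to thresholding the classifier's predicted probability. A small sketch, continuing the hypothetical clf and candidate feature matrix from the training sketch above:

```python
def predict_attachments(clf, candidate_pairs, X_candidates, threshold=0.95):
    """Return the (paraphrase, synset) pairs whose predicted likelihood of attachment
    meets the threshold (e.g. 0.50 for broader coverage, 0.95 for higher precision)."""
    probs = clf.predict_proba(X_candidates)[:, 1]   # probability of the positive class
    return [(pair, p) for pair, p in zip(candidate_pairs, probs) if p >= threshold]
```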

6 Conclusion

We have proposed a method for mapping PPDB paraphrases to WordNet synsets. Our classifier makes accurate paraphrase-synset attachment predictions using features that capture paraphrase and distributional similarity, and the substitutability of paraphrases for synsets in context. The results show that the classifier can successfully add new PPDB paraphrases to WordNet synsets and increase their coverage.

Acknowledgments

This research is supported in part by DARPA grant FA8750-13-2-0017 (the DEFT program). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA and the U.S. Government. This work is also supported by the French National Research Agency under project ANR-16-CE33-0013.

We would like to thank our anonymous reviewers for their thoughtful and helpful comments.

References

Marianna Apidianaki. 2016. Vector-space models for PPDB paraphrase ranking in context. In Proceedings of EMNLP. Austin, Texas, pages 2028–2034.

Marianna Apidianaki, Emilia Verzeni, and Diana McCarthy. 2014. Semantic Clustering of Pivot Paraphrases. In Proceedings of LREC. Reykjavik, Iceland, pages 4270–4275.

José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240:36–64.

Anne Cocos, Marianna Apidianaki, and Chris Callison-Burch. 2017. Word sense filtering improves embedding-based lexical substitution. In Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications. pages 110–119.

Anne Cocos and Chris Callison-Burch. 2016. Clustering Paraphrases by Word Sense. In Proceedings of NAACL/HLT. San Diego, California, pages 1463–1472.

William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing. Jeju Island, Korea, pages 9–16.

Allyson Ettinger, Philip Resnik, and Marine Carpuat. 2016. Retrofitting sense-specific word vectors using parallel text. In Proceedings of NAACL-HLT. San Diego, California, pages 1378–1383.

Christiane Fellbaum. 1998. WordNet. Wiley Online Library.

Lucie Flekova and Iryna Gurevych. 2016. Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction, and Utilization. In Proceedings of ACL. Berlin, Germany, pages 2029–2041.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of NAACL/HLT. Atlanta, Georgia, pages 758–764.

Ignacio Iacobacci, Mohammed Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning Sense Embeddings for Word and Relational Similarity. In Proceedings of ACL/IJCNLP. Beijing, China, pages 95–105.

Sujay Kumar Jauhar, Chris Dyer, and Eduard Hovy. 2015. Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models. In Proceedings of NAACL. Denver, Colorado, pages 683–693.

Dekang Lin and Patrick Pantel. 2001. DIRT – Discovery of Inference Rules from Text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pages 323–328.

Oren Melamud, Omer Levy, and Ido Dagan. 2015. A Simple Word Embedding Model for Lexical Substitution. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. Denver, Colorado, pages 1–7.

Rada Mihalcea, Timothy Chklovski, and Adam Kilgarriff. 2004. The Senseval-3 English lexical sample task. In Rada Mihalcea and Phil Edmonds, editors, Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. Barcelona, Spain, pages 25–28.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS. pages 3111–3119.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM 38(11):39–41.

Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX). Montréal, Canada, pages 95–100.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193:217–250.

Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-Burch. 2015a. Adding semantics to data-driven paraphrasing. In Proceedings of ACL. Beijing, China, pages 425–430.

Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015b. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of ACL. Beijing, China, pages 425–430.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. ELRA, Valletta, Malta, pages 45–50.

Stephen Roller and Katrin Erk. 2016. Relations such as hypernymy: Identifying and exploiting Hearst patterns in distributional vectors for lexical entailment. In Proceedings of EMNLP. pages 2163–2172.

Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving hypernymy detection with an integrated path-based and distributional method. In Proceedings of ACL. Berlin, Germany, pages 2389–2398.

Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2006. Semantic taxonomy induction from heterogenous evidence. In Proceedings of ACL. pages 801–808.

Florentina Vasilescu, Philippe Langlais, and Guy Lapalme. 2004. Evaluating Variants of the Lesk Approach for Disambiguating Words. In Proceedings of LREC. Lisbon, Portugal, pages 633–636.

Marta Vila, Horacio Rodríguez, and M. Antònia Martí. 2015. Relational paraphrase acquisition from Wikipedia: The WRPA method and corpus. Natural Language Engineering 21(03):355–389.
