Evaluating WordNet Features in Text Classification Models

Trevor Mansuy and Robert J. Hilderman
Department of Computer Science
University of Regina
Regina, Saskatchewan, Canada S4S 0A2
{mansuy1t, robert.hilderman}@uregina.ca

Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Incorporating semantic features from the WordNet lexical database is among one of the many approaches that have been tried to improve the predictive performance of text classification models. The intuition behind this is that keywords in the training set alone may not be extensive enough to enable generation of a universal model for a category, but if we incorporate the word relationships in WordNet, a more accurate model may be possible. Other researchers have previously evaluated the effectiveness of incorporating WordNet synonyms, hypernyms, and hyponyms into text classification models. Generally, they have found that improvements in accuracy using features derived from these relationships are dependent upon the nature of the text corpora from which the document collections are extracted. In this paper, we not only reconsider the role of WordNet synonyms, hypernyms, and hyponyms in text classification models, we also consider the role of WordNet meronyms and holonyms. Incorporating these WordNet relationships into a Coordinate Matching classifier, a Naive Bayes classifier, and a Support Vector Machine classifier, we evaluate our approach on six document collections extracted from the Reuters-21578, USENET, and DigiTrad text corpora. Experimental results show that none of the WordNet relationships were effective at increasing the accuracy of the Naive Bayes classifier. Synonyms, hypernyms, and holonyms were effective at increasing the accuracy of the Coordinate Matching classifier, and hypernyms were effective at increasing the accuracy of the SVM classifier.

Introduction

Supervised text classification, the task of assigning predefined category labels to previously unseen documents based on learned models, has been the focus of a considerable amount of previous and recent research (de Buenaga Rodriguez, Gomez-Hidalgo, & Diaz-Agudo 1997), (Scott & Matwin 1998), (Jensen & Martinez 2000), (Kehagias et al. 2003), (Hotho & Bloehdorn 2004), (Rosso et al. 2004), (Peng & Choi 2005). When performing text classification, the classification accuracy we observe on the previously unseen documents largely depends on the quality of the training set we have used to build the category models. That is, if training information for a category model is sparse, then we can expect the category model to be a poor representation of the category, and the classification accuracy to be poor. Similarly, a training set may not necessarily be sparse, but it can contain important word relationships that a simple vector of words is not capable of modeling. For example, consider two documents, one discussing the concept flax and the other wheat. A reasonable person would most likely classify both documents as grain-related or agriculture-related because he/she recognizes the relationship between flax and wheat. A simple word vector may not be sufficient for capturing this relationship.

In an attempt to address the issue of related concepts in text classification models, a number of researchers have previously incorporated features derived from word relationships in the WordNet lexical database (de Buenaga Rodriguez, Gomez-Hidalgo, & Diaz-Agudo 1997), (Scott & Matwin 1998), (Jensen & Martinez 2000), (Kehagias et al. 2003), (Hotho & Bloehdorn 2004), (Rosso et al. 2004), (Peng & Choi 2005). WordNet is a database of words containing a semantic lexicon for the English language that organizes words into groups called synsets (i.e., synonym sets) (Miller 1995). A synset is a collection of synonymous words linked to other synsets according to a number of different possible relationships between the synsets (e.g., is-a, has-a, is-part-of, and others). When building a category model for a document, words related to a feature already in the model (and satisfying some desired WordNet relationship) are extracted from the WordNet database and incorporated into the model. The intuition is that this expanded representation has greater potential to assign semantically similar documents to the same class.

As far as the authors of this paper know, in the area of text classification approaches incorporating WordNet features, previously studied relationships include synonyms (de Buenaga Rodriguez, Gomez-Hidalgo, & Diaz-Agudo 1997), (Scott & Matwin 1998), (Jensen & Martinez 2000), (Kehagias et al. 2003), (Hotho & Bloehdorn 2004), (Rosso et al. 2004), (Peng & Choi 2005), hypernyms (Scott & Matwin 1998), (Jensen & Martinez 2000), (Hotho & Bloehdorn 2004), (Peng & Choi 2005), and hyponyms (Peng & Choi 2005). In this paper, we extend the use of WordNet relationships in text classification models by considering the role of WordNet meronyms and holonyms. Meronyms and holonyms are compositional relationships. A concept is a meronym of another if it is a component of that other concept. Conversely, a concept is a holonym of another if it has that other concept as a component. In particular, we incorporate words derived from these two WordNet relationships into a Coordinate Matching, a Naive Bayes, and a Support Vector Machine classifier to determine whether these “narrowing” and “broadening” relationships result in increased accuracy. We then apply the algorithms to six document collections extracted from the Reuters-21578 (Hettich & Bay 1999), USENET, and DigiTrad (Digital 2002) text corpora to determine the effectiveness of these relationships in increasing the accuracy of the text classification algorithms.

Related Work

WordNet has been applied to a variety of problems in machine learning, natural language processing, information retrieval, and artificial intelligence (WordNet 2005). In this section, we discuss a number of relevant contributions that describe approaches to incorporating WordNet semantic features into text classifiers. Characteristics of these approaches are summarized in Table 1 and described in the text that follows.

Table 1: Characteristics of previous WordNet classification approaches
Column groups: WordNet Relationships (Synonyms, Hypernyms, Hyponyms); Word Sense Disambiguation (Manual, All, Most Likely, Context); Model Type (Word, Synset)
(de Buenaga Rodriguez, Gomez-Hidalgo, & Diaz-Agudo 1997) • • •
(Scott & Matwin 1998) • • • • •
(Jensen & Martinez 2000) • • • •
(Kehagias et al. 2003) • • •
(Hotho & Bloehdorn 2004) • • • • • •
(Rosso et al. 2004) • • •
(Peng & Choi 2005) • • • • •

One of the first efforts toward the integration of WordNet features into a text classifier is described in (de Buenaga Rodriguez, Gomez-Hidalgo, & Diaz-Agudo 1997). Here it is proposed that accuracy may be increased if the category model for a document is expanded by incorporating WordNet synonyms of the category label. In this work, since the number of features actually incorporated by WordNet expansion was small, manual word sense disambiguation was used to determine the correct word sense. To evaluate their approach, Rocchio and Widrow-Hoff classification algorithms were used. It was found that accuracy, in general, was increased by incorporating synonyms, and, in particular, was increased when the number of categories in the training documents was sparse.

In (Scott & Matwin 1998), an approach is described [...] for evaluation of their approach, results were mixed, showing both statistically significant increases and decreases on various document collections.

A similar approach incorporating both synonyms and hypernyms is proposed in (Jensen & Martinez 2000). Noting that words in a synset are organized in occurrence frequency order, in their approach to word sense disambiguation, they only select the most likely sense for incorporation into the category model. Coordinate Matching, TF*IDF, and Naive Bayes classification algorithms were used to evaluate their approach, where different combinations of synonyms, hypernyms, and bigrams were incorporated into the category models. They found that incorporating hypernyms into category models is almost always appropriate.

The work described in (Kehagias et al. 2003) evaluates the merits of modeling senses as features rather than words. The Brown Semantic Corpus, a document collection whose words have been tagged with the correct word sense, is used such that only synsets corresponding to the features found in the document are incorporated into the category model. Consequently, hypernyms are not incorporated in this approach. Of course, word sense disambiguation is not necessary since the document collection has previously been tagged with the correct sense. They used MAP, Naive Bayes, and kNN classifiers to evaluate their approach. An increase in accuracy was obtained on most document collections considered. However, the increases were small, leading the authors to conclude that the benefits from using their approach are marginal.

The application of WordNet as an ontology in text classification problems is explored in (Hotho & Bloehdorn 2004). Their approach incorporates both synonyms and hypernyms in the category model. Three different word sense disambiguation strategies are studied in their approach. These include strategies incorporating all senses and the most likely sense. A third strategy, context, measures the degree of overlap of different WordNet features
